Posted to dev@flink.apache.org by Yadong Xie <vt...@gmail.com> on 2020/10/27 07:59:10 UTC

Re: [VOTE] FLIP-104: Add More Metrics to Jobmanager

Hi all

There have been lots of discussions since the vote started, and FLINK-9741
has been fixed.

Matthias and I have updated FLIP-104
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-104%3A+Add+More+Metrics+to+Jobmanager>
following the suggestions and discussions.

I want to cancel this vote and start a new one. Thanks.

lining jing <ji...@gmail.com> wrote on Wed, Feb 26, 2020 at 7:33 PM:

> Hi Till,
> thanks for your reply.
>
>
> > Concerning FLINK-9741, I'm not sure whether we need to fix this issue
> > before starting this effort. The JobManagers are now running as part of
> > the cluster entrypoint process for which we should actually report the
> > metrics (memory usage).
>
>
> I have confirmed this with Zhu Zhu offline. Since the dispatcher still runs
> together with the JobManager, it should not affect the accuracy of the metric.
>
> Till Rohrmann <tr...@apache.org> wrote on Wed, Feb 26, 2020 at 12:04 AM:
>
> > Hi Yadong,
> >
> > thanks for creating this FLIP. I like the idea of exposing more
> > cluster information to the user.
> >
> > I share Xintong's concerns that we are about to rework the cluster
> > entrypoint's memory management. It might make sense to wait for these
> > changes before starting this effort. Otherwise, we risk doing duplicate
> > work.
> >
> > Concerning FLINK-9741, I'm not sure whether we need to fix this issue
> > before starting this effort. The JobManagers are now running as part of
> > the cluster entrypoint process for which we should actually report the
> > metrics (memory usage).
> >
> > Cheers,
> > Till
> >
> > On Tue, Feb 25, 2020 at 10:52 AM Jark Wu <im...@gmail.com> wrote:
> >
> > > Thanks Xintong for the explanation.
> > >
> > > The FLIP looks good to me now. +1 from my side.
> > >
> > > Best,
> > > Jark
> > >
> > > > On Tue, 25 Feb 2020 at 15:46, Xintong Song <to...@gmail.com> wrote:
> > >
> > > > @Jark
> > > >
> > > > > First, let me clarify that, while this FLIP is about adding JM
> > > > > metrics, the discussion of using different colors to distinguish
> > > > > memory usage applies to both JM and TM.
> > > > >
> > > > > IMO, there is no good way to define how memory utilization should be
> > > > > mapped to colors in general.
> > > >
> > > > >    - Direct memory
> > > > >       - JM: ATM, we do not specify -XX:MaxDirectMemorySize.
> > > > >       - TM: Direct memory consists of network memory and framework/task
> > > > >       off-heap memory. The former should always be 100% while the latter
> > > > >       may not. Therefore, the utilization of direct memory really depends
> > > > >       on the configured sizes of network memory and framework/task
> > > > >       off-heap memory.
> > > > >    - Heap memory: We might observe that the memory usage keeps growing
> > > > >    until GC is triggered, thus eventually the utilization might fluctuate
> > > > >    somewhere close to 100%.
> > > >
> > > > > In general, a low memory utilization probably suggests that the memory
> > > > > size is configured too large, but a high memory utilization does not
> > > > > necessarily suggest that the configured memory size needs to be
> > > > > increased, so I'm not sure about rendering it in red.
> > > >
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > > On Tue, Feb 25, 2020 at 3:13 PM Yadong Xie <vt...@gmail.com> wrote:
> > > >
> > > > >> Hi all,
> > > > >> We have updated the POC web and added units to the GC metrics.
> > > > >> Check it here: http://101.132.122.69:8081/web/#/job-manager/metrics
> > > > >> Thanks for all the responses.
> > > >>
> > > > >> Jark Wu <im...@gmail.com> wrote on Mon, Feb 24, 2020 at 8:48 PM:
> > > >>
> > > >>> Hi Yadong,
> > > >>>
> > > >>> > what is the boundary between red and green?
> > > > >>> Yes. I think that's the point we need to discuss. My gut feeling is
> > > > >>> "<60%" => green, "60%~80%" => yellow, ">80%" => red.
> > > > >>> But I guess direct memory is always 100%, so it is not suitable for
> > > > >>> that?
> > > > >>> Maybe @Xintong Song <to...@gmail.com> has a better understanding of
> > > > >>> the memory threshold.
> > > >>>
> > > >>> Best,
> > > >>> Jark
> > > >>>
> > > > >>> On Mon, 24 Feb 2020 at 15:41, Yadong Xie <vt...@gmail.com> wrote:
> > > >>>
> > > > >>> > Hi Jark,
> > > > >>> > thanks for your suggestion.
> > > > >>> >
> > > > >>> > > I think we can use different color to distinguish the memory usage
> > > > >>> > > (from green to red?).
> > > > >>> >
> > > > >>> > It is a good idea, but what is the boundary between red and green?
> > > > >>> > Giving a magic-number boundary may mislead users. Any suggestions?
> > > > >>> >
> > > > >>> > > Besides, I think we should add an unit on the "Garbage Collection" ->
> > > > >>> > > "Time", it's hard to know what the value mean. Would be better to
> > > > >>> > > display the value like "10ms", "5ns".
> > > > >>> >
> > > > >>> > I will add the unit later, thanks for your advice.
> > > >>> >
> > > >>> >
> > > > >>> > Xintong Song <to...@gmail.com> wrote on Fri, Feb 21, 2020 at 6:02 PM:
> > > >>> >
> > > > >>> > > FYI, there's an effort planned for 1.11 to improve the memory
> > > > >>> > > configuration of the Flink master process, similar to FLIP-49 but
> > > > >>> > > definitely less complex.
> > > > >>> > >
> > > > >>> > > I would not consider the memory configuration improvement a blocker
> > > > >>> > > for this effort. As far as I can see, nothing is in conflict. It is
> > > > >>> > > just that after the memory configuration improvement, we might be
> > > > >>> > > able to present more information on the JM metrics page that
> > > > >>> > > corresponds closely to the configuration options, like what we
> > > > >>> > > planned for the TM metrics page in FLIP-102. Therefore, it might
> > > > >>> > > make sense to proceed with this FLIP afterwards.
> > > > >>> > >
> > > > >>> > > I'm neutral on this, and would leave the call to Yadong and Lining.
> > > >>> > >
> > > >>> > > Thank you~
> > > >>> > >
> > > >>> > > Xintong Song
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > > >>> > > On Fri, Feb 21, 2020 at 2:47 PM Jark Wu <im...@gmail.com> wrote:
> > > >>> > >
> > > >>> > > > Thanks Yadong,
> > > >>> > > >
> > > > >>> > > > I think we can use different colors to distinguish the memory
> > > > >>> > > > usage (from green to red?).
> > > > >>> > > > Besides, I think we should add a unit to "Garbage Collection" ->
> > > > >>> > > > "Time"; it's hard to know what the value means.
> > > > >>> > > > It would be better to display the value like "10ms", "5ns".
> > > >>> > > >
> > > >>> > > > Best,
> > > >>> > > > Jark
> > > >>> > > >
> > > > >>> > > > On Thu, 20 Feb 2020 at 17:58, Yadong Xie <vthinkxie@gmail.com> wrote:
> > > >>> > > >
> > > >>> > > > > Hi all
> > > >>> > > > >
> > > > >>> > > > > I want to start the vote for FLIP-104, which proposes to add
> > > > >>> > > > > more metrics to the job manager.
> > > >>> > > > >
> > > > >>> > > > > To help everyone better understand the proposal, we spent some
> > > > >>> > > > > effort on making an online POC.
> > > > >>> > > > >
> > > > >>> > > > > previous web: http://101.132.122.69:8081/#/job-manager/config
> > > > >>> > > > > POC web: http://101.132.122.69:8081/web/#/job-manager/metrics
> > > >>> > > > >
> > > >>> > > > >
> > > > >>> > > > > The vote will last for at least 72 hours, following the
> > > > >>> > > > > consensus voting process.
> > > >>> > > > >
> > > > >>> > > > > FLIP wiki:
> > > > >>> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-104%3A+Add+More+Metrics+to+Jobmanager
> > > >>> > > > >
> > > > >>> > > > > Discussion thread:
> > > > >>> > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > >>> > > > >
> > > >>> > > > > Thanks,
> > > >>> > > > >
> > > >>> > > > > Yadong
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > >
> >
>
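
The color-threshold idea discussed in the thread ("<60%" => green, "60%~80%" => yellow, ">80%" => red) and the request to show a unit on the GC time can be sketched roughly as follows. This is only an illustration under assumptions: the class name MemoryColorSketch and the helpers colorFor and formatGcTime are hypothetical and are not part of Flink or of FLIP-104, and the thresholds are just the gut-feeling numbers quoted above. It reads the local JVM's heap usage and GC time through the standard java.lang.management beans, not through Flink's metric system.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class MemoryColorSketch {

    // Hypothetical mapping using the thresholds floated in the thread:
    // below 60% is green, 60% to 80% is yellow, above 80% is red.
    static String colorFor(double utilization) {
        if (utilization < 0.60) {
            return "green";
        }
        if (utilization <= 0.80) {
            return "yellow";
        }
        return "red";
    }

    // Attach an explicit unit so the value is not a bare number.
    static String formatGcTime(long millis) {
        return millis + " ms";
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        if (max > 0) {
            double utilization = (double) heap.getUsed() / max;
            System.out.printf("heap: %.0f%% -> %s%n", utilization * 100, colorFor(utilization));
        }

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // accumulated milliseconds, -1 when undefined
            System.out.println(gc.getName() + ": " + (time >= 0 ? formatGcTime(time) : "n/a"));
        }
    }
}
```

As Xintong points out in the thread, heap usage tends to climb until GC runs, so a fixed mapping like this would often report yellow or red for a healthy process. The sketch shows the mechanics of the suggestion, not a recommendation to adopt it.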