Posted to dev@flink.apache.org by Jacky Lau <28...@qq.com.INVALID> on 2022/02/10 13:01:28 UTC

RE: Re: [DISCUSS] FLIP-213: TaskManager's Flame Graphs

Hi Alexander,
   Sorry for the late response; I was away for the Chinese Spring Festival.
   The bottleneck is rendering on the browser side.
   For 1), we support a user-defined script capability, similar to YARN. The flame graph script just encapsulates async-profiler, so we need to make sure it is secure.
   For 2), yes, we use a different async-profiler package for each architecture.
   For 3), we may not.

On 2022/01/26 15:24:51 Alexander Fedulov wrote:
> Hi Jacky,
> 
> Could you please clarify what kind of *problems* you experience with the
> large parallelism? You referred to D3, is it something related to rendering
> on the browser side or is it about the samples collection process? Were you
> able to identify the bottleneck?
> 
> Fundamentally I have some concerns regarding the proposed approach:
> 1. Calling shell scripts triggered via the web UI is a security concern and
> it needs to be evaluated carefully if it could introduce any unexpected
> attack vectors (depending on the implementation, passed parameters etc.)
> 2. My understanding is that the async-profiler implementation is
> system-dependent. How do you propose to handle multiple architectures?
> Would you like to ship each available implementation within Flink? [1]
> 3. Do you plan to make use of full async-profiler features including native
> calls sampling with perf_events? If so, the issue I see is that some
> environments restrict ptrace calls by default [2]
> 
> [1] https://github.com/jvm-profiling-tools/async-profiler#download
> [2]
> https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces
> 
> 
> Best,
> Alexander Fedulov
> 
> On Wed, Jan 26, 2022 at 1:59 PM 李森 <li...@icloud.com.invalid> wrote:
> 
> > This is an expected feature, as we also experienced browser crashes on
> > existing operator-level flame graphs
> >
> > Best,
> > Echo Lee
> >
> > > On Jan 24, 2022, at 6:16 PM, David Morávek <da...@gmail.com> wrote:
> > >
> > > Hi Jacky,
> > >
> > > The link seems to be broken, here is the correct one [1].
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-213%3A+TaskManager%27s+Flame+Graphs
> > >
> > > Best,
> > > D.
> > >
> > >> On Mon, Jan 24, 2022 at 9:48 AM Jacky Lau <28...@qq.com.invalid> wrote:
> > >>
> > >> Hi All,
> > >>     I would like to start the discussion on FLIP-213
> > >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-213%3A+TaskManager%27s+Flame+Graphs>,
> > >> which aims to provide TaskManager-level (process-level) flame graphs
> > >> using async-profiler, which is the most popular tool for Java
> > >> performance profiling. Arthas and IntelliJ both use it, and we
> > >> already support it at our company, Ant Group.
> > >>     Flink now supports FLIP-165: Operator's Flame Graphs. It draws
> > >> flame graphs with the front-end library d3-flame-graph, which has
> > >> some problems with jobs of large parallelism.
> > >>     Please be aware that the FLIP wiki page is not fully done, since
> > >> I don't know whether it will be accepted by the Flink community.
> > >>     Feel free to add your thoughts to make this feature better! I am
> > >> looking forward to all your responses. Thanks very much!
> > >>
> > >>
> > >>
> > >>
> > >> Best Jacky Lau
> >
>