Posted to user@flink.apache.org by Darshan Singh <da...@gmail.com> on 2018/04/13 12:37:03 UTC

Any metrics to get the shuffled and intermediate data in flink

Hi

Are there any metrics in Flink that tell me how much data a given operator
read (say 1 GB), how much it shuffled (or otherwise processed), and how much
it wrote to temp space or anywhere else (say 1 or 2 GB)?

One of my jobs is failing because it runs out of disk space. It has many
sorts, groups, and joins, and I would like to know which one is generating
most of the temp data.


Thanks

Re: Any metrics to get the shuffled and intermediate data in flink

Posted by Darshan Singh <da...@gmail.com>.
Thanks, I can see those in the UI.

Thanks

On Fri, Apr 13, 2018 at 3:12 PM, TechnoMage <ml...@technomage.com> wrote:

> If you look at the web UI for flink it will tell you the bytes received
> and sent for each stage of a job.  I have not seen any similar metric for
> persisted state per stage, which would be nice to have as well.
>
> Michael
>

Re: Any metrics to get the shuffled and intermediate data in flink

Posted by TechnoMage <ml...@technomage.com>.
If you look at the Flink web UI, it will tell you the bytes received and sent for each stage of a job. I have not seen a similar metric for persisted state per stage, which would be nice to have as well.

Michael
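
For reference, the per-stage byte counts shown in the web UI can also be read programmatically. The sketch below is not from the thread: it assumes the JobManager REST endpoint `/jobs/<jobid>` returns a `vertices` list whose entries carry a `metrics` object with `read-bytes` and `write-bytes`, which is how the job details response is commonly laid out. The base URL, job id, and sample values are hypothetical placeholders. These counters cover data exchanged between tasks and likely do not include temp files spilled by sorts or joins, so treat the ranking as a hint about where to look rather than a measurement of temp space.

```python
import json
from urllib.request import urlopen


def fetch_job_details(base_url, job_id):
    """Fetch the job details JSON from the JobManager REST endpoint.

    Assumes the endpoint layout <base_url>/jobs/<job_id>; adjust if your
    Flink version exposes job details elsewhere.
    """
    with urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)


def rank_vertices_by_bytes(job_details, key="write-bytes"):
    """Return (name, read_bytes, write_bytes) tuples, largest `key` first.

    `key` must be "read-bytes" or "write-bytes". Missing metrics count as 0.
    """
    if key not in ("read-bytes", "write-bytes"):
        raise ValueError("key must be 'read-bytes' or 'write-bytes'")
    rows = []
    for vertex in job_details.get("vertices", []):
        metrics = vertex.get("metrics", {})
        rows.append((
            vertex.get("name", "<unnamed>"),
            metrics.get("read-bytes", 0),
            metrics.get("write-bytes", 0),
        ))
    sort_index = 1 if key == "read-bytes" else 2
    return sorted(rows, key=lambda row: row[sort_index], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample shaped like a job details response, so the sketch
    # runs without a cluster. Against a real cluster you would call, e.g.:
    #   details = fetch_job_details("http://localhost:8081", "<job id>")
    details = {
        "vertices": [
            {"name": "Source", "metrics": {"read-bytes": 0, "write-bytes": 1000}},
            {"name": "Sort", "metrics": {"read-bytes": 1000, "write-bytes": 2500}},
            {"name": "Join", "metrics": {"read-bytes": 2500, "write-bytes": 400}},
        ]
    }
    for name, read_b, write_b in rank_vertices_by_bytes(details):
        print(f"{name}: read={read_b} B, written={write_b} B")
```

Sorting by `write-bytes` puts the stages that emit the most data first; passing `key="read-bytes"` ranks them by input volume instead.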
