Posted to dev@flink.apache.org by Kailash Dayanand <ka...@gmail.com> on 2019/05/15 07:43:39 UTC

[Discuss]: Adding Metrics to StreamingFileSink

Hello,

I was looking to add metrics to the streaming file sink. Currently, the only
details available are the generic metrics reported for any operator, such as
the number of records in, the number of records out, etc. I was looking at
adding some sink-specific metrics and contributing them back, as well as
exposing the metrics which are already published by the aws-hadoop module.
Is that something of value to the community?

Another change I am proposing is to make the constructor of
StreamingFileSink protected instead of private here:
https://tinyurl.com/y5vh4jn6. If we make it protected, then it becomes
possible to extend the class and register custom metrics in the 'open'
method.
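
For illustration, a subclass could look roughly like the sketch below. This
is only a hypothetical outline: it assumes the constructor has been made
protected, and the constructor parameter types are written approximately,
mirroring whatever the current private constructor takes; the metric names
are made up.

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    // Hypothetical subclass; only possible once StreamingFileSink's constructor
    // (and the builder type it takes) are visible to subclasses.
    public class InstrumentedStreamingFileSink<IN> extends StreamingFileSink<IN> {

        private transient Counter recordsWritten;

        protected InstrumentedStreamingFileSink(
                StreamingFileSink.BucketsBuilder<IN, ?> bucketsBuilder, // approximate signature
                long bucketCheckInterval) {
            super(bucketsBuilder, bucketCheckInterval);
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            // Register custom metrics on the operator's metric group.
            recordsWritten = getRuntimeContext()
                    .getMetricGroup()
                    .addGroup("streamingFileSink")
                    .counter("recordsWritten");
        }

        @Override
        public void invoke(IN value, SinkFunction.Context context) throws Exception {
            super.invoke(value, context);
            recordsWritten.inc();
        }
    }

With that in place, the same 'open' hook could also register gauges for the
filesystem-level statistics that the aws-hadoop module already tracks.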

Thanks
Kailash

Re: [Discuss]: Adding Metrics to StreamingFileSink

Posted by Till Rohrmann <tr...@apache.org>.
I think you are right that some connectors will still need some special
metrics due to their peculiarities. I guess this won't be addressed by the
FLIP itself, but it could be a starting point.

Cheers,
Till

Re: [Discuss]: Adding Metrics to StreamingFileSink

Posted by Kailash Dayanand <ka...@gmail.com>.
Hello Till,

Thanks a lot for the information. It makes a lot of sense to have generic
sink-based metrics. Something that may also be useful for file system sinks
is the number of files written. I am assuming that would be abstracted under
the number of records for something like a bulk writer (multiple records
constitute a single write to the sink, hence my doubt).
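
To make the bulk-writer case concrete, a rough sketch (hypothetical names;
it assumes flink-parquet is on the classpath and MyEvent is a made-up record
type) would look something like this:

    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    public class BulkSinkExample {

        // Made-up record type standing in for whatever the job writes.
        public static class MyEvent {
            public String id;
            public long timestamp;
        }

        // With a bulk format, many records end up in one part file, so a
        // records-out count alone does not tell you how many files were produced.
        public static StreamingFileSink<MyEvent> buildSink() {
            return StreamingFileSink
                    .forBulkFormat(
                            new Path("s3://my-bucket/output"), // made-up output location
                            ParquetAvroWriters.forReflectRecord(MyEvent.class))
                    .build();
        }
    }

In a setup like this, part files are rolled on checkpoints, so a single file
can hold many records.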

Thanks
Kailash

Re: [Discuss]: Adding Metrics to StreamingFileSink

Posted by Till Rohrmann <tr...@apache.org>.
Hi Kailash,

have you seen FLIP-33 [1] and the corresponding ML thread [2]? The scope of
this improvement proposal is to extend the set of standard metrics a
connector should offer. Maybe it can already solve your problem.

Concerning your second proposal for the StreamingFileSink, I think this
should be doable and would help users build their own custom
StreamingFileSink.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
[2] https://www.mail-archive.com/dev@flink.apache.org/msg25296.html

Cheers,
Till

Re: [Discuss]: Adding Metrics to StreamingFileSink

Posted by Thomas Weise <th...@apache.org>.
+1 to both suggestions

It should be possible to extend the connector (we ran into the same issues with the KinesisConsumer).

Metrics are essential for understanding performance, especially for things like S3 writes, errors, retries, memory buffers, and so on.

Thomas