Posted to dev@beam.apache.org by Alex Amato <aj...@google.com> on 2019/07/12 17:20:46 UTC

Bucketed histogram metrics in beam. Anyone currently looking into this?

Hi,

I was wondering if anyone has plans to introduce a bucketed histogram metric
to Beam (different from Distribution, which only tracks min, max, sum and
count values)? I have some thoughts about how it could be done so that it
integrates with Stackdriver.

Essentially, I am referring to a time series of histograms, showing
per-bucket value counts for fixed windows in time.
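
To make the idea concrete, here is a rough sketch of "a time series of
histograms" in plain Java. It is illustration only: BucketedHistogram and
WindowedHistograms are hypothetical names, not Beam or Stackdriver APIs, and
the equal-width buckets, the underflow/overflow slots and the millisecond
windows are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not a Beam API: counts values in equal-width buckets. */
final class BucketedHistogram {
    private final double start;   // inclusive lower bound of the first finite bucket
    private final double width;   // width of every finite bucket
    private final long[] counts;  // [0] = underflow, [1..n] = finite buckets, [n + 1] = overflow

    BucketedHistogram(double start, double width, int numBuckets) {
        if (!(width > 0) || numBuckets <= 0) {
            throw new IllegalArgumentException("width and numBuckets must be positive");
        }
        this.start = start;
        this.width = width;
        this.counts = new long[numBuckets + 2];
    }

    void record(double value) {
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("cannot record NaN");
        }
        counts[indexFor(value)]++;
    }

    private int indexFor(double value) {
        if (value < start) {
            return 0;                      // below the first bucket
        }
        int numBuckets = counts.length - 2;
        double offset = (value - start) / width;
        if (offset >= numBuckets) {
            return numBuckets + 1;         // at or above the last bucket's upper bound
        }
        return 1 + (int) offset;
    }

    long[] counts() {
        return counts.clone();
    }
}

/** Hypothetical sketch: one histogram per fixed, non-overlapping time window. */
final class WindowedHistograms {
    private final long windowMillis;
    private final double start;
    private final double width;
    private final int numBuckets;
    private final TreeMap<Long, BucketedHistogram> byWindowStart = new TreeMap<>();

    WindowedHistograms(long windowMillis, double start, double width, int numBuckets) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        this.windowMillis = windowMillis;
        this.start = start;
        this.width = width;
        this.numBuckets = numBuckets;
    }

    void record(long timestampMillis, double value) {
        // Align the timestamp to the start of its fixed window.
        long windowStart = Math.floorDiv(timestampMillis, windowMillis) * windowMillis;
        byWindowStart
            .computeIfAbsent(windowStart, k -> new BucketedHistogram(start, width, numBuckets))
            .record(value);
    }

    /** Returns bucket counts keyed by window start, in time order. */
    Map<Long, long[]> snapshot() {
        Map<Long, long[]> out = new TreeMap<>();
        byWindowStart.forEach((w, h) -> out.put(w, h.counts()));
        return out;
    }
}

final class HistogramDemo {
    public static void main(String[] args) {
        // One-minute windows, five buckets of width 10 starting at 0.
        WindowedHistograms latencies = new WindowedHistograms(60_000L, 0.0, 10.0, 5);
        latencies.record(1_000L, 12.5);
        latencies.record(61_000L, 7.0);
        latencies.snapshot().forEach(
            (w, c) -> System.out.println(w + " -> " + Arrays.toString(c)));
    }
}
```

Unlike a Distribution, each window here keeps a full array of counts, so the
shape of the data is preserved at the cost of more state per metric.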

Re: Bucketed histogram metrics in beam. Anyone currently looking into this?

Posted by Alex Amato <aj...@google.com>.
Thanks Steve. Is your fork available for me to see? Would you mind linking
me to the PRs you introduced to add the histogram support to the Dataflow
worker?

On Fri, Jul 12, 2019 at 11:52 AM Steve Niemitz <sn...@apache.org> wrote:

> I've been doing some experiments in my own fork of the Dataflow worker
> using HdrHistogram [1] to record histograms. I export them to our own
> stats collector, not Stackdriver, and have been having good success with
> them.
>
> The problem is that the Dataflow worker metrics implementation is totally
> different from the Beam metrics implementation, but I imagine the concept
> would translate pretty easily.
>
> [1] https://github.com/HdrHistogram/HdrHistogram

Re: Bucketed histogram metrics in beam. Anyone currently looking into this?

Posted by Steve Niemitz <sn...@apache.org>.
I've been doing some experiments in my own fork of the Dataflow worker
using HdrHistogram [1] to record histograms. I export them to our own
stats collector, not Stackdriver, and have been having good success with
them.

The problem is that the Dataflow worker metrics implementation is totally
different from the Beam metrics implementation, but I imagine the concept
would translate pretty easily.

[1] https://github.com/HdrHistogram/HdrHistogram

On Fri, Jul 12, 2019 at 1:33 PM Pablo Estrada <pa...@google.com> wrote:

> I am not aware of anyone working on this. I do recall a couple of things:
>
> - These metrics can be very large in terms of space. Users may cause
> themselves trouble if they define too many of them.
>     - This is not reason enough to avoid doing it, but it is certainly
>       worth considering.
> - There is some code added by Boyuan to develop highly efficient
> histogram-type metrics.
>
> Best
> -P.

Re: Bucketed histogram metrics in beam. Anyone currently looking into this?

Posted by Pablo Estrada <pa...@google.com>.
I am not aware of anyone working on this. I do recall a couple of things:

- These metrics can be very large in terms of space. Users may cause
themselves trouble if they define too many of them.
    - This is not reason enough to avoid doing it, but it is certainly
      worth considering.
- There is some code added by Boyuan to develop highly efficient
histogram-type metrics.

Best
-P.
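
As a rough illustration of the space concern above, the following
back-of-the-envelope sketch compares a Distribution-style metric with a dense
bucketed histogram. Everything in it is an assumption made for the example:
HistogramSpaceEstimate is a hypothetical class, counters are taken to be
8-byte longs with no sparse or variable-length encoding, and the metric and
window counts are invented numbers, not measurements from Beam or Dataflow.

```java
/** Hypothetical estimate only; real encodings may be sparse or compressed. */
final class HistogramSpaceEstimate {

    /** A Distribution-style metric holds four values: min, max, sum and count. */
    static long distributionBytes() {
        return 4L * Long.BYTES;
    }

    /** A dense histogram holds one counter per bucket plus underflow and overflow. */
    static long histogramBytes(int numBuckets) {
        return (numBuckets + 2L) * Long.BYTES;
    }

    /** Total state if every metric keeps one value per retained time window. */
    static long totalBytes(long perMetricBytes, int metrics, int windows) {
        return Math.multiplyExact(Math.multiplyExact(perMetricBytes, (long) metrics), (long) windows);
    }

    public static void main(String[] args) {
        int metrics = 50;   // assumed number of user-defined metrics
        int windows = 60;   // assumed number of retained time windows
        System.out.println("Distribution: "
            + totalBytes(distributionBytes(), metrics, windows) + " bytes");
        System.out.println("Histogram (100 buckets): "
            + totalBytes(histogramBytes(100), metrics, windows) + " bytes");
    }
}
```

Under these assumptions the per-metric cost grows linearly with the bucket
count, which is why the number of buckets and the number of histogram metrics
a user defines both matter.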

On Fri, Jul 12, 2019 at 10:21 AM Alex Amato <aj...@google.com> wrote:

> Hi,
>
> I was wondering if anyone has plans to introduce a bucketed histogram metric
> to Beam (different from Distribution, which only tracks min, max, sum and
> count values)? I have some thoughts about how it could be done so that it
> integrates with Stackdriver.
>
> Essentially, I am referring to a time series of histograms, showing
> per-bucket value counts for fixed windows in time.
>