Posted to user@beam.apache.org by Jan Lukavský <je...@seznam.cz> on 2023/10/02 09:00:32 UTC

Re: simplest way to do exponential moving average?

Hi,

this depends on how exactly you plan to calculate the average. The
original definition is based on exponentially decreasing weight of more
distant (older, if time is on the x-axis) data points. Technically, this
means that the average at any point X1 depends on all values X0 <= X1.
It would therefore require buffering all elements in the global window
(using GroupByKey), sorting them manually, and then recomputing the
average with a trigger firing after each element. This is probably the
technically correct variant, but also the most computationally
intensive one.
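
As a rough illustration of the "buffer, sort, recompute" variant, here is
a plain-Java sketch with no Beam dependency. It assumes the common
recursive form ema_i = alpha * x_i + (1 - alpha) * ema_(i-1), seeded with
the first value; the class SortedEma, the Point type and the alpha
parameter are hypothetical names, not something from the Beam API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java sketch: sort the buffered points, then compute the EMA in order. */
class SortedEma {
  /** A timestamped sample (hypothetical helper type). */
  static final class Point {
    final long timestampMillis;
    final double value;

    Point(long timestampMillis, double value) {
      this.timestampMillis = timestampMillis;
      this.value = value;
    }
  }

  /**
   * Returns the EMA after each point, in timestamp order.
   * alpha is an assumed smoothing factor in (0, 1].
   */
  static List<Double> emaSeries(List<Point> buffered, double alpha) {
    // Copy first so the caller's list is left untouched.
    List<Point> sorted = new ArrayList<>(buffered);
    sorted.sort(Comparator.comparingLong((Point p) -> p.timestampMillis));

    List<Double> out = new ArrayList<>();
    double ema = 0.0;
    boolean seeded = false;
    for (Point p : sorted) {
      // The first point seeds the average; later points are blended in.
      ema = seeded ? alpha * p.value + (1 - alpha) * ema : p.value;
      seeded = true;
      out.add(ema);
    }
    return out;
  }
}
```

The loop is inherently sequential: every output depends on the previous
one, which is why the whole buffer has to be sorted before it runs.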

If the average is computed over time intervals, then another option
could be to define a cut-off interval T, i.e. set the exponentially
vanishing weight of data points to zero for any timestamp T0 < T1 - T,
where T1 is the time at which the average is evaluated. If the data
points arrive at discrete time intervals (minutes, hours, days), you can
split the data into sliding time windows (with the window size equal to
the cut-off interval and the slide equal to the update interval) and
assign each data point a weight within its window, i.e. the weight the
data point has at the end of the sliding window. You could then use a
CombineFn to count and sum the weighted values, which would be much
more efficient.
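
To make the windowed variant more concrete, here is a plain-Java sketch
(again without any Beam dependency) of the per-window arithmetic. It
assumes a weight of exp(-age / tau) measured at the window end; the class
WindowedEma, the Accum type and the tauMillis decay constant are
hypothetical. Accum only mirrors the add / merge / extract shape of a
CombineFn accumulator, it is not the Beam class itself.

```java
/** Plain-Java sketch of the weight-and-sum computation for one sliding window. */
class WindowedEma {
  /**
   * Weight of a point as seen at the end of its window: exp(-age / tau).
   * tauMillis is an assumed decay constant; the window itself enforces the cut-off.
   */
  static double weight(long timestampMillis, long windowEndMillis, double tauMillis) {
    return Math.exp(-(windowEndMillis - timestampMillis) / tauMillis);
  }

  /** Accumulator holding the sum of weighted values and the sum of weights. */
  static final class Accum {
    double weightedSum = 0.0;
    double weightSum = 0.0;

    /** Adds one value with its precomputed weight. */
    Accum add(double value, double w) {
      weightedSum += w * value;
      weightSum += w;
      return this;
    }

    /** Merges a partial result computed elsewhere; addition makes the order irrelevant. */
    Accum merge(Accum other) {
      weightedSum += other.weightedSum;
      weightSum += other.weightSum;
      return this;
    }

    /** Weighted average of everything seen so far, or NaN for an empty window. */
    double extract() {
      return weightSum == 0.0 ? Double.NaN : weightedSum / weightSum;
    }
  }
}
```

Because both fields are plain sums, partial accumulators can be built on
different workers in any order and merged afterwards, so no sorting is
needed within the window.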

Best,

  Jan

On 9/30/23 17:08, Balogh, György wrote:
> Hi,
> I want to calculate the exponential moving average of a signal using 
> beam in java.
> I understand there is no time order guarantee on incoming data. What 
> would be the simplest solution for this?
> Thank you,
>
> -- 
>
> György Balogh
> CTO
> E 	gyorgy.balogh@ultinous.com <ma...@ultinous.com>
> M 	+36 30 270 8342 <tel:+36%2030%20270%208342>
> A 	HU, 1117 Budapest, Budafoki út 209.
> W 	www.ultinous.com <http://www.ultinous.com>
>

Re: simplest way to do exponential moving average?

Posted by Reuven Lax via user <us...@beam.apache.org>.
On Mon, Oct 2, 2023 at 2:00 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hi,
>
> this depends on how exactly you plan to calculate the average. The
> original definition is based on exponentially decreasing weight of more
> distant (older if time is on the x-axis) data points. This (technically)
> means that this average at any point X1 depends on all values X0 <= X1.
> This would therefore require buffering (using GroupByKey) all elements in
> global window, doing the sorting manually and then computing the new value
> of the average triggering after each element. This is probably the
> technically correct, but most computationally intensive variant.
>

To clarify - you would probably buffer the elements in OrderedListState,
and set periodic event-time timers to fetch them and compute the average.
OrderedListState will return the elements in order, so you wouldn't have to
sort. This is assuming you are talking about streaming pipelines.
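
The buffer-and-flush logic can be sketched in plain Java, with a TreeMap
standing in for OrderedListState and a method call standing in for the
event-time timer callback. This is only an illustration of the idea, not
Beam code: the class BufferedEma, its methods and the alpha parameter are
hypothetical, and a real pipeline would use Beam's state and timer APIs
instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java stand-in for "buffer in ordered state, flush on a timer". */
class BufferedEma {
  // Sorted by timestamp, so reads come back in order without explicit sorting.
  private final TreeMap<Long, List<Double>> buffer = new TreeMap<>();
  private final double alpha; // assumed smoothing factor in (0, 1]
  private boolean seeded = false;
  private double ema = 0.0;

  BufferedEma(double alpha) {
    this.alpha = alpha;
  }

  /** Buffers one element; arrival order does not matter. */
  void add(long timestampMillis, double value) {
    buffer.computeIfAbsent(timestampMillis, k -> new ArrayList<>()).add(value);
  }

  /**
   * Plays the role of the timer callback: consumes every buffered element
   * with a timestamp strictly before limitMillis, in timestamp order, and
   * returns the updated average (0.0 if nothing has been consumed yet).
   */
  double flushBefore(long limitMillis) {
    Map<Long, List<Double>> ready = buffer.headMap(limitMillis, false);
    for (List<Double> values : ready.values()) {
      for (double v : values) {
        ema = seeded ? alpha * v + (1 - alpha) * ema : v;
        seeded = true;
      }
    }
    ready.clear(); // headMap is a view, so this also removes from the buffer
    return ema;
  }
}
```

In this sketch an element that arrives with a timestamp older than an
already flushed limit would simply be folded in at the next flush, out of
order; handling such late data is left out.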



Re: simplest way to do exponential moving average?

Posted by Kenneth Knowles <ke...@apache.org>.
Just to be pedantic about it: Jan's second approach (sliding windows
plus a CombineFn) is preferred because it would be much more _parallel_.
Any actual computation that depends on everything being in order is by
definition not parallel (this has nothing to do with Beam).

Kenn
