You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Chiara Troiani <t....@gmail.com> on 2021/10/28 16:59:48 UTC

[Question] Apache Beam for real time timeseries

Hello,

I am pretty new to Beam and I have a basic question.

Supposing I am ingesting streaming data at a frequency of 1Hz (event time).
I need to process them in real time by key, and output results at 1Hz
frequency.
Time order is extremely important.
I need to compute a result which depends on the current PCollectionElement
and on the previous one (in event time), per each key.

I saw I can use state and timers, but I have the impression that I cannot
compute results at the same frequency that input data is produced.

Is there a way to ensure time order in this case?
Did I misunderstand something?

Many thanks in advance,
Chiara

Re: [Question] Apache Beam for real time timeseries

Posted by Luke Cwik <lc...@google.com>.
Are you talking about actual real time in computers such as in signal
processing or something like that?  If so then Apache Beam is not a good
fit.

If not, then it depends on how much buffer time you have on when data is
ingested and when you produce output. Different runners do better/worse
here and it definitely depends on the scale you are looking at. I would
suggest trying the direct runner which runs everything in one process to
start, if that is still too slow then other runners are unlikely to be a
good fit.

As for the solution. you can use a StatefulDoFn marked
with @RequiresTimeSortedInput and use a processing time timer at a 1 second
interval to produce output. Note that not all runners support
@RequiresTimeSortedInput.



On Thu, Oct 28, 2021 at 10:00 AM Chiara Troiani <t....@gmail.com>
wrote:

> Hello,
>
> I am pretty new to Beam and I have a basic question.
>
> Supposing I am ingesting streaming data at a frequency of 1Hz (event time).
> I need to process them in real time by key, and output results at 1Hz
> frequency.
> Time order is extremely important.
> I need to compute a result which depends on the current PCollectionElement
> and on the previous one (in event time), per each key.
>
> I saw I can use state and timers, but I have the impression that I cannot
> compute results at the same frequency that input data is produced.
>
> Is there a way to ensure time order in this case?
> Did I misunderstand something?
>
> Many thanks in advance,
> Chiara
>