Posted to user@flink.apache.org by Marco Villalobos <mv...@beyond.ai> on 2020/02/24 20:23:47 UTC

Timeseries aggregation with many IoT devices off of one Kafka topic.

I need to collect timeseries data from thousands of IoT devices. Each device publishes a name, a value, and a timestamp to one Kafka topic. The event-time timestamps are in order only in relation to an individual device, but out of order with respect to other devices.

Is there a way to aggregate this over a 15-minute window in which each IoT device gets aggregated with its own event time?

I would deeply appreciate it if somebody could guide me to an approach for solving this in Flink.

I wish there were a group chat for this type of problem.


Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

Posted by Theo Diefenthal <th...@scoop-software.de>.
Hi, 

Ververica has great online tutorials on how to write Flink pipelines, including a small training section on process functions: 

https://training.ververica.com/lessons/processfunction.html 

Best regards 
Theo 


Von: "Khachatryan Roman" <kh...@gmail.com> 
An: "Avinash Tripathy" <av...@stellapps.com> 
CC: "Theo Diefenthal" <th...@scoop-software.de>, "hemant singh" <he...@gmail.com>, "Marco Villalobos" <mv...@beyond.ai>, "user" <us...@flink.apache.org> 
Gesendet: Dienstag, 25. Februar 2020 19:08:16 
Betreff: Re: Timeseries aggregation with many IoT devices off of one Kafka topic. 

Hi, 
I think conceptually the pipeline could look something like this: 
env 
.addSource(...) 
.keyBy("device_id") 
.window(SlidingEventTimeWindows.of(Time.minutes(15), Time.seconds(10))) 
.trigger(new Trigger { 
def onElement(el, timestamp, window, ctx) = { 
if (window.start == TimeWindow.getWindowStartWithOffset(timestamp, 0, 10_000)) { 
ctx.registerEventTimeTimer(window.end) 
} 
TriggerResult.CONTINUE 
} 
def onEventTime(time, window, ctx) = { 
TriggerResult.FIRE 
} 
})) 
.aggregate(...) 

(slide 10s needs to be adjusted) 

Regards, 
Roman 


On Tue, Feb 25, 2020 at 3:44 PM Avinash Tripathy < [ mailto:avinash.tripathy@stellapps.com | avinash.tripathy@stellapps.com ] > wrote: 



Hi Theo, 

We also have the same scenario. If it would be great if you could provide some examples or more details about flink process function. 

Thanks, 
Avinash 

On Tue, Feb 25, 2020 at 12:29 PM [ mailto:theo.diefenthal@scoop-software.de | theo.diefenthal@scoop-software.de ] < [ mailto:theo.diefenthal@scoop-software.de | theo.diefenthal@scoop-software.de ] > wrote: 

BQ_BEGIN

Hi, 
At last flink forward in Berlin I spoke with some persons about the same problem, where they had construction devices as IoT sensors which could even be offline for multiple days. 

They told me that the major problem for them was that the watermark in Flink is maintained per operator /subtask, even if you group by key. 

They solved their problem via a Flink process function where they have full control over state and timers, so you can deal with each device as you like and can e. g. maintain something similar to a per device watermark. I also think that it is the best way to go for this usecase. 

Best regards 
Theo 




-------- Ursprüngliche Nachricht -------- 
Von: hemant singh < [ mailto:hemant2184@gmail.com | hemant2184@gmail.com ] > 
Datum: Di., 25. Feb. 2020, 06:19 
An: Marco Villalobos < [ mailto:mvillalobos@beyond.ai | mvillalobos@beyond.ai ] > 
Cc: [ mailto:user@flink.apache.org | user@flink.apache.org ] 
Betreff: Re: Timeseries aggregation with many IoT devices off of one Kafka topic. 

BQ_BEGIN

Hello, 
I am also working on something similar. Below is the pipeline design I have, sharing may be it can be helpful. 

topic -> keyed stream on device-id -> window operation -> sink. 

You can PM me on further details. 

Thanks, 
Hemant 

On Tue, Feb 25, 2020 at 1:54 AM Marco Villalobos < [ mailto:mvillalobos@beyond.ai | mvillalobos@beyond.ai ] > wrote: 

BQ_BEGIN



I need to collect timeseries data from thousands of IoT devices. Each device has name, value, and timestamp published to one Kafka topic. The event time timestamps are in order only relation with the individual device, but out of order with respect to other devices. 



Is there a way to aggregate a 15 minute window of this in which each IoT devices gets aggregated with its own event time? 



I would deeply appreciate if somebody could guide me to an approach for solving this in Flink. 



I wish there was a group chat for these type of problems. 






BQ_END


BQ_END


BQ_END



-- 
SCOOP Software GmbH - Gut Maarhausen - Eiler Straße 3 P - D-51107 Köln 
Theo Diefenthal 

T +49 221 801916-196 - F +49 221 801916-17 - M +49 160 90506575 
theo.diefenthal@scoop-software.de - www.scoop-software.de 
Registered office: Köln, Commercial register: Köln, 
registration number: HRB 36625 
Management: Dr. Oleg Balovnev, Frank Heinen, 
Martin Müller-Rohde, Dr. Wolfgang Reddig, Roland Scheel 

Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

Posted by Khachatryan Roman <kh...@gmail.com>.
Hi,

I think conceptually the pipeline could look something like this:
env
  .addSource(...)
  .keyBy("device_id")
  .window(SlidingEventTimeWindows.of(Time.minutes(15), Time.seconds(10)))
  .trigger(new Trigger {
    def onElement(el, timestamp, window, ctx) = {
      // register the window's end timer only for elements in its first slide interval
      if (window.start == TimeWindow.getWindowStartWithOffset(timestamp, 0, 10_000)) {
        ctx.registerEventTimeTimer(window.end)
      }
      TriggerResult.CONTINUE
    }
    def onEventTime(time, window, ctx) = {
      // fire once the watermark passes the window end
      TriggerResult.FIRE
    }
  })
  .aggregate(...)

(the 10-second slide needs to be adjusted to your requirements)
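
For reference, a compilable version of that trigger could look roughly like the
sketch below (element type left generic, slide size passed in as a parameter;
this is an illustration of the same idea, not a tested implementation):

import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

class FireOnWindowEndTrigger[T](slideMs: Long) extends Trigger[T, TimeWindow] {

  // Register the window's end timer only for elements falling into its first
  // slide interval, so each window registers its timer once.
  override def onElement(el: T, timestamp: Long, window: TimeWindow,
                         ctx: Trigger.TriggerContext): TriggerResult = {
    if (window.getStart == TimeWindow.getWindowStartWithOffset(timestamp, 0, slideMs)) {
      ctx.registerEventTimeTimer(window.getEnd)
    }
    TriggerResult.CONTINUE
  }

  // Fire once the watermark passes the end of the window.
  override def onEventTime(time: Long, window: TimeWindow,
                           ctx: Trigger.TriggerContext): TriggerResult =
    TriggerResult.FIRE

  // No processing-time behaviour in this sketch.
  override def onProcessingTime(time: Long, window: TimeWindow,
                                ctx: Trigger.TriggerContext): TriggerResult =
    TriggerResult.CONTINUE

  override def clear(window: TimeWindow, ctx: Trigger.TriggerContext): Unit =
    ctx.deleteEventTimeTimer(window.getEnd)
}

It would be used as .trigger(new FireOnWindowEndTrigger(10 * 1000L)) on the
windowed stream above.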

Regards,
Roman



Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

Posted by Avinash Tripathy <av...@stellapps.com>.
Hi Theo,

We also have the same scenario. It would be great if you could provide
some examples or more details about the Flink process function.

Thanks,
Avinash


AW: Timeseries aggregation with many IoT devices off of one Kafka topic.

Posted by "theo.diefenthal@scoop-software.de" <th...@scoop-software.de>.
Hi,

  

At the last Flink Forward in Berlin I spoke with some people about the same
problem; they had construction devices as IoT sensors which could even be
offline for multiple days.

They told me that the major problem for them was that the watermark in Flink
is maintained per operator subtask, even if you group by key.

They solved their problem via a Flink process function, where they have full
control over state and timers, so they can deal with each device as they like
and can e.g. maintain something similar to a per-device watermark. I also
think that it is the best way to go for this use case.
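
A minimal sketch of that idea with a KeyedProcessFunction could look like the
following (the Reading type, the field names and the plain sum aggregate are
just placeholders for illustration, not code from an actual job):

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

// Hypothetical element type, for illustration only.
case class Reading(deviceId: String, value: Double, timestamp: Long)

// Emits (deviceId, windowEnd, sum) once a device's own event time has moved
// past the end of its current window.
class PerDeviceAggregate(windowSizeMs: Long = 15 * 60 * 1000L)
    extends KeyedProcessFunction[String, Reading, (String, Long, Double)] {

  private var sum: ValueState[java.lang.Double] = _
  private var windowEnd: ValueState[java.lang.Long] = _

  override def open(parameters: Configuration): Unit = {
    sum = getRuntimeContext.getState(
      new ValueStateDescriptor("sum", classOf[java.lang.Double]))
    windowEnd = getRuntimeContext.getState(
      new ValueStateDescriptor("windowEnd", classOf[java.lang.Long]))
  }

  // Align a timestamp to the end of its window.
  private def alignedEnd(ts: Long): Long = ts - (ts % windowSizeMs) + windowSizeMs

  private def currentSum: Double = if (sum.value() == null) 0.0 else sum.value()

  override def processElement(
      r: Reading,
      ctx: KeyedProcessFunction[String, Reading, (String, Long, Double)]#Context,
      out: Collector[(String, Long, Double)]): Unit = {
    val end = windowEnd.value()
    if (end == null) {
      windowEnd.update(alignedEnd(r.timestamp))
    } else if (r.timestamp >= end) {
      // Timestamps are in order per device, so an element past the window end
      // acts as this device's "watermark": emit the finished window, roll over.
      out.collect((ctx.getCurrentKey, end.longValue(), currentSum))
      sum.clear()
      windowEnd.update(alignedEnd(r.timestamp))
    }
    sum.update(currentSum + r.value)
  }

  // A processing-time timer could additionally be registered to flush windows
  // of devices that stop sending data; that is omitted here.
}

Usage would be something like
stream.keyBy(_.deviceId).process(new PerDeviceAggregate()).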

  

Best regards

Theo  
  
  

  
  


Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

Posted by hemant singh <he...@gmail.com>.
Hello,

I am also working on something similar. Below is the pipeline design I
have; sharing it in case it is helpful.

topic -> keyed stream on device-id -> window operation -> sink.
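
A rough sketch of what that pipeline could look like in code (topic name,
Kafka properties, record format and the print sink are placeholders, not my
actual setup):

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object DevicePipeline {

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // placeholder

    env
      // topic
      .addSource(new FlinkKafkaConsumer[String]("device-readings", new SimpleStringSchema(), props))
      .map(parse _) // (deviceId, value, timestamp)
      // note: this watermark is shared by all devices on a subtask, which is
      // the limitation discussed elsewhere in this thread
      .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor[(String, Double, Long)](Time.minutes(1)) {
          override def extractTimestamp(e: (String, Double, Long)): Long = e._3
        })
      // keyed stream on device-id
      .keyBy(_._1)
      // window operation
      .window(TumblingEventTimeWindows.of(Time.minutes(15)))
      .reduce((a, b) => (a._1, a._2 + b._2, math.max(a._3, b._3)))
      // sink (stand-in)
      .print()

    env.execute("per-device 15-minute aggregation")
  }

  // Placeholder parser; the real record format is not shown here.
  private def parse(line: String): (String, Double, Long) = {
    val Array(id, value, ts) = line.split(",")
    (id, value.toDouble, ts.toLong)
  }
}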

You can PM me on further details.

Thanks,
Hemant
