You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Dave Greeko <da...@yahoo.com.INVALID> on 2022/05/02 23:56:10 UTC

Event Grouping

Hi there,
We have a number of IoT devices that send multiple related events in separate TCP packets to a Kafka broker. The  IoT devices have access to a PLC system that monitors a set of sensors for an industrial steam boiler. The PLC raises multiple independent events for a random number of sensors per cycle where each cycle is uniquely identified by an id.  Each PLC cycle may have a reading from 1 to 10 different sensors each of which is sent separately to an IoT device. The IoT device simply relay this event (one per sensor) to a Kafka broker. It’s a one-to-many relation (1 PLC cycle with a unique id = many events from multiple sensors) our goal is group the events per IoT device and plc_cycle_id and the order is constraint by a sequence number.

Here is a data sample:
TCP packet #1:
{plc_ cycle _id:”AABB”, sensors_per_cycle:3, seq_number:1, iot_device_id:1, sensor_data: {name:”pressure”, data:{value:16.3, si_unit:”bar”}}

TCP packet #2:
{ plc_ cycle _id:” AABB”, sensors_per_cycle:3, seq_number:2, iot_device_id:1, sensor_data: {name:”watter_level”, data:{value:3.8, si_unit:”m”}}

TCP packet #3:
{ plc_ cycle _id:” AABB”, sensors_per_cycle:3, seq_number:3, iot_device_id:1, sensor_data: {name:”steam_drum_temp”, data:{value:99.6, si_unit:”c”}}

Each kafka event has a reference to the total events to be grouped (sensors_per_cycle) and the order for which those events will be grouped by, is taken from seq_number. 

Regards,
Dave


Re: Event Grouping

Posted by Liam Clarke-Hutchinson <lc...@redhat.com>.
Hi Dave,

Okay, fair enough. Given that you've got a key to group on, and a
sensors_per_cycle to tell you how many events to expect, contingent on data
volumes you could easily aggregate in a regular ol Python map of key to a
list of events using Python clients to consume them and then produce the
aggregated results. You'd have to answer the question of "what if the
expected number of sensor readings in the cycle don't occur", and evict as
needed.

Alternatively, if the scale of volume is such that you need to go towards a
distributed solution, both Spark Streaming and Flink have Python APIs :)

Cheers,

Liam



On Tue, 3 May 2022 at 15:24, Dave Greeko <da...@yahoo.com.invalid>
wrote:

>  Hi Liam,
> Thanks for the answer and you are right; the question is how to aggregate
> input events into another topic. I have looked into streams but my initial
> understanding is that the stream API can be done in Java only. Our
> development environment is strictly Python and while at it, we also looked
> into Faust but we're preferring something that is pure Kafka without Java
> development.
>
>
>
>      On Monday, May 2, 2022, 07:11:55 PM PDT, Liam Clarke-Hutchinson <
> lclarkeh@redhat.com> wrote:
>
>  Hi Dave,
>
> I think you forgot to ask your question :D  However, if you're looking to
> group those events from the input topic and then putting the aggregated
> event onto another topic, have you looked into Kafka Streams?
> https://kafka.apache.org/31/documentation/streams/
>
> Cheers,
>
> Liam
>
> On Tue, 3 May 2022 at 12:05, Dave Greeko <da...@yahoo.com.invalid>
> wrote:
>
> > Hi there,
> > We have a number of IoT devices that send multiple related events in
> > separate TCP packets to a Kafka broker. The  IoT devices have access to a
> > PLC system that monitors a set of sensors for an industrial steam boiler.
> > The PLC raises multiple independent events for a random number of sensors
> > per cycle where each cycle is uniquely identified by an id.  Each PLC
> cycle
> > may have a reading from 1 to 10 different sensors each of which is sent
> > separately to an IoT device. The IoT device simply relay this event (one
> > per sensor) to a Kafka broker. It’s a one-to-many relation (1 PLC cycle
> > with a unique id = many events from multiple sensors) our goal is group
> the
> > events per IoT device and plc_cycle_id and the order is constraint by a
> > sequence number.
> >
> > Here is a data sample:
> > TCP packet #1:
> > {plc_ cycle _id:”AABB”, sensors_per_cycle:3, seq_number:1,
> > iot_device_id:1, sensor_data: {name:”pressure”, data:{value:16.3,
> > si_unit:”bar”}}
> >
> > TCP packet #2:
> > { plc_ cycle _id:” AABB”, sensors_per_cycle:3, seq_number:2,
> > iot_device_id:1, sensor_data: {name:”watter_level”, data:{value:3.8,
> > si_unit:”m”}}
> >
> > TCP packet #3:
> > { plc_ cycle _id:” AABB”, sensors_per_cycle:3, seq_number:3,
> > iot_device_id:1, sensor_data: {name:”steam_drum_temp”, data:{value:99.6,
> > si_unit:”c”}}
> >
> > Each kafka event has a reference to the total events to be grouped
> > (sensors_per_cycle) and the order for which those events will be grouped
> > by, is taken from seq_number.
> >
> > Regards,
> > Dave
> >
> >
>

Re: Event Grouping

Posted by Dave Greeko <da...@yahoo.com.INVALID>.
 Hi Liam,
Thanks for the answer and you are right; the question is how to aggregate input events into another topic. I have looked into streams but my initial understanding is that the stream API can be done in Java only. Our development environment is strictly Python and while at it, we also looked into Faust but we're preferring something that is pure Kafka without Java development. 



     On Monday, May 2, 2022, 07:11:55 PM PDT, Liam Clarke-Hutchinson <lc...@redhat.com> wrote:  
 
 Hi Dave,

I think you forgot to ask your question :D  However, if you're looking to
group those events from the input topic and then putting the aggregated
event onto another topic, have you looked into Kafka Streams?
https://kafka.apache.org/31/documentation/streams/

Cheers,

Liam

On Tue, 3 May 2022 at 12:05, Dave Greeko <da...@yahoo.com.invalid>
wrote:

> Hi there,
> We have a number of IoT devices that send multiple related events in
> separate TCP packets to a Kafka broker. The  IoT devices have access to a
> PLC system that monitors a set of sensors for an industrial steam boiler.
> The PLC raises multiple independent events for a random number of sensors
> per cycle where each cycle is uniquely identified by an id.  Each PLC cycle
> may have a reading from 1 to 10 different sensors each of which is sent
> separately to an IoT device. The IoT device simply relay this event (one
> per sensor) to a Kafka broker. It’s a one-to-many relation (1 PLC cycle
> with a unique id = many events from multiple sensors) our goal is group the
> events per IoT device and plc_cycle_id and the order is constraint by a
> sequence number.
>
> Here is a data sample:
> TCP packet #1:
> {plc_ cycle _id:”AABB”, sensors_per_cycle:3, seq_number:1,
> iot_device_id:1, sensor_data: {name:”pressure”, data:{value:16.3,
> si_unit:”bar”}}
>
> TCP packet #2:
> { plc_ cycle _id:” AABB”, sensors_per_cycle:3, seq_number:2,
> iot_device_id:1, sensor_data: {name:”watter_level”, data:{value:3.8,
> si_unit:”m”}}
>
> TCP packet #3:
> { plc_ cycle _id:” AABB”, sensors_per_cycle:3, seq_number:3,
> iot_device_id:1, sensor_data: {name:”steam_drum_temp”, data:{value:99.6,
> si_unit:”c”}}
>
> Each kafka event has a reference to the total events to be grouped
> (sensors_per_cycle) and the order for which those events will be grouped
> by, is taken from seq_number.
>
> Regards,
> Dave
>
>
  

Re: Event Grouping

Posted by Liam Clarke-Hutchinson <lc...@redhat.com>.
Hi Dave,

I think you forgot to ask your question :D  However, if you're looking to
group those events from the input topic and then putting the aggregated
event onto another topic, have you looked into Kafka Streams?
https://kafka.apache.org/31/documentation/streams/

Cheers,

Liam

On Tue, 3 May 2022 at 12:05, Dave Greeko <da...@yahoo.com.invalid>
wrote:

> Hi there,
> We have a number of IoT devices that send multiple related events in
> separate TCP packets to a Kafka broker. The  IoT devices have access to a
> PLC system that monitors a set of sensors for an industrial steam boiler.
> The PLC raises multiple independent events for a random number of sensors
> per cycle where each cycle is uniquely identified by an id.  Each PLC cycle
> may have a reading from 1 to 10 different sensors each of which is sent
> separately to an IoT device. The IoT device simply relay this event (one
> per sensor) to a Kafka broker. It’s a one-to-many relation (1 PLC cycle
> with a unique id = many events from multiple sensors) our goal is group the
> events per IoT device and plc_cycle_id and the order is constraint by a
> sequence number.
>
> Here is a data sample:
> TCP packet #1:
> {plc_ cycle _id:”AABB”, sensors_per_cycle:3, seq_number:1,
> iot_device_id:1, sensor_data: {name:”pressure”, data:{value:16.3,
> si_unit:”bar”}}
>
> TCP packet #2:
> { plc_ cycle _id:” AABB”, sensors_per_cycle:3, seq_number:2,
> iot_device_id:1, sensor_data: {name:”watter_level”, data:{value:3.8,
> si_unit:”m”}}
>
> TCP packet #3:
> { plc_ cycle _id:” AABB”, sensors_per_cycle:3, seq_number:3,
> iot_device_id:1, sensor_data: {name:”steam_drum_temp”, data:{value:99.6,
> si_unit:”c”}}
>
> Each kafka event has a reference to the total events to be grouped
> (sensors_per_cycle) and the order for which those events will be grouped
> by, is taken from seq_number.
>
> Regards,
> Dave
>
>