Posted to users@kafka.apache.org by Chris Madge <ch...@madg.es> on 2020/01/04 14:53:57 UTC

designing a streaming task for count and event time difference

Hi there,

This is my first voyage into stream processing. I’ve tried a few things, but I think I’m struggling to think in the streams way. Could I be cheeky and ask whether someone could give me some clues about the right design for my first task, to get me started?

I have application events coming in like:

<timestamp>,type:start,<user_id>
<timestamp>,type:action,<user_id>
<timestamp>,type:action,<user_id>
<timestamp>,type:action,<user_id>
<timestamp>,type:end,<user_id>

Each start-to-end sequence like this represents a single user session.

I need to output:
<timestamp of start event>,<duration between start and end event>,<user_id>,<count_of_action_events>

I’m working with event time (specified by the application), and I can’t trust the application to close sessions gracefully, so some sessions may never get an end event (I’m happy for those to be thrown out, but cool ideas for alternatives are very welcome!).
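
To make the required transformation concrete, it could be sketched in plain Java roughly as below. This is only an illustration of the per-user bookkeeping, not a Kafka Streams topology and not a recommended design. It assumes timestamps are epoch milliseconds, that events for a given user arrive in event-time order, and that a second start for the same user means the earlier session was never closed and can be discarded. The names SessionSummarizer, OpenSession and process are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionSummarizer {

    // Hypothetical in-progress state for one user's open session.
    private static final class OpenSession {
        final long startTs;   // event time of the start event (assumed epoch millis)
        int actionCount;      // number of action events seen so far

        OpenSession(long startTs) {
            this.startTs = startTs;
        }
    }

    // One open session per user id; unclosed sessions are overwritten by the next start.
    private final Map<String, OpenSession> open = new HashMap<>();

    /**
     * Feeds one raw event line of the form "<timestamp>,type:<kind>,<user_id>".
     * Returns "<start_ts>,<duration>,<user_id>,<action_count>" when an end event
     * closes a session, otherwise null. A non-numeric timestamp makes
     * Long.parseLong throw NumberFormatException.
     */
    public String process(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3 || !parts[1].trim().startsWith("type:")) {
            return null; // malformed line: ignored in this sketch
        }
        long ts = Long.parseLong(parts[0].trim());
        String type = parts[1].trim().substring("type:".length());
        String userId = parts[2].trim();

        switch (type) {
            case "start":
                // Any earlier session for this user that never ended is thrown out here.
                open.put(userId, new OpenSession(ts));
                return null;
            case "action":
                OpenSession current = open.get(userId);
                if (current != null) {
                    current.actionCount++; // actions outside a session are dropped
                }
                return null;
            case "end":
                OpenSession done = open.remove(userId);
                if (done == null) {
                    return null; // end without a matching start: dropped
                }
                return done.startTs + "," + (ts - done.startTs) + "," + userId + "," + done.actionCount;
            default:
                return null; // unknown event type: ignored
        }
    }
}
```

Feeding it a start at 1000, three actions, and an end at 5000 for user u1 returns the line 1000,4000,u1,3 on the end event. The open question is how to express the same per-user state and the discarding of unclosed sessions in the streams way.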

Any advice would be much appreciated.

Chris Madge