Posted to user@flink.apache.org by Dan Pettersson <da...@gmail.com> on 2019/11/06 21:03:12 UTC

Stateful functions

Hello,

I've started to play around with Stateful Functions and I like it a lot :-)
Also, thanks for the comprehensive documentation and your very good talk, Igal.

I would appreciate it if you could give some hints/ideas on how to structure
an application with the following criteria:

One Kafka ingress with billions of incoming messages per day.
These are messages from stock exchanges, and the message id is the stock id.

There are around 30 independent functions that will subscribe to these messages.

Is it better to have only one module that, via its router, sends each message (async)
to all the other functions, or is it better that each module subscribes to the same
Kafka ingress? With the first solution, each message is deserialized only once,
but there is only one checkpoint covering all 30 functions. With the second solution,
fault tolerance is more fine-grained and I guess one can deliver/patch modules
independently, but deserializing each message once per module is an overhead compared to
solution 1.

I have been learning Flink and its Table and SQL API for the last 6 months, and
Stateful Functions for the last month, so sorry if my questions are unclear. I would
really appreciate it if someone could give some short advice on how to structure an
application as described above. Throughput is important; the ability to restart from
check-/savepoints is less so. If having only one router for all functions is the best
option, how can one register each function type with the global router in an elegant way?
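
To make the single-router option concrete, here is a rough sketch of the shape I have in mind. It is plain Java and does not use the Stateful Functions SDK: StockMessage, Downstream, FanOutRouter, the function type names and the stock id "ABC" are all hypothetical stand-ins made up for illustration. The idea is that each function type is registered once, and the router forwards every already-deserialized message to all of them, keyed by stock id.

```java
import java.util.ArrayList;
import java.util.List;

public class FanOutRouterSketch {

    /** Hypothetical stand-in for a deserialized exchange message. */
    static final class StockMessage {
        final String stockId;
        final double price;

        StockMessage(String stockId, double price) {
            this.stockId = stockId;
            this.price = price;
        }
    }

    /** Hypothetical stand-in for whatever delivers a message to a function instance. */
    interface Downstream {
        void forward(String functionType, String id, StockMessage message);
    }

    /** A single router that fans each message out to every registered function type. */
    static final class FanOutRouter {
        private final List<String> functionTypes = new ArrayList<>();

        /** Registers one subscriber; returns this so registrations can be chained. */
        FanOutRouter register(String functionType) {
            functionTypes.add(functionType);
            return this;
        }

        /** Forwards the same message object to each registered type, addressed by stock id. */
        void route(StockMessage message, Downstream downstream) {
            for (String functionType : functionTypes) {
                downstream.forward(functionType, message.stockId, message);
            }
        }
    }

    public static void main(String[] args) {
        FanOutRouter router = new FanOutRouter()
                .register("pricing/vwap")
                .register("pricing/moving-average");

        // Record deliveries instead of invoking real functions.
        List<String> delivered = new ArrayList<>();
        router.route(new StockMessage("ABC", 101.5),
                (type, id, msg) -> delivered.add(type + "@" + id));

        System.out.println(delivered);
    }
}
```

With around 30 functions, the register calls would probably come from a list or from configuration rather than being chained by hand, but I don't know whether that is the intended way to do it.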

Any guidance would be helpful, so thanks in advance.

Regards
Dan