You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Vishwas Siravara <vs...@gmail.com> on 2019/11/01 15:24:49 UTC

Re ordering events with flink

Hi guys,
I want to know if it's possible to sort events in a flink data stream. I
know I can't sort a stream but is there a way in which I can buffer for a
very short time and sort those events before sending it to a data sink.

In our scenario we consume from a kafka topic which has multiple partitions
but the data in these brokers are *not* partitioned by a key(its round
robin) , for example we want to time order transactions associated with a
particular account but since the same account number ends up in
different partitions at the source for different transactions we are not
able to maintain event time order in our stream processing system since the
same account number ends up in different task managers and slots. We do
however partition by account number when we send the events to downstream
kafka sink so that transactions from the same account number end up in the
same partition. This is however not good enough since the events are not
sorted at the source.

Any ideas for doing this is much appreciated.


Best,
Vishwas

Re: Re ordering events with flink

Posted by David Anderson <da...@ververica.com>.
Yes, it's possible to sort a stream by the event time timestamps on
the events. This depends on having reliable watermarks -- as events
that are late can not be emitted in order.

There are several ways to accomplish this. You could, for example, use
a ProcessFunction, and implement the sorting yourself, or use a
Window, or take advantage of the sorting that's already been
implemented in either the SQL or CEP libraries.

For example of doing this with windows, see
https://stackoverflow.com/questions/58539379/guarantee-of-event-time-order-in-flinkkafkaconsumer.
For an implementation using SQL, see
https://stackoverflow.com/a/54970489/2000823. For an implementation
using the CEP library, including the use of side outputs for late
events, see https://github.com/ververica/flink-training-exercises/blob/master/src/main/java/com/ververica/flinktraining/examples/datastream_java/cep/Sort.java.

Best,
David

On Fri, Nov 1, 2019 at 10:21 PM Vishwas Siravara <vs...@gmail.com> wrote:
>
> Hi guys,
> I want to know if it's possible to sort events in a flink data stream. I know I can't sort a stream but is there a way in which I can buffer for a very short time and sort those events before sending it to a data sink.
>
> In our scenario we consume from a kafka topic which has multiple partitions but the data in these brokers are not partitioned by a key(its round robin) , for example we want to time order transactions associated with a particular account but since the same account number ends up in different partitions at the source for different transactions we are not able to maintain event time order in our stream processing system since the same account number ends up in different task managers and slots. We do however partition by account number when we send the events to downstream kafka sink so that transactions from the same account number end up in the same partition. This is however not good enough since the events are not sorted at the source.
>
> Any ideas for doing this is much appreciated.
>
>
> Best,
> Vishwas