Posted to users@kafka.apache.org by Dumitru-Nicolae Marasoui <Ni...@kaluza.com> on 2020/07/13 20:18:08 UTC

kafka-streams merge + aggregate vs merge + to topic + from topic + aggregate

Hi,
I would like to understand the ordering guarantees if any at the merge
operator level. I think unless I am writing into a topic, there can be no
ordering guarantees, is that so?

I would normally do a key transformation, merge, and write to an output topic,
where I know the per-partition (key) ordering guarantees, and then have a second
kafka-streams application read from that topic, aggregate, and write to the final topic.

Is there a way to make kafka-streams do this behind the scenes? I do not
see a way to guarantee the ordering outside of an intermediate topic,
implicit or explicit?
Thank you,

-- 

Dumitru-Nicolae Marasoui


Re: kafka-streams merge + aggregate vs merge + to topic + from topic + aggregate

Posted by "Matthias J. Sax" <mj...@apache.org>.
If you use `merge()`, it preserves the (relative) order of each input,
but the result will contain records from both inputs interleaved.

For example:

topicA-p0:  A  B  C
topicB-p0:  X  Y  Z

In the output KStream, A will be before B, and B will be before C.
The same holds for X, Y, and Z.

How A, B, C and X, Y, Z interleave in the output stream depends on the
timestamps of the records. Kafka Streams processes records in timestamp
order and thus alternates between both inputs.

Example with timestamps:

topicA-p0:  A(3)  B(7)  C(10)
topicB-p0:  X(2)  Y(8)  Z(9)

The output KStream would be exactly:

  X(2)  A(3)  B(7)  Y(8)  Z(9)  C(10)
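
To make the interleaving concrete, here is a minimal plain-Java sketch of the "pick the head record with the smaller timestamp" idea. It is a simulation only and not Kafka Streams code: the class `MergeOrderSketch`, the `Rec` type, and the tie-breaking rule are assumptions made for illustration, and it assumes both inputs are fully buffered and each is already in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MergeOrderSketch {

    /** Hypothetical stand-in for a record: just a value and a timestamp. */
    public static final class Rec {
        public final String value;
        public final long timestamp;

        public Rec(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * Repeatedly emits whichever head record has the smaller timestamp.
     * Ties go to the first input; that choice is arbitrary in this sketch.
     */
    public static List<String> mergeByTimestamp(List<Rec> a, List<Rec> b) {
        Deque<Rec> qa = new ArrayDeque<>(a);
        Deque<Rec> qb = new ArrayDeque<>(b);
        List<String> out = new ArrayList<>();
        while (!qa.isEmpty() || !qb.isEmpty()) {
            boolean takeA = qb.isEmpty()
                    || (!qa.isEmpty() && qa.peek().timestamp <= qb.peek().timestamp);
            Rec next = takeA ? qa.poll() : qb.poll();
            out.add(next.value + "(" + next.timestamp + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> topicA = List.of(new Rec("A", 3), new Rec("B", 7), new Rec("C", 10));
        List<Rec> topicB = List.of(new Rec("X", 2), new Rec("Y", 8), new Rec("Z", 9));
        // Prints: X(2)  A(3)  B(7)  Y(8)  Z(9)  C(10)
        System.out.println(String.join("  ", mergeByTimestamp(topicA, topicB)));
    }
}
```

Within each input the relative order is untouched; only the interleaving between the two inputs is decided by the timestamps.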

As mentioned in a previous answer to another email, the only thing that
could "mess up" the order of the output stream is if one input is empty
while the other is processed, and the empty input later receives data that
is now out of order. With `max.task.idle.ms`, however, you can pause
processing for a certain amount of time when one input becomes empty,
to work against this potential issue.
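
As a rough sketch of how that setting might be supplied, the snippet below builds a plain `java.util.Properties` object of the kind normally handed to a Kafka Streams application. The class name `IdleConfigSketch`, the application id, the broker address, and the 5000 ms value are all placeholders chosen for illustration; only the key `max.task.idle.ms` comes from the answer above (in the Streams API it is also available as the constant `StreamsConfig.MAX_TASK_IDLE_MS_CONFIG`).

```java
import java.util.Properties;

public class IdleConfigSketch {

    /** Builds the configuration; every value other than the idle key is a placeholder. */
    public static Properties streamsProps(long maxTaskIdleMs) {
        Properties props = new Properties();
        props.setProperty("application.id", "merge-ordering-sketch"); // hypothetical id
        props.setProperty("bootstrap.servers", "localhost:9092");     // hypothetical broker
        // How long a task may wait for data on an empty input before it
        // continues processing with the inputs that do have data.
        props.setProperty("max.task.idle.ms", Long.toString(maxTaskIdleMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProps(5000L);
        System.out.println("max.task.idle.ms = " + props.getProperty("max.task.idle.ms"));
    }
}
```

A larger value gives a lagging input more time to catch up, at the cost of added processing latency whenever one input is empty.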


-Matthias
