Posted to users@kafka.apache.org by Dumitru-Nicolae Marasoui <Ni...@kaluza.com> on 2020/07/13 18:55:30 UTC

ktable - ktable join

Hello Kafka community,
In a KTable-KTable join, assume that kt1 has key k1 and a value v1 that
contains k2, and that kt2 has key k2 and value v2.
Is it possible that the (k1, v1, k2, v2) tuple is never emitted?
I am trying to understand how the join works and whether any race condition
is possible.
If race conditions are not possible then, ignoring any deduplication
or filtering, I imagine that one of these sequences would be emitted:
(k1, v1, null, null), (k1, v1, k2, v2) or
(null, null, k2, v2), (k1, v1, k2, v2)
If that is the case, and given that our data is immutable for this
particular k1 - k2 - v2 relation, I could simply keep the tuples in which
k1, v1, k2, v2 are all non-null and emit only those.
But could the two linked messages, which arrive on different topics, be
processed concurrently? In theory I think so.
What would be the solution to this? A linearizable store such as a
single-threaded Redis?
Thank you,

-- 

Dumitru-Nicolae Marasoui


Re: ktable - ktable join

Posted by "Matthias J. Sax" <mj...@apache.org>.
I am not sure I understand the question correctly, but a 1:n join in
Kafka Streams does not miss any data.

As explained in a previous answer, you can consider the join eventually
consistent: if you stop sending new data to the input topics, the
"final" join result will be exact.

Also, records in a join are not processed concurrently; a single thread
handles the join, so no race condition applies.
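
To make the point about ordering concrete, here is a minimal sketch of a
single-threaded 1:n (foreign-key) inner join. It is a toy model in plain
Python, not Kafka Streams code and not how Kafka Streams implements the join
internally; the class ForeignKeyJoinModel, the function run, and the event
tuples are hypothetical names chosen for illustration. It assumes the foreign
key k2 has already been extracted from v1. Because each record is handled one
at a time against the state built so far, the (k1, v1, k2, v2) tuple appears
whichever side arrives first.

```python
from typing import Dict, List, Tuple

Joined = Tuple[str, str, str, str]  # (k1, v1, k2, v2)


class ForeignKeyJoinModel:
    """Toy single-threaded model of a 1:n inner join (illustration only)."""

    def __init__(self) -> None:
        self.left: Dict[str, Tuple[str, str]] = {}  # k1 -> (v1, k2)
        self.right: Dict[str, str] = {}             # k2 -> v2
        self.emitted: List[Joined] = []

    def on_left(self, k1: str, v1: str, k2: str) -> None:
        # Store the left record, then look up its foreign key on the right.
        self.left[k1] = (v1, k2)
        v2 = self.right.get(k2)
        if v2 is not None:
            self.emitted.append((k1, v1, k2, v2))

    def on_right(self, k2: str, v2: str) -> None:
        # Store the right record, then re-join every left record that points at it.
        self.right[k2] = v2
        for k1, (v1, fk) in self.left.items():
            if fk == k2:
                self.emitted.append((k1, v1, k2, v2))


def run(events: List[Tuple[str, ...]]) -> List[Joined]:
    """Process events strictly one after another, as a single thread would."""
    model = ForeignKeyJoinModel()
    for side, *payload in events:
        if side == "left":
            model.on_left(*payload)
        else:
            model.on_right(*payload)
    return model.emitted


# Both arrival orders end with the same joined tuple.
left_first = run([("left", "k1", "v1", "k2"), ("right", "k2", "v2")])
right_first = run([("right", "k2", "v2"), ("left", "k1", "v1", "k2")])
```

This sketch models an inner join, so nothing is emitted until both sides are
present. A left join would additionally emit (k1, v1, null, null) when the
left record arrives first, which is the case where filtering for all non-null
fields, as proposed in the question, would apply.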


-Matthias
