Posted to users@kafka.apache.org by Dmitry Minkovsky <dm...@gmail.com> on 2018/01/23 15:46:46 UTC

Re: Merging Two KTables

> Merging two tables does not make too much sense because each table might
> contain an entry for the same key. So it's unclear, which of both values
> the merged table should contain.

Which of both values should the table contain? Seems straightforward: it
should contain the value with the highest timestamp, with non-deterministic
behavior when two timestamps are the same.
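
To make the proposed rule concrete, here is a minimal sketch in plain Java
(not Kafka Streams code). The class and method names are hypothetical and
only illustrate "highest timestamp wins"; ties go to whichever update
arrives last, which is the arrival-order-dependent behavior described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a merged view fed by updates from two tables.
class LatestTimestampMerge {

    // A value together with the timestamp of the record that carried it.
    static final class Timestamped {
        final String value;
        final long timestamp;

        Timestamped(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Timestamped> merged = new HashMap<>();

    // Apply an update from either input table. The update is kept only if
    // its timestamp is not older than what is currently stored for the key.
    // On equal timestamps the later-arriving update wins.
    void update(String key, String value, long timestamp) {
        Timestamped current = merged.get(key);
        if (current == null || timestamp >= current.timestamp) {
            merged.put(key, new Timestamped(value, timestamp));
        }
    }

    // Returns the current merged value for the key, or null if absent.
    String get(String key) {
        Timestamped current = merged.get(key);
        return current == null ? null : current.value;
    }
}
```

This sketch ignores deletes (tombstones) and state persistence, both of
which a real table merge would have to handle.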


On Wed, Jul 26, 2017 at 9:42, Matthias J. Sax <ma...@confluent.io> wrote:

> Merging two tables does not make too much sense because each table might
> contain an entry for the same key. So it's unclear, which of both values
> the merged table should contain.
>
> KTable.toStream() is just a semantic change and has no runtime overhead.
>
> -Matthias
>
>
> On 7/26/17 1:34 PM, Sameer Kumar wrote:
> > Hi,
> >
> > Is there a way I can merge two KTables, just like I can with the KStreams
> > API: KBuilder.merge()?
> >
> > I understand I can use KTable.toStream(). If I choose to use it, is there
> > any performance cost associated with this conversion, or is it just an API
> > conversion?
> >
> > -Sameer.
> >
>
>

Re: Merging Two KTables

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Sameer, Dmitry:

Just a side note: KStream.merge() does not guarantee timestamp ordering, so
the resulting KStream is likely to contain records that are out of order
with respect to their timestamps. If you want a merge that respects the
timestamps of the input streams because you believe they are well aligned,
you need to either assume that no input stream contains out-of-order data,
so that an online merge-sort can be applied, or assume that the
out-of-orderness has some upper bound in practice, so that you can bookkeep
and wait. As said, there is no gold-standard rule for merging, and hence we
leave it to users to customize via the "process(Processor) API", or to use
"merge" if they can tolerate unordered timestamps in the resulting stream.


Guozhang
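
The "bookkeep and wait" option above can be sketched as a small reorder
buffer. This is plain Java rather than Kafka Streams code; the class name,
the Record type, and the maxLatenessMs parameter are all hypothetical. The
sketch assumes that no record arrives more than maxLatenessMs behind the
highest timestamp seen so far. A record that violates that bound is still
emitted, but out of order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: buffer records from several inputs and release
// them in timestamp order once they can no longer be overtaken.
class BoundedReorderBuffer {

    static final class Record {
        final String key;
        final String value;
        final long timestamp;

        Record(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final long maxLatenessMs;
    private final PriorityQueue<Record> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Record r) -> r.timestamp));
    private long maxSeen = Long.MIN_VALUE;

    BoundedReorderBuffer(long maxLatenessMs) {
        this.maxLatenessMs = maxLatenessMs;
    }

    // Add a record from any input. Returns the records that are now safe to
    // emit, in ascending timestamp order: those whose timestamp is at least
    // maxLatenessMs behind the highest timestamp seen so far.
    List<Record> add(Record record) {
        buffer.add(record);
        maxSeen = Math.max(maxSeen, record.timestamp);
        List<Record> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= maxSeen - maxLatenessMs) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    // Drain everything that is still buffered, in timestamp order.
    List<Record> flush() {
        List<Record> rest = new ArrayList<>();
        while (!buffer.isEmpty()) {
            rest.add(buffer.poll());
        }
        return rest;
    }
}
```

The trade-off is latency: every record is held back by up to maxLatenessMs
of stream time, which is one reason there is no single default that suits
every application.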


On Tue, Jan 23, 2018 at 1:12 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Well. That is one possibility I guess. But some other way might be to
> "merge both values" into a single one... There is no "straight forward"
> best semantics IMHO.
>
> If you really need this, you can build it via Processor API.
>
>
> -Matthias


-- 
-- Guozhang

Re: Merging Two KTables

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Well, that is one possibility, I guess. But another option might be to
"merge both values" into a single one... There is no "straightforward"
best semantics, IMHO.

If you really need this, you can build it via the Processor API.


-Matthias
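
As an illustration of the alternative mentioned above, where both values
are combined instead of one being picked, here is a minimal plain-Java
sketch. It is not Processor API code; the class name and the combiner
parameter are hypothetical, and the choice of combiner is exactly the part
that has no single obvious answer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical illustration: a merged view that combines the stored value
// with each incoming value using a caller-supplied function.
class CombiningMerge {

    private final Map<String, String> merged = new HashMap<>();
    private final BinaryOperator<String> combiner;

    CombiningMerge(BinaryOperator<String> combiner) {
        this.combiner = combiner;
    }

    // Apply an update from either input table. If the key is new, the value
    // is stored as-is; otherwise combiner(existing, incoming) is stored.
    // Values must be non-null: deletes (tombstones) are not handled here.
    void update(String key, String value) {
        merged.merge(key, value, combiner);
    }

    // Returns the current merged value for the key, or null if absent.
    String get(String key) {
        return merged.get(key);
    }
}
```

For example, constructing it with (a, b) -> a + "|" + b concatenates every
value seen for a key, while (a, b) -> b degenerates to "last update wins".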

