Posted to users@kafka.apache.org by Navneeth Krishnan <re...@gmail.com> on 2019/08/04 06:31:15 UTC

Kafka Streams & Processor API

Hi All,

I'm new to Kafka Streams and I'm trying to port my use case, currently
written in Flink, to Kafka Streams. I have a couple of questions.

- How can I join two Kafka topics using the Processor API? I have a data
stream and a change stream which need to be joined on a key.

- I read about "STREAM_TIME" punctuation, and it seems that unless all
partitions have new records, stream time will not advance. Is there a way
to overcome this?

- Is there a way to read the state store data externally through an API? If
so, how does the caller know where the state resides, and how does that
work if the state is spread across multiple machines?

Thanks for all your help.

Re: Kafka Streams & Processor API

Posted by Boyang Chen <re...@gmail.com>.
Hey Navneeth,

thank you for your interest in trying out Kafka Streams! Normally we
redirect new folks to the Streams FAQ
<https://docs.confluent.io/current/streams/faq.html> first, in case you
haven't checked it out. On your questions:

1. To join two topics using the Processor API (we call it the PAPI), you
create two processor nodes, each of which reads one of the join topics and
materializes its data into a local state store. At the same time, each node
cross-references the other's state store in #process() to perform the join
when a new record arrives. You can check out KStreamKStreamJoin and
KStreamImpl#join for the details.
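To make that concrete, here is a minimal sketch of one side of such a PAPI join. The store names ("data-store", "change-store") and String types are assumptions for illustration, not from the thread; the mirror-image processor for the other topic would swap the two stores. Both processors must be connected to both stores when building the Topology.

```java
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

// One side of a two-topic PAPI join: materialize own records, probe the
// other side's store, and forward a joined value on a match.
public class DataSideJoinProcessor extends AbstractProcessor<String, String> {

    private KeyValueStore<String, String> dataStore;    // this side's store
    private KeyValueStore<String, String> changeStore;  // the other side's store

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        dataStore = (KeyValueStore<String, String>) context.getStateStore("data-store");
        changeStore = (KeyValueStore<String, String>) context.getStateStore("change-store");
    }

    @Override
    public void process(String key, String value) {
        // Materialize this record so the other side can find it later.
        dataStore.put(key, value);
        // Cross-reference the other side's store to attempt the join now.
        String other = changeStore.get(key);
        if (other != null) {
            context().forward(key, value + "|" + other);
        }
    }
}
```

This is essentially what the DSL's KStream-KStream join generates for you (plus windowing and retention), so the DSL is worth considering before hand-rolling it.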

2. If you are concerned about delays caused by one idle partition, I would
recommend trying wall-clock-time punctuation, which advances based on
real-world time rather than record timestamps. Check out here
<https://docs.confluent.io/current/streams/faq.html#why-is-punctuate-not-called>
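Registering such a punctuator looks roughly like this inside a processor's init(); the 10-second interval is an arbitrary choice for the example (requires org.apache.kafka.streams.processor.* and java.time.Duration):

```java
// Inside your Processor implementation:
@Override
public void init(ProcessorContext context) {
    super.init(context);
    // WALL_CLOCK_TIME fires on real elapsed time, even if some partitions
    // are idle; STREAM_TIME only advances when new records arrive.
    context.schedule(
        Duration.ofSeconds(10),
        PunctuationType.WALL_CLOCK_TIME,
        timestamp -> {
            // periodic work here, e.g. scan a store and emit results
        });
}
```

The trade-off is that wall-clock punctuation can fire before all expected records for an interval have arrived, so it suits "emit whatever we have now" semantics rather than strict event-time completeness.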

3. Not sure what specific read pattern you are looking for, but if you only
need to read a single key-value, we have interactive query support
<https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html>
built in (which also addresses your remote-lookup question). If you want to
dump an entire state store for debugging, you could do it like here
<https://docs.confluent.io/current/streams/faq.html#inspecting-streams-and-tables>.
And no single state store spans multiple machines: each state store belongs
to one stream task, and stores are isolated from each other even on the
same host.
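As a hedged sketch of the interactive-query pattern: you query stores hosted by your own instance directly, and for keys owned by another instance you ask Streams which host has them and then call a query endpoint you expose yourself (Streams does not ship an RPC layer). The store name "data-store" and the String serde are assumptions for the example.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.state.HostInfo;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import org.apache.kafka.streams.state.StreamsMetadata;

public class QuerySketch {
    // Assumes an already-running KafkaStreams instance.
    static String lookup(KafkaStreams streams, String key) {
        // Query a store hosted by *this* instance (read-only view):
        ReadOnlyKeyValueStore<String, String> store =
            streams.store("data-store", QueryableStoreTypes.keyValueStore());
        String local = store.get(key);
        if (local != null) {
            return local;
        }
        // Otherwise, discover which instance owns the key's partition,
        // then call that host's own HTTP/RPC endpoint (your code):
        StreamsMetadata metadata = streams.metadataForKey(
            "data-store", key, Serdes.String().serializer());
        HostInfo owner = metadata.hostInfo();
        return "query http://" + owner.host() + ":" + owner.port();
    }
}
```

For metadataForKey() to return useful host info, each instance must set the application.server config to its own advertised host:port.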

Hope this helps!

Boyang
