You are viewing a plain text version of this content. The canonical link for it is here.

Posted to users@kafka.apache.org by Peter Kopias <ko...@gmail.com> on 2017/01/19 12:07:25 UTC

Streams: Global state & topic multiplication questions

Greetings Everyone,

 I'm just getting into the kafka world with a sample project, and I've got
two conceptional issues, you might have a trivial answer already at hand to.

 Scenario: multiuser painting webapp, with N user working on M images
simultaneously.   The "brush" events go to one single kafka topic, in a
format: imageid,x,y -> brushevent  , that I aggregate to imageid,x,y

Q1:
 It would be nice to separate the stream to M output topics, so that would
work nice as "partitioning", and also we could just subscribe to update
events of a specific image maybe. How can I fan out the records to
different (maybe not yet existing) topics by using DSL?

 Is that a good idea? (If I can solve every processing in a common
processing graph that would be the best, but then I'd need some high
performance solution of filtering out the noise, as the subscribers are
only interested in a very small subset of the soup.)

Q2:
 - When a new user comes, I'd like give him the latest full image?
 (I could do a "fullimages" output topic, but then also comes the problem
of serious overhead on each incoming update, and also the newcomer should
somehow only get the image he's interested in, not read all the images, and
ignore the others.)

 I know I'm still new to this, but I'd like to learn the best practices you
might already tried.

 Thank you,

 Peter

Re: Streams: Global state & topic multiplication questions

Posted by Peter Kopias <ko...@gmail.com>.

Thank you both for the directions, I'll dive into these.

Peter

On Jan 20, 2017 9:55 AM, "Michael Noll" <mi...@confluent.io> wrote:

> As Eno said I'd use the interactive queries API for Q2.
>
> Demo apps:
> -
> https://github.com/confluentinc/examples/blob/3.
> 1.x/kafka-streams/src/main/java/io/confluent/examples/
> streams/interactivequeries/kafkamusic/KafkaMusicExample.java
> -
> https://github.com/confluentinc/examples/blob/3.
> 1.x/kafka-streams/src/main/java/io/confluent/examples/
> streams/interactivequeries/WordCountInteractiveQueriesExample.java
>
> Further docs:
> http://docs.confluent.io/current/streams/developer-
> guide.html#interactive-queries
>
> (FYI: We have begun moving the info in these docs to the Apache Kafka docs,
> too, but it will take a while.)
>
> -Michael
>
>
>
>
> On Thu, Jan 19, 2017 at 5:23 PM, Eno Thereska <en...@gmail.com>
> wrote:
>
> > For Q2: one way to export the state on demand would be to use the
> > Interactive Queries API (https://www.confluent.io/blog/unifying-stream-
> > processing-and-interactive-queries-in-apache-kafka/ <
> > https://www.confluent.io/blog/unifying-stream-processing-
> and-interactive-
> > queries-in-apache-kafka/>). That can be seen as a current, materialized
> > view of the state. There is some example code in the blog.
> >
> > Eno
> >
> >
> > > On 19 Jan 2017, at 16:11, Peter Kopias <ko...@gmail.com> wrote:
> > >
> > > Q1: Thank you, the branch() is what I'm looking for, I just missed it
> > > somehow.
> > >
> > > Q2:
> > >   I receive something like "imageid,x,y" as key, and a color as value.
> I
> > > aggregate this to something like average color for example.
> > >  So technically I do not have images, I have colored pixels with 3
> > > dimensions one being the image...
> > >  And then my newcomer user wants to join the fun, so we'd need to serve
> > > him an image with the latest current state (all pixels of imageid=x),
> and
> > > of course everything that comes later (updates on the output of the
> > > aggregate, but filtered to that imageid).
> > >
> > > Problem is that how to get the stream api node to "export" part of it's
> > > current state on demand. (All imageid=x keys with values).
> > >
> > > There could be a "request topic" that I could join together with the
> > > aggregated ktable maybe?
> > >
> > > Other problem is following the updates without getting the details of
> > > other images (99% of records are not interesting for the specific
> user).
> > >
> > > Thanks,
> > >
> > > Peter
> > >
> > >
> > >
> > > On Thu, Jan 19, 2017 at 4:55 PM, Eno Thereska <en...@gmail.com>
> > > wrote:
> > >
> > >> Hi Peter,
> > >>
> > >> About Q1: The DSL has the "branch" API, where one stream is branched
> to
> > >> several streams, based on a predicate. I think that could help.
> > >>
> > >> About Q2: I'm not entirely sure I understand the problem space. What
> is
> > >> the definition of a "full image"?
> > >>
> > >> Thanks
> > >> Eno
> > >>> On 19 Jan 2017, at 12:07, Peter Kopias <ko...@gmail.com>
> wrote:
> > >>>
> > >>> Greetings Everyone,
> > >>>
> > >>> I'm just getting into the kafka world with a sample project, and I've
> > got
> > >>> two conceptional issues, you might have a trivial answer already at
> > hand
> > >> to.
> > >>>
> > >>> Scenario: multiuser painting webapp, with N user working on M images
> > >>> simultaneously.   The "brush" events go to one single kafka topic,
> in a
> > >>> format: imageid,x,y -> brushevent  , that I aggregate to imageid,x,y
> > >>>
> > >>> Q1:
> > >>> It would be nice to separate the stream to M output topics, so that
> > would
> > >>> work nice as "partitioning", and also we could just subscribe to
> update
> > >>> events of a specific image maybe. How can I fan out the records to
> > >>> different (maybe not yet existing) topics by using DSL?
> > >>>
> > >>> Is that a good idea? (If I can solve every processing in a common
> > >>> processing graph that would be the best, but then I'd need some high
> > >>> performance solution of filtering out the noise, as the subscribers
> are
> > >>> only interested in a very small subset of the soup.)
> > >>>
> > >>> Q2:
> > >>> - When a new user comes, I'd like give him the latest full image?
> > >>> (I could do a "fullimages" output topic, but then also comes the
> > problem
> > >>> of serious overhead on each incoming update, and also the newcomer
> > should
> > >>> somehow only get the image he's interested in, not read all the
> images,
> > >> and
> > >>> ignore the others.)
> > >>>
> > >>> I know I'm still new to this, but I'd like to learn the best
> practices
> > >> you
> > >>> might already tried.
> > >>>
> > >>> Thank you,
> > >>>
> > >>> Peter
> > >>
> > >>
> >
> >
>

Re: Streams: Global state & topic multiplication questions

Posted by Michael Noll <mi...@confluent.io>.

As Eno said I'd use the interactive queries API for Q2.

Demo apps:
-
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/KafkaMusicExample.java
-
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/WordCountInteractiveQueriesExample.java

Further docs:
http://docs.confluent.io/current/streams/developer-guide.html#interactive-queries

(FYI: We have begun moving the info in these docs to the Apache Kafka docs,
too, but it will take a while.)

-Michael




On Thu, Jan 19, 2017 at 5:23 PM, Eno Thereska <en...@gmail.com>
wrote:

> For Q2: one way to export the state on demand would be to use the
> Interactive Queries API (https://www.confluent.io/blog/unifying-stream-
> processing-and-interactive-queries-in-apache-kafka/ <
> https://www.confluent.io/blog/unifying-stream-processing-and-interactive-
> queries-in-apache-kafka/>). That can be seen as a current, materialized
> view of the state. There is some example code in the blog.
>
> Eno
>
>
> > On 19 Jan 2017, at 16:11, Peter Kopias <ko...@gmail.com> wrote:
> >
> > Q1: Thank you, the branch() is what I'm looking for, I just missed it
> > somehow.
> >
> > Q2:
> >   I receive something like "imageid,x,y" as key, and a color as value. I
> > aggregate this to something like average color for example.
> >  So technically I do not have images, I have colored pixels with 3
> > dimensions one being the image...
> >  And then my newcomer user wants to join the fun, so we'd need to serve
> > him an image with the latest current state (all pixels of imageid=x), and
> > of course everything that comes later (updates on the output of the
> > aggregate, but filtered to that imageid).
> >
> > Problem is that how to get the stream api node to "export" part of it's
> > current state on demand. (All imageid=x keys with values).
> >
> > There could be a "request topic" that I could join together with the
> > aggregated ktable maybe?
> >
> > Other problem is following the updates without getting the details of
> > other images (99% of records are not interesting for the specific user).
> >
> > Thanks,
> >
> > Peter
> >
> >
> >
> > On Thu, Jan 19, 2017 at 4:55 PM, Eno Thereska <en...@gmail.com>
> > wrote:
> >
> >> Hi Peter,
> >>
> >> About Q1: The DSL has the "branch" API, where one stream is branched to
> >> several streams, based on a predicate. I think that could help.
> >>
> >> About Q2: I'm not entirely sure I understand the problem space. What is
> >> the definition of a "full image"?
> >>
> >> Thanks
> >> Eno
> >>> On 19 Jan 2017, at 12:07, Peter Kopias <ko...@gmail.com> wrote:
> >>>
> >>> Greetings Everyone,
> >>>
> >>> I'm just getting into the kafka world with a sample project, and I've
> got
> >>> two conceptional issues, you might have a trivial answer already at
> hand
> >> to.
> >>>
> >>> Scenario: multiuser painting webapp, with N user working on M images
> >>> simultaneously.   The "brush" events go to one single kafka topic, in a
> >>> format: imageid,x,y -> brushevent  , that I aggregate to imageid,x,y
> >>>
> >>> Q1:
> >>> It would be nice to separate the stream to M output topics, so that
> would
> >>> work nice as "partitioning", and also we could just subscribe to update
> >>> events of a specific image maybe. How can I fan out the records to
> >>> different (maybe not yet existing) topics by using DSL?
> >>>
> >>> Is that a good idea? (If I can solve every processing in a common
> >>> processing graph that would be the best, but then I'd need some high
> >>> performance solution of filtering out the noise, as the subscribers are
> >>> only interested in a very small subset of the soup.)
> >>>
> >>> Q2:
> >>> - When a new user comes, I'd like give him the latest full image?
> >>> (I could do a "fullimages" output topic, but then also comes the
> problem
> >>> of serious overhead on each incoming update, and also the newcomer
> should
> >>> somehow only get the image he's interested in, not read all the images,
> >> and
> >>> ignore the others.)
> >>>
> >>> I know I'm still new to this, but I'd like to learn the best practices
> >> you
> >>> might already tried.
> >>>
> >>> Thank you,
> >>>
> >>> Peter
> >>
> >>
>
>

Re: Streams: Global state & topic multiplication questions

Posted by Eno Thereska <en...@gmail.com>.

For Q2: one way to export the state on demand would be to use the Interactive Queries API (https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/ <https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/>). That can be seen as a current, materialized view of the state. There is some example code in the blog.

Eno


> On 19 Jan 2017, at 16:11, Peter Kopias <ko...@gmail.com> wrote:
> 
> Q1: Thank you, the branch() is what I'm looking for, I just missed it
> somehow.
> 
> Q2:
>   I receive something like "imageid,x,y" as key, and a color as value. I
> aggregate this to something like average color for example.
>  So technically I do not have images, I have colored pixels with 3
> dimensions one being the image...
>  And then my newcomer user wants to join the fun, so we'd need to serve
> him an image with the latest current state (all pixels of imageid=x), and
> of course everything that comes later (updates on the output of the
> aggregate, but filtered to that imageid).
> 
> Problem is that how to get the stream api node to "export" part of it's
> current state on demand. (All imageid=x keys with values).
> 
> There could be a "request topic" that I could join together with the
> aggregated ktable maybe?
> 
> Other problem is following the updates without getting the details of
> other images (99% of records are not interesting for the specific user).
> 
> Thanks,
> 
> Peter
> 
> 
> 
> On Thu, Jan 19, 2017 at 4:55 PM, Eno Thereska <en...@gmail.com>
> wrote:
> 
>> Hi Peter,
>> 
>> About Q1: The DSL has the "branch" API, where one stream is branched to
>> several streams, based on a predicate. I think that could help.
>> 
>> About Q2: I'm not entirely sure I understand the problem space. What is
>> the definition of a "full image"?
>> 
>> Thanks
>> Eno
>>> On 19 Jan 2017, at 12:07, Peter Kopias <ko...@gmail.com> wrote:
>>> 
>>> Greetings Everyone,
>>> 
>>> I'm just getting into the kafka world with a sample project, and I've got
>>> two conceptional issues, you might have a trivial answer already at hand
>> to.
>>> 
>>> Scenario: multiuser painting webapp, with N user working on M images
>>> simultaneously.   The "brush" events go to one single kafka topic, in a
>>> format: imageid,x,y -> brushevent  , that I aggregate to imageid,x,y
>>> 
>>> Q1:
>>> It would be nice to separate the stream to M output topics, so that would
>>> work nice as "partitioning", and also we could just subscribe to update
>>> events of a specific image maybe. How can I fan out the records to
>>> different (maybe not yet existing) topics by using DSL?
>>> 
>>> Is that a good idea? (If I can solve every processing in a common
>>> processing graph that would be the best, but then I'd need some high
>>> performance solution of filtering out the noise, as the subscribers are
>>> only interested in a very small subset of the soup.)
>>> 
>>> Q2:
>>> - When a new user comes, I'd like give him the latest full image?
>>> (I could do a "fullimages" output topic, but then also comes the problem
>>> of serious overhead on each incoming update, and also the newcomer should
>>> somehow only get the image he's interested in, not read all the images,
>> and
>>> ignore the others.)
>>> 
>>> I know I'm still new to this, but I'd like to learn the best practices
>> you
>>> might already tried.
>>> 
>>> Thank you,
>>> 
>>> Peter
>> 
>>

Re: Streams: Global state & topic multiplication questions

Posted by Peter Kopias <ko...@gmail.com>.

Q1: Thank you, the branch() is what I'm looking for, I just missed it
somehow.

Q2:
   I receive something like "imageid,x,y" as key, and a color as value. I
aggregate this to something like average color for example.
  So technically I do not have images, I have colored pixels with 3
dimensions one being the image...
  And then my newcomer user wants to join the fun, so we'd need to serve
him an image with the latest current state (all pixels of imageid=x), and
of course everything that comes later (updates on the output of the
aggregate, but filtered to that imageid).

 Problem is that how to get the stream api node to "export" part of it's
current state on demand. (All imageid=x keys with values).

 There could be a "request topic" that I could join together with the
aggregated ktable maybe?

 Other problem is following the updates without getting the details of
other images (99% of records are not interesting for the specific user).

 Thanks,

 Peter

On Thu, Jan 19, 2017 at 4:55 PM, Eno Thereska <en...@gmail.com>
wrote:

> Hi Peter,
>
> About Q1: The DSL has the "branch" API, where one stream is branched to
> several streams, based on a predicate. I think that could help.
>
> About Q2: I'm not entirely sure I understand the problem space. What is
> the definition of a "full image"?
>
> Thanks
> Eno
> > On 19 Jan 2017, at 12:07, Peter Kopias <ko...@gmail.com> wrote:
> >
> > Greetings Everyone,
> >
> > I'm just getting into the kafka world with a sample project, and I've got
> > two conceptional issues, you might have a trivial answer already at hand
> to.
> >
> > Scenario: multiuser painting webapp, with N user working on M images
> > simultaneously.   The "brush" events go to one single kafka topic, in a
> > format: imageid,x,y -> brushevent  , that I aggregate to imageid,x,y
> >
> > Q1:
> > It would be nice to separate the stream to M output topics, so that would
> > work nice as "partitioning", and also we could just subscribe to update
> > events of a specific image maybe. How can I fan out the records to
> > different (maybe not yet existing) topics by using DSL?
> >
> > Is that a good idea? (If I can solve every processing in a common
> > processing graph that would be the best, but then I'd need some high
> > performance solution of filtering out the noise, as the subscribers are
> > only interested in a very small subset of the soup.)
> >
> > Q2:
> > - When a new user comes, I'd like give him the latest full image?
> > (I could do a "fullimages" output topic, but then also comes the problem
> > of serious overhead on each incoming update, and also the newcomer should
> > somehow only get the image he's interested in, not read all the images,
> and
> > ignore the others.)
> >
> > I know I'm still new to this, but I'd like to learn the best practices
> you
> > might already tried.
> >
> > Thank you,
> >
> > Peter
>
>

Re: Streams: Global state & topic multiplication questions

Posted by Eno Thereska <en...@gmail.com>.

Hi Peter,

About Q1: The DSL has the "branch" API, where one stream is branched to several streams, based on a predicate. I think that could help.

About Q2: I'm not entirely sure I understand the problem space. What is the definition of a "full image"?

Thanks
Eno
> On 19 Jan 2017, at 12:07, Peter Kopias <ko...@gmail.com> wrote:
> 
> Greetings Everyone,
> 
> I'm just getting into the kafka world with a sample project, and I've got
> two conceptional issues, you might have a trivial answer already at hand to.
> 
> Scenario: multiuser painting webapp, with N user working on M images
> simultaneously.   The "brush" events go to one single kafka topic, in a
> format: imageid,x,y -> brushevent  , that I aggregate to imageid,x,y
> 
> Q1:
> It would be nice to separate the stream to M output topics, so that would
> work nice as "partitioning", and also we could just subscribe to update
> events of a specific image maybe. How can I fan out the records to
> different (maybe not yet existing) topics by using DSL?
> 
> Is that a good idea? (If I can solve every processing in a common
> processing graph that would be the best, but then I'd need some high
> performance solution of filtering out the noise, as the subscribers are
> only interested in a very small subset of the soup.)
> 
> Q2:
> - When a new user comes, I'd like give him the latest full image?
> (I could do a "fullimages" output topic, but then also comes the problem
> of serious overhead on each incoming update, and also the newcomer should
> somehow only get the image he's interested in, not read all the images, and
> ignore the others.)
> 
> I know I'm still new to this, but I'd like to learn the best practices you
> might already tried.
> 
> Thank you,
> 
> Peter