You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Gleb Stsenov <gl...@gmail.com> on 2018/07/04 15:34:06 UTC

Processor API, how to get last N hours word count

Hello,
Just started with Kafka, took 2.0 because it has better unit test support.
Built custom processor, which is basically same as WordCountProcessorDemo
example (github https://goo.gl/XSh7iW ) and can be treated as equal to
that.

Built topology by adding source topic, processor, state store, and sink (to
topic).
Played with KafkaTool and console consumers, can see my wordcounts since
beginning of publisher life.

*Question*:
what would be the correct way to get wordcounts for last 24h?

On topology creation, on processor init, or somehow related to "interactive
queries" feature?
First goal is answering to "what are the most common words during last 24h
(having word count > configured_X)".
Second goal is custom time window. Kinda if word "user" is the most common
in 24h, I want to know its word count for last 36h, or so. Sounds like
"interactive query" for me, but not sure.

Read description of window types in doc, but can't get the idea of applying
them to processor API.

Thank you.
BR,
Gleb.

Re: Processor API, how to get last N hours word count

Posted by Guozhang Wang <wa...@gmail.com>.
We do not have in-memory window stores implemented yet:
https://issues.apache.org/jira/browse/KAFKA-4730

On Wed, Jul 4, 2018 at 12:05 PM, Gleb Stsenov <gl...@gmail.com>
wrote:

> Hello Guozhang,
> Thank you!
> One thing to clarify: so, only persistent store can be windowed, right?
> Demo default in-memory key-value store won't work?
> /G.
>
> On Wed, Jul 4, 2018 at 9:45 PM, Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hello Gleb,
> >
> > For the first question, you should use a windowed store in your topology
> > builder.
> >
> > And for the first / second question, I think using interactive query
> should
> > be fine, i.e. you can create different windows for different length, and
> > use interactive queries to get the count on different window spans.
> >
> >
> > Guozhang
> >
> >
> >
> > On Wed, Jul 4, 2018 at 8:34 AM, Gleb Stsenov <gl...@gmail.com>
> > wrote:
> >
> > > Hello,
> > > Just started with Kafka, took 2.0 because it has better unit test
> > support.
> > > Built custom processor, which is basically same as
> WordCountProcessorDemo
> > > example (github https://goo.gl/XSh7iW ) and can be treated as equal to
> > > that.
> > >
> > > Built topology by adding source topic, processor, state store, and sink
> > (to
> > > topic).
> > > Played with KafkaTool and console consumers, can see my wordcounts
> since
> > > beginning of publisher life.
> > >
> > > *Question*:
> > > what would be the correct way to get wordcounts for last 24h?
> > >
> > > On topology creation, on processor init, or somehow related to
> > "interactive
> > > queries" feature?
> > > First goal is answering to "what are the most common words during last
> > 24h
> > > (having word count > configured_X)".
> > > Second goal is custom time window. Kinda if word "user" is the most
> > common
> > > in 24h, I want to know its word count for last 36h, or so. Sounds like
> > > "interactive query" for me, but not sure.
> > >
> > > Read description of window types in doc, but can't get the idea of
> > applying
> > > them to processor API.
> > >
> > > Thank you.
> > > BR,
> > > Gleb.
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: Processor API, how to get last N hours word count

Posted by Gleb Stsenov <gl...@gmail.com>.
Hello Guozhang,
Thank you!
One thing to clarify: so, only persistent store can be windowed, right?
Demo default in-memory key-value store won't work?
/G.

On Wed, Jul 4, 2018 at 9:45 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Hello Gleb,
>
> For the first question, you should use a windowed store in your topology
> builder.
>
> And for the first / second question, I think using interactive query should
> be fine, i.e. you can create different windows for different length, and
> use interactive queries to get the count on different window spans.
>
>
> Guozhang
>
>
>
> On Wed, Jul 4, 2018 at 8:34 AM, Gleb Stsenov <gl...@gmail.com>
> wrote:
>
> > Hello,
> > Just started with Kafka, took 2.0 because it has better unit test
> support.
> > Built custom processor, which is basically same as WordCountProcessorDemo
> > example (github https://goo.gl/XSh7iW ) and can be treated as equal to
> > that.
> >
> > Built topology by adding source topic, processor, state store, and sink
> (to
> > topic).
> > Played with KafkaTool and console consumers, can see my wordcounts since
> > beginning of publisher life.
> >
> > *Question*:
> > what would be the correct way to get wordcounts for last 24h?
> >
> > On topology creation, on processor init, or somehow related to
> "interactive
> > queries" feature?
> > First goal is answering to "what are the most common words during last
> 24h
> > (having word count > configured_X)".
> > Second goal is custom time window. Kinda if word "user" is the most
> common
> > in 24h, I want to know its word count for last 36h, or so. Sounds like
> > "interactive query" for me, but not sure.
> >
> > Read description of window types in doc, but can't get the idea of
> applying
> > them to processor API.
> >
> > Thank you.
> > BR,
> > Gleb.
> >
>
>
>
> --
> -- Guozhang
>

Re: Processor API, how to get last N hours word count

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Gleb,

For the first question, you should use a windowed store in your topology
builder.

And for the first / second question, I think using interactive query should
be fine, i.e. you can create different windows for different length, and
use interactive queries to get the count on different window spans.


Guozhang



On Wed, Jul 4, 2018 at 8:34 AM, Gleb Stsenov <gl...@gmail.com> wrote:

> Hello,
> Just started with Kafka, took 2.0 because it has better unit test support.
> Built custom processor, which is basically same as WordCountProcessorDemo
> example (github https://goo.gl/XSh7iW ) and can be treated as equal to
> that.
>
> Built topology by adding source topic, processor, state store, and sink (to
> topic).
> Played with KafkaTool and console consumers, can see my wordcounts since
> beginning of publisher life.
>
> *Question*:
> what would be the correct way to get wordcounts for last 24h?
>
> On topology creation, on processor init, or somehow related to "interactive
> queries" feature?
> First goal is answering to "what are the most common words during last 24h
> (having word count > configured_X)".
> Second goal is custom time window. Kinda if word "user" is the most common
> in 24h, I want to know its word count for last 36h, or so. Sounds like
> "interactive query" for me, but not sure.
>
> Read description of window types in doc, but can't get the idea of applying
> them to processor API.
>
> Thank you.
> BR,
> Gleb.
>



-- 
-- Guozhang