You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Andrea Spina <an...@radicalbit.io> on 2018/07/24 10:53:54 UTC

Storing a list of values for the same key

Dear community,
I'd add to my topology a stateful operator - graced with a Store - demanded
to save some compuation A.

I'd like to implement it so that it can store, by the same key, a list of
values by appending A by events come in. Something similar e.g. in Apache
Flink, this can be achieved by the ListState construct. When the semantic
of the events change somehow, I'll trigger the emission of what I stored in
my list state.

I went through the window store code because I thought the base concept was
quite simliar (appending events as they come and computing updates with the
related iterator) but I wasn't able to find any inspiration.

By now I'm considering a key value store, with a surrogate key like key =
(event_key, nth-id-before-emission), which allows me to retrieve the list
of computations when the trigger is fired.

Are there better approaches by which achieving this task, either are there
construct already making this possible?

Thank you everybody.

-- 
*Andrea Spina*
Software Engineer @ Radicalbit Srl
Via Borsieri 41, 20159, Milano - IT

Re: Storing a list of values for the same key

Posted by Guozhang Wang <wa...@gmail.com>.
Yes currently Kafka Streams does not provide natural construct for a list
store. Personally I'd still recommend considering option 2) if your
computational pattern falls in that category.


Guozhang

On Wed, Jul 25, 2018 at 1:43 AM, Andrea Spina <an...@radicalbit.io>
wrote:

> Hi Guozhang,
> thanks for the answer. I meant that I was considering the idea to use a
> surrogate key X= (A,B) where A is the actual key and B is something let me
> understanding it belongs to the list B, which is the last list collected
> until the n-th emission.
> Anyway, 1) is far away from RocksDB optimization layer AFAIK, 2) comes
> little tricky, third approach sounds really interesting and I think I'll
> give a shot.
>
> So AFAIU kafka streams does not provide any construct allowing this task,
> is this right?
>
> Thanks,
>
> Andrea
>
> 2018-07-24 23:54 GMT+02:00 Guozhang Wang <wa...@gmail.com>:
>
> > Hello Andrea,
> >
> > I do not fully understand what does `nth-id-before-emission` mean here,
> but
> > I can think of a couple of options here:
> >
> > 1) Just use a key-value store, with the value encoding the list of events
> > for that key. Whenever a new event of the key gets in, you retrieve the
> > current list for that key, update the list, and put it back into the
> > key-value store. This is programmably most simple, but may not be ideal
> in
> > performance since the default persistent store RocksDB is a
> log-structured
> > store.
> >
> > 2) If your computation is commutative and associative, you can just
> update
> > your computed result for a key whenever a new event of that key is being
> > received.
> >
> > 3) It is more complicated: you can use two stores, where the first store
> is
> > just a plain persistent buffer of all the events not processed / emitted
> so
> > far, and second store is an index from key to the locations of the list
> of
> > events for this key. Its efficiency should be better than 1) but also
> more
> > complicated to program.
> >
> >
> >
> > Guozhang
> >
> >
> >
> > On Tue, Jul 24, 2018 at 3:53 AM, Andrea Spina <
> andrea.spina@radicalbit.io>
> > wrote:
> >
> > > Dear community,
> > > I'd add to my topology a stateful operator - graced with a Store -
> > demanded
> > > to save some compuation A.
> > >
> > > I'd like to implement it so that it can store, by the same key, a list
> of
> > > values by appending A by events come in. Something similar e.g. in
> Apache
> > > Flink, this can be achieved by the ListState construct. When the
> semantic
> > > of the events change somehow, I'll trigger the emission of what I
> stored
> > in
> > > my list state.
> > >
> > > I went through the window store code because I thought the base concept
> > was
> > > quite simliar (appending events as they come and computing updates with
> > the
> > > related iterator) but I wasn't able to find any inspiration.
> > >
> > > By now I'm considering a key value store, with a surrogate key like
> key =
> > > (event_key, nth-id-before-emission), which allows me to retrieve the
> list
> > > of computations when the trigger is fired.
> > >
> > > Are there better approaches by which achieving this task, either are
> > there
> > > construct already making this possible?
> > >
> > > Thank you everybody.
> > >
> > > --
> > > *Andrea Spina*
> > > Software Engineer @ Radicalbit Srl
> > > Via Borsieri 41, 20159, Milano - IT
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> *Andrea Spina*
> Software Engineer @ Radicalbit Srl
> Via Borsieri 41, 20159, Milano - IT
>



-- 
-- Guozhang

Re: Storing a list of values for the same key

Posted by Andrea Spina <an...@radicalbit.io>.
Hi Guozhang,
thanks for the answer. I meant that I was considering the idea to use a
surrogate key X= (A,B) where A is the actual key and B is something let me
understanding it belongs to the list B, which is the last list collected
until the n-th emission.
Anyway, 1) is far away from RocksDB optimization layer AFAIK, 2) comes
little tricky, third approach sounds really interesting and I think I'll
give a shot.

So AFAIU kafka streams does not provide any construct allowing this task,
is this right?

Thanks,

Andrea

2018-07-24 23:54 GMT+02:00 Guozhang Wang <wa...@gmail.com>:

> Hello Andrea,
>
> I do not fully understand what does `nth-id-before-emission` mean here, but
> I can think of a couple of options here:
>
> 1) Just use a key-value store, with the value encoding the list of events
> for that key. Whenever a new event of the key gets in, you retrieve the
> current list for that key, update the list, and put it back into the
> key-value store. This is programmably most simple, but may not be ideal in
> performance since the default persistent store RocksDB is a log-structured
> store.
>
> 2) If your computation is commutative and associative, you can just update
> your computed result for a key whenever a new event of that key is being
> received.
>
> 3) It is more complicated: you can use two stores, where the first store is
> just a plain persistent buffer of all the events not processed / emitted so
> far, and second store is an index from key to the locations of the list of
> events for this key. Its efficiency should be better than 1) but also more
> complicated to program.
>
>
>
> Guozhang
>
>
>
> On Tue, Jul 24, 2018 at 3:53 AM, Andrea Spina <an...@radicalbit.io>
> wrote:
>
> > Dear community,
> > I'd add to my topology a stateful operator - graced with a Store -
> demanded
> > to save some compuation A.
> >
> > I'd like to implement it so that it can store, by the same key, a list of
> > values by appending A by events come in. Something similar e.g. in Apache
> > Flink, this can be achieved by the ListState construct. When the semantic
> > of the events change somehow, I'll trigger the emission of what I stored
> in
> > my list state.
> >
> > I went through the window store code because I thought the base concept
> was
> > quite simliar (appending events as they come and computing updates with
> the
> > related iterator) but I wasn't able to find any inspiration.
> >
> > By now I'm considering a key value store, with a surrogate key like key =
> > (event_key, nth-id-before-emission), which allows me to retrieve the list
> > of computations when the trigger is fired.
> >
> > Are there better approaches by which achieving this task, either are
> there
> > construct already making this possible?
> >
> > Thank you everybody.
> >
> > --
> > *Andrea Spina*
> > Software Engineer @ Radicalbit Srl
> > Via Borsieri 41, 20159, Milano - IT
> >
>
>
>
> --
> -- Guozhang
>



-- 
*Andrea Spina*
Software Engineer @ Radicalbit Srl
Via Borsieri 41, 20159, Milano - IT

Re: Storing a list of values for the same key

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Andrea,

I do not fully understand what does `nth-id-before-emission` mean here, but
I can think of a couple of options here:

1) Just use a key-value store, with the value encoding the list of events
for that key. Whenever a new event of the key gets in, you retrieve the
current list for that key, update the list, and put it back into the
key-value store. This is programmably most simple, but may not be ideal in
performance since the default persistent store RocksDB is a log-structured
store.

2) If your computation is commutative and associative, you can just update
your computed result for a key whenever a new event of that key is being
received.

3) It is more complicated: you can use two stores, where the first store is
just a plain persistent buffer of all the events not processed / emitted so
far, and second store is an index from key to the locations of the list of
events for this key. Its efficiency should be better than 1) but also more
complicated to program.



Guozhang



On Tue, Jul 24, 2018 at 3:53 AM, Andrea Spina <an...@radicalbit.io>
wrote:

> Dear community,
> I'd add to my topology a stateful operator - graced with a Store - demanded
> to save some compuation A.
>
> I'd like to implement it so that it can store, by the same key, a list of
> values by appending A by events come in. Something similar e.g. in Apache
> Flink, this can be achieved by the ListState construct. When the semantic
> of the events change somehow, I'll trigger the emission of what I stored in
> my list state.
>
> I went through the window store code because I thought the base concept was
> quite simliar (appending events as they come and computing updates with the
> related iterator) but I wasn't able to find any inspiration.
>
> By now I'm considering a key value store, with a surrogate key like key =
> (event_key, nth-id-before-emission), which allows me to retrieve the list
> of computations when the trigger is fired.
>
> Are there better approaches by which achieving this task, either are there
> construct already making this possible?
>
> Thank you everybody.
>
> --
> *Andrea Spina*
> Software Engineer @ Radicalbit Srl
> Via Borsieri 41, 20159, Milano - IT
>



-- 
-- Guozhang