Posted to users@kafka.apache.org by Chris Toomey <ct...@gmail.com> on 2018/11/13 23:17:34 UTC

Using GlobalKTable/KeyValueStore for topic cache

We're considering using GlobalKTables / KeyValueStores for locally caching
topic content in services. The topics would be compacted such that only the
latest key/value pair would exist for a given key.

One question that's come up is how to determine, when bootstrapping the
app, when the cache has been populated with the latest content from the
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much we've
got, but trying to figure out how much there is in the topic looks much
more difficult -- the only way I can see via the APIs / code is to use an
AdminClient to get the topic partitions and then the KafkaConsumer to get
the end offsets for those.
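
For illustration, the offset comparison described above could look roughly like the sketch below. It is dependency-free and only a sketch under assumptions: WarmCheck and isWarm are hypothetical names, and the two maps are assumed to be filled by the application (end offsets from a one-time KafkaConsumer#endOffsets call at startup, positions from whatever is loading the cache). It compares offsets rather than entry counts because on a compacted topic the offset range does not equal the number of live keys, so approximateNumEntries() cannot be checked against end offsets directly.

```java
import java.util.Map;

// Hypothetical helper: decides whether a cache has caught up with a topic.
final class WarmCheck {
    private WarmCheck() {}

    /**
     * @param endOffsets partition -> end offset, snapshotted once at startup
     *                   (assumed to come from KafkaConsumer#endOffsets)
     * @param positions  partition -> offset of the next record to be read
     * @return true once every partition has been read up to its snapshot
     */
    static boolean isWarm(Map<Integer, Long> endOffsets, Map<Integer, Long> positions) {
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            // A partition that has not reported a position yet counts as offset 0.
            long position = positions.getOrDefault(e.getKey(), 0L);
            if (position < e.getValue()) {
                return false; // still behind the snapshot on this partition
            }
        }
        return true; // empty partitions (end offset 0) are warm immediately
    }
}
```

Because the end offsets are a snapshot, records produced after startup do not keep the check from ever succeeding.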

Does anyone have experience doing this kind of caching? How did you handle
the bootstrapping issue?

Any thoughts on easier or better ways to determine when the cache is warm?

thx,
Chris

Re: Using GlobalKTable/KeyValueStore for topic cache

Posted by Chris Toomey <ct...@gmail.com>.
Thanks Bill, the StateRestoreListener is exactly the tool needed for my use
case.

Patrik, thanks for the heads-up on that issue. I guess until it's fixed
that makes it even easier to wait until the cache is warmed :-).

Chris

On Tue, Nov 13, 2018 at 10:40 PM Patrik Kleindl <pk...@gmail.com> wrote:

> Hi Chris
>
> We are using them like you described.
> Performance is very good compared to the database used before.
> Beware that until https://issues.apache.org/jira/browse/KAFKA-7380
> is done the startup will be blocked until all global stores are restored
> (sequentially).
> This can take a while for larger topics and/or multiple global stores.
>
> We are blocking access until they are available although this is not ideal
> in terms of timeout tuning.
>
> Any ideas are welcome.
>
> Best regards
>
> Patrik

Re: Using GlobalKTable/KeyValueStore for topic cache

Posted by Patrik Kleindl <pk...@gmail.com>.
Hi Chris

We are using them like you described.
Performance is very good compared to the database used before.
Beware that until https://issues.apache.org/jira/browse/KAFKA-7380
is done the startup will be blocked until all global stores are restored (sequentially).
This can take a while for larger topics and/or multiple global stores.

We are blocking access until they are available although this is not ideal in terms of timeout tuning.
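
One way to picture this kind of blocking is a small gate in front of the store lookups. The sketch below is an assumption about how such a gate could be built, not a description of Patrik's setup: WarmGate, markReady and read are hypothetical names, and markReady() is assumed to be called by the application once it considers the stores restored (for example from a listener registered with KafkaStreams#setStateListener when the state becomes RUNNING).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical gate: callers wait (up to a timeout) until the cache is marked warm.
final class WarmGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    // Called once by the application when restoration is considered complete.
    void markReady() {
        ready.countDown();
    }

    // Runs the lookup only after the gate is open; fails if the timeout elapses first.
    <T> T read(Supplier<T> lookup, long timeout, TimeUnit unit) {
        try {
            if (!ready.await(timeout, unit)) {
                throw new IllegalStateException("cache not warm after " + timeout + " " + unit);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for warm cache", e);
        }
        return lookup.get();
    }
}
```

The timeout passed to read() is the value that needs tuning: it has to cover the longest expected restore time, which grows with topic size and the number of global stores.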

Any ideas are welcome.

Best regards

Patrik


Re: Using GlobalKTable/KeyValueStore for topic cache

Posted by Bill Bejeck <bi...@confluent.io>.
Hi Chris

Yes, the state restore listener is executed for every state store in a
Kafka Streams application.

-Bill

On Tue, Nov 13, 2018 at 10:36 PM Chris Toomey <ct...@gmail.com> wrote:

> Thanks Bill. Poking around the code a bit, it looks like any Kafka
> Streams execution that produces a state store would execute this
> "restore state store" operation, even when creating a new state store.
> Is that correct? If so, that could indeed be just what I need.

Re: Using GlobalKTable/KeyValueStore for topic cache

Posted by Chris Toomey <ct...@gmail.com>.
Thanks Bill. Poking around the code a bit, it looks like any Kafka
Streams execution that produces a state store would execute this
"restore state store" operation, even when creating a new state store.
Is that correct? If so, that could indeed be just what I need.


On Tue, Nov 13, 2018 at 6:38 PM Bill Bejeck <bi...@confluent.io> wrote:

> Hi Chris,
>
> I'm not sure I totally understand your requirements but the
> StateRestoreListener (
> https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/processor/StateRestoreListener.html)
> class provides callbacks when restoring state stores (including
> GlobalStores) and may provide what you are looking for.
>
> Thanks,
> Bill

Re: Using GlobalKTable/KeyValueStore for topic cache

Posted by Bill Bejeck <bi...@confluent.io>.
Hi Chris,

I'm not sure I totally understand your requirements but the
StateRestoreListener (
https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/processor/StateRestoreListener.html)
class provides callbacks when restoring state stores (including
GlobalStores) and may provide what you are looking for.
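
As a rough illustration of how those callbacks could be turned into a "restored yet?" answer, here is a minimal sketch. To stay dependency-free it does not implement StateRestoreListener itself: RestoreTracker is a hypothetical class whose two methods mirror the interface's onRestoreStart and onRestoreEnd callbacks, with the TopicPartition argument reduced to a plain partition number. In a real application the assumption is that a class like this would implement the interface and be registered via KafkaStreams#setGlobalStateRestoreListener before start().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker mirroring the start/end restore callbacks.
final class RestoreTracker {
    // store name -> (partition -> true once that partition finished restoring)
    private final Map<String, Map<Integer, Boolean>> stores = new ConcurrentHashMap<>();

    // Mirrors onRestoreStart: the partition is known but not yet restored.
    void onRestoreStart(String storeName, int partition, long startingOffset, long endingOffset) {
        stores.computeIfAbsent(storeName, s -> new ConcurrentHashMap<>())
              .put(partition, Boolean.FALSE);
    }

    // Mirrors onRestoreEnd: the partition has been fully restored.
    void onRestoreEnd(String storeName, int partition, long totalRestored) {
        stores.computeIfAbsent(storeName, s -> new ConcurrentHashMap<>())
              .put(partition, Boolean.TRUE);
    }

    // True if at least one partition was seen and none is still restoring.
    // Partitions whose restore has not started yet are unknown to this sketch.
    boolean isRestored(String storeName) {
        Map<Integer, Boolean> partitions = stores.get(storeName);
        return partitions != null && !partitions.containsValue(Boolean.FALSE);
    }
}
```

The concurrent maps are there because restore callbacks and the code asking isRestored() would normally run on different threads.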

Thanks,
Bill

On Tue, Nov 13, 2018 at 8:49 PM Chris Toomey <ct...@gmail.com> wrote:

> Definitely Ryanne -- that's what I meant by "topics would be compacted".
>
> But that doesn't obviate checking bootstrapping progress.

Re: Using GlobalKTable/KeyValueStore for topic cache

Posted by Chris Toomey <ct...@gmail.com>.
Definitely Ryanne -- that's what I meant by "topics would be compacted".

But that doesn't obviate checking bootstrapping progress.

On Tue, Nov 13, 2018 at 5:04 PM Ryanne Dolan <ry...@gmail.com> wrote:

> Chris, consider using log compaction.
>
> Ryanne

Re: Using GlobalKTable/KeyValueStore for topic cache

Posted by Ryanne Dolan <ry...@gmail.com>.
Chris, consider using log compaction.

Ryanne
