Posted to users@kafka.apache.org by Steven Schlansker <ss...@opentable.com> on 2017/03/23 19:00:04 UTC

Kafka Streams and reliable state stores

Hello everyone,

I am looking to enhance my Kafka Streams based application from
one instance to many.

Part of the difficulty is that it seems all of the state providers
are "instance local", either in memory or on local disk.  This means to
answer queries for non-local partitions you have to proxy them to another
node.  There are also race conditions here -- what if node B owns partition 1,
node A redirects a query for a key in that partition, then B fails over to A
concurrently?

I am curious why there's not an option to e.g. use a Memcache, Redis, or Cassandra cluster (or similar).
Seems that it would simplify the inter-node communication (you just speak to a cluster using
e.g. consistent hashing for keys) and improve availability (application node crashing doesn't imply
loss of state for affected partitions)

Is this just because nobody has written it yet?  Is there some reason that having
strictly local storage plus a "gossip" like protocol is superior?

How are other people doing this?

Thanks,
Steven



Re: Kafka Streams and reliable state stores

Posted by "Matthias J. Sax" <ma...@confluent.io>.
> There are also race conditions here -- what if node B owns partition 1,
> node A redirects a query for a key in that partition, then B fails over to A
> concurrently?

You will get an exception, and you need to refresh your metadata.
Afterward, you need to query again.
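
The retry loop described above could be sketched roughly as follows. This
is a self-contained illustration, not Kafka Streams code: RetryingQuery,
StoreMigratedException, MetadataLookup and NodeClient are hypothetical
stand-ins. In a real application the exception would typically be
InvalidStateStoreException, and the owner lookup would go through the
KafkaStreams metadata API.

```java
// Sketch of "catch the exception, refresh metadata, query again".
// All types here are hypothetical stand-ins, not Kafka Streams classes.
class RetryingQuery {

    /** Thrown when a node is asked for a partition it no longer owns. */
    static class StoreMigratedException extends RuntimeException {
        StoreMigratedException(String msg) { super(msg); }
    }

    /** Answers "which node currently owns this key?". */
    @FunctionalInterface
    interface MetadataLookup {
        String ownerOf(String key);
    }

    /** Reads a key from the given node, whether local or remote. */
    @FunctionalInterface
    interface NodeClient {
        String get(String node, String key);
    }

    static String query(String key, MetadataLookup metadata, NodeClient client,
                        int maxAttempts, long backoffMs) {
        StoreMigratedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Re-read the owner on every attempt so a rebalance is picked up.
            String owner = metadata.ownerOf(key);
            try {
                return client.get(owner, key);
            } catch (StoreMigratedException e) {
                // Ownership moved while the query was in flight: wait, then retry.
                last = e;
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        throw new IllegalStateException(
                "gave up after " + maxAttempts + " attempts", last);
    }
}
```

The attempt limit and the backoff are arbitrary choices for the sketch. A
rebalance can take a while, so a caller may prefer to return an error to
the client and let it retry instead of blocking.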


These blog posts give more details:
 - https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
 - https://blog.codecentric.de/en/2017/03/interactive-queries-in-apache-kafka-streams/


Long story short:
 - you can still use an external system instead of the IQ (Interactive
Queries) feature (both have pros/cons)
 - there are no plans to add anything more to Streams -- it's a library
and only provides the basic building blocks



-Matthias

