Posted to users@kafka.apache.org by Jay Kreps <ja...@confluent.io> on 2016/03/10 22:26:34 UTC

Kafka Streams

Hey all,

Lots of people have probably seen the ongoing work on Kafka Streams
happening. There is no real way to design a system like this in a vacuum,
so we put up a blog, some snapshot docs, and something you can download and
use easily to get feedback:

http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple

We'd love comments or thoughts from anyone...

-Jay

Re: Kafka Streams

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Jan,

Kafka Streams does implement the repartitioning for certain join /
aggregate cases, such that the joining streams and tables will be
co-partitioned by the specific keys. This is abstracted from the users that
code with the high-level Streams DSL.
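
To make the co-partitioning idea concrete, here is a minimal sketch in plain
Python. It is not how Kafka Streams is implemented; the CRC32 hash, the
partition count of 4, the helper names (partition_for, repartition,
local_join) and the sample records are all assumptions made for illustration.
It only shows why re-keying both inputs by the join key lets each partition
be joined locally.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count, identical for both inputs


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32 is a stand-in
    for whatever partitioner is really in use)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, key_fn):
    """Re-key each record with key_fn and route it to the new key's partition."""
    partitions = defaultdict(list)
    for record in records:
        new_key = key_fn(record)
        partitions[partition_for(new_key)].append((new_key, record))
    return partitions


def local_join(left_parts, right_parts):
    """Join partition by partition. Equal keys always land in the same
    partition number, so no cross-partition lookups are needed."""
    joined = []
    for p in range(NUM_PARTITIONS):
        right_index = defaultdict(list)
        for key, rec in right_parts.get(p, []):
            right_index[key].append(rec)
        for key, rec in left_parts.get(p, []):
            for match in right_index.get(key, []):
                joined.append((key, rec, match))
    return joined


# Hypothetical sample data: orders are re-keyed by user_id before the join.
orders = [
    {"order_id": "o1", "user_id": "alice", "amount": 30},
    {"order_id": "o2", "user_id": "bob", "amount": 12},
    {"order_id": "o3", "user_id": "alice", "amount": 7},
]
users = [
    {"user_id": "alice", "country": "DE"},
    {"user_id": "bob", "country": "UK"},
]

orders_by_user = repartition(orders, lambda r: r["user_id"])
users_by_user = repartition(users, lambda r: r["user_id"])
result = local_join(orders_by_user, users_by_user)
```

Every order finds its user inside its own partition, which is the property
the repartitioning step is there to guarantee.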

Since Kafka topic partitions are themselves replicated, users usually do
not need to send a record to multiple partitions for replication (admittedly,
at present only the partition leader can serve both writes and reads, but we
are considering letting followers handle reads in the future).

Guozhang


On Sat, Mar 12, 2016 at 3:59 PM, Jan Filipiak <Ja...@trivago.com>
wrote:

> Hi,
>
> I am very exited about all of this in general. Sadly I haven’t had the
> time to really take a deep look. One thing that is/was always a difficult
> topic to resolve many to many relationships in table x table x table joins
> is the repartitioning that has to happen at some point.
>
> From the documentation I saw this:
>
> "The *keys* of data records determine the partitioning of data in both
> Kafka and Kafka Streams, i.e. how data is routed to specific partitions
> within topics."
>
> This feels unnecessarily restrictive as i can't currently imagin how to
> resolve many to many relationships with this. One can also emmit every
> record to many partitions to make up for no read replicas in kafka aswell
> as partitioning schemes that don't work like this (Shards processing
> overlapping key spaces).
>
> I would really love to hear your thoughts on these topics. Great work!
> Google grade technologies for everyone!
> I <3 logs
>
>
>
>
> On 10.03.2016 22:26, Jay Kreps wrote:
>
>> Hey all,
>>
>> Lot's of people have probably seen the ongoing work on Kafka Streams
>> happening. There is no real way to design a system like this in a vacuum,
>> so we put up a blog, some snapshot docs, and something you can download
>> and
>> use easily to get feedback:
>>
>>
>> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
>>
>> We'd love comments or thoughts from anyone...
>>
>> -Jay
>>
>>
>


-- 
-- Guozhang

Re: Kafka Streams

Posted by Jan Filipiak <Ja...@trivago.com>.
Hi,

I am very excited about all of this in general. Sadly I haven’t had the
time to really take a deep look. One thing that has always made it
difficult to resolve many-to-many relationships in table x table x
table joins is the repartitioning that has to happen at some point.

From the documentation I saw this:

"The *keys* of data records determine the partitioning of data in both 
Kafka and Kafka Streams, i.e. how data is routed to specific partitions 
within topics."

This feels unnecessarily restrictive, as I can't currently imagine how to
resolve many-to-many relationships with this. One might also want to emit
every record to many partitions to make up for the lack of read replicas in
Kafka, or to use partitioning schemes that don't work like this (shards
processing overlapping key spaces).

I would really love to hear your thoughts on these topics. Great work! 
Google grade technologies for everyone!
I <3 logs



On 10.03.2016 22:26, Jay Kreps wrote:
> Hey all,
>
> Lot's of people have probably seen the ongoing work on Kafka Streams
> happening. There is no real way to design a system like this in a vacuum,
> so we put up a blog, some snapshot docs, and something you can download and
> use easily to get feedback:
>
> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
>
> We'd love comments or thoughts from anyone...
>
> -Jay
>


Re: Kafka Streams

Posted by Jay Kreps <ja...@confluent.io>.
Hey David,

The commit always happens at a "safe point", when the local portion of the
processing topology has fully processed a set of inputs. The frequency is
controlled by the property commit.interval.ms.
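
As a rough illustration of that behaviour, here is a toy model in plain
Python. It is not the Kafka Streams code: the OffsetCommitter and FakeClock
classes, the 100 ms interval and the 40 ms per-record delay are all
assumptions made for the example, and the committed value is simply the last
processed offset, which is a simplification. The point is only that a commit
can happen between records, never in the middle of one, and no more often
than the configured interval.

```python
class OffsetCommitter:
    """Toy model: commit only at safe points, at most once per interval."""

    def __init__(self, commit_interval_ms, clock):
        self.commit_interval_ms = commit_interval_ms
        self.clock = clock              # callable returning current time in ms
        self.last_commit_ms = clock()
        self.processed_offset = None    # last fully processed offset
        self.committed_offset = None    # last committed offset

    def process(self, offset, record, handler):
        handler(record)                 # run the whole local topology first
        self.processed_offset = offset  # only now is the record fully processed
        self.maybe_commit()             # safe point: nothing is in flight

    def maybe_commit(self):
        now = self.clock()
        if now - self.last_commit_ms >= self.commit_interval_ms:
            self.committed_offset = self.processed_offset
            self.last_commit_ms = now


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now_ms = 0

    def __call__(self):
        return self.now_ms

    def advance(self, ms):
        self.now_ms += ms


clock = FakeClock()
committer = OffsetCommitter(commit_interval_ms=100, clock=clock)
seen = []     # records handled by the (trivial) topology
commits = []  # committed offset observed after each record

for offset, record in enumerate(["a", "b", "c", "d", "e"]):
    clock.advance(40)  # pretend each record takes 40 ms end to end
    committer.process(offset, record, seen.append)
    commits.append(committer.committed_offset)
```

With these numbers the first commit happens after the third record, once
120 ms have elapsed, and the next one is not yet due when the input ends.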

-Jay

On Fri, Mar 11, 2016 at 9:28 AM, David Buschman <da...@timeli.io>
wrote:

> @Jay, I currently use reactive-kaka for my Kafka sources and sinks in my
> stream processing apps. I was interested to see if this new stream API
> would make that setup easier/simpler/better in the future when it becomes
> available.
>
> How does the Streams API handle the commit offsets? Since you are
> processing "1-at-a-time”, is it auto magic on commit handling at the
> beginning/end of the processing or can we specify where in the processing
> an offset commit happens?
>
> Thanks,
>     DaVe.
>
> David Buschman
> dave@timeli.io
>
>
>
> > On Mar 11, 2016, at 7:21 AM, Dick Davies <di...@hellooperator.net> wrote:
> >
> > Nice - I've read topics on the idea of a database as the 'now' view of a
> stream
> > of updates, it's a very powerful concept.
> >
> > Reminds me of Rich Hickeys talk on DAtomic, if anyone's seen that.
> >
> >
> >
> > On 10 March 2016 at 21:26, Jay Kreps <ja...@confluent.io> wrote:
> >> Hey all,
> >>
> >> Lot's of people have probably seen the ongoing work on Kafka Streams
> >> happening. There is no real way to design a system like this in a
> vacuum,
> >> so we put up a blog, some snapshot docs, and something you can download
> and
> >> use easily to get feedback:
> >>
> >>
> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
> >>
> >> We'd love comments or thoughts from anyone...
> >>
> >> -Jay
>
>

Re: Kafka Streams

Posted by David Buschman <da...@timeli.io>.
@Jay, I currently use reactive-kafka for my Kafka sources and sinks in my stream processing apps. I was interested to see if this new stream API would make that setup easier/simpler/better in the future when it becomes available.

How does the Streams API handle committing offsets? Since you are processing "1-at-a-time", is commit handling automagic at the beginning/end of the processing, or can we specify where in the processing an offset commit happens?

Thanks,
    DaVe.

David Buschman
dave@timeli.io



> On Mar 11, 2016, at 7:21 AM, Dick Davies <di...@hellooperator.net> wrote:
> 
> Nice - I've read topics on the idea of a database as the 'now' view of a stream
> of updates, it's a very powerful concept.
> 
> Reminds me of Rich Hickeys talk on DAtomic, if anyone's seen that.
> 
> 
> 
> On 10 March 2016 at 21:26, Jay Kreps <ja...@confluent.io> wrote:
>> Hey all,
>> 
>> Lot's of people have probably seen the ongoing work on Kafka Streams
>> happening. There is no real way to design a system like this in a vacuum,
>> so we put up a blog, some snapshot docs, and something you can download and
>> use easily to get feedback:
>> 
>> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
>> 
>> We'd love comments or thoughts from anyone...
>> 
>> -Jay


Re: Kafka Streams

Posted by Dick Davies <di...@hellooperator.net>.
Nice - I've read topics on the idea of a database as the 'now' view of a stream
of updates, it's a very powerful concept.
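
A minimal sketch of that idea in plain Python, assuming a changelog of
(key, value) pairs in which a value of None means "delete" (that convention,
the materialize helper and the sample data are all made up for the example):

```python
def materialize(changelog):
    """Fold a stream of (key, value) updates into a table that holds the
    latest value per key. A value of None is treated as a delete."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


# Hypothetical stream of updates, oldest first.
updates = [
    ("alice", "Berlin"),
    ("bob", "London"),
    ("alice", "Hamburg"),
    ("bob", None),
    ("carol", "Paris"),
]

now_view = materialize(updates)          # the table as of the latest update
earlier_view = materialize(updates[:2])  # the table as of an earlier point
```

Replaying a prefix of the stream gives the table as it looked at that point
in time, which is what makes the stream the more fundamental of the two.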

Reminds me of Rich Hickey's talk on Datomic, if anyone's seen that.



On 10 March 2016 at 21:26, Jay Kreps <ja...@confluent.io> wrote:
> Hey all,
>
> Lot's of people have probably seen the ongoing work on Kafka Streams
> happening. There is no real way to design a system like this in a vacuum,
> so we put up a blog, some snapshot docs, and something you can download and
> use easily to get feedback:
>
> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
>
> We'd love comments or thoughts from anyone...
>
> -Jay

Re: Kafka Streams

Posted by Gerard Klijs <ge...@dizzit.com>.
Nice read. We just started using Kafka, and have multiple cases which need
some kind of stream processing. So we will most likely start testing/using
it as soon as it is released, adding stream processing containers to
our Docker landscape.

On Fri, Mar 11, 2016 at 2:42 AM Jay Kreps <ja...@confluent.io> wrote:

> Hey David,
>
> Yeah I think the similarity to Spark (and Flink and RxJava) is the stream
> api style in the DSL. That is totally the way to go for stream processing.
> We tried really hard to make that work early on when we were doing Samza,
> but we really didn't understand the whole iterator/observable distinction
> and the experiment wasn't very successful. We ended up doing a process()
> callback in Samza which I think is just much much less readable. One of the
> nice things about Kafka Streams is I think we really got this right. The
> API is split into two layers--a kind of infrastructure layer which is based
> on modeling data flow DAGs, in some sense all stream processing boils down
> to this, though it is not necessarily the most readable way to express it.
> This layer is documented here (
>
> http://docs.confluent.io/2.1.0-alpha1/streams/developer-guide.html#streams-developer-guide-processor-api
> ).
> Then on top of that you can layer any kind of DSL or language you like. The
> KStreams layer is our take on a readable DSL.
>
> As for RxJava, it is super cool. We looked at it a little bit as a
> potential alternative language versus doing a custom DSL in KStreams. There
> is enough that is unique to distributed stream processing, including the
> whole table/stream distinction, the details of the partitioning model and
> when data is committed, etc that we felt trying to glue something on top
> would end up being a bit limiting. That said, I think there is
> reactive-streams integration for Kafka, though I have no experience with
> it:
>   https://github.com/akka/reactive-kafka
>
> Cheers,
>
> -Jay
>
> On Thu, Mar 10, 2016 at 3:26 PM, David Buschman <da...@timeli.io>
> wrote:
>
> > Very interesting, looks a lot like many operations from Spark were
> brought
> > across.
> >
> > Any plans to integrate with the reactive-stream protocol for
> > interoperability with libraries akka-stream and RxJava?
> >
> > Thanks,
> >     DaVe.
> >
> > David Buschman
> > dave@timeli.io
> >
> >
> >
> > > On Mar 10, 2016, at 2:26 PM, Jay Kreps <ja...@confluent.io> wrote:
> > >
> > > Hey all,
> > >
> > > Lot's of people have probably seen the ongoing work on Kafka Streams
> > > happening. There is no real way to design a system like this in a
> vacuum,
> > > so we put up a blog, some snapshot docs, and something you can download
> > and
> > > use easily to get feedback:
> > >
> > >
> >
> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
> > >
> > > We'd love comments or thoughts from anyone...
> > >
> > > -Jay
> >
> >
>

Re: Kafka Streams

Posted by Jay Kreps <ja...@confluent.io>.
Hey David,

Yeah, I think the similarity to Spark (and Flink and RxJava) is the stream
API style in the DSL. That is totally the way to go for stream processing.
We tried really hard to make that work early on when we were doing Samza,
but we really didn't understand the whole iterator/observable distinction
and the experiment wasn't very successful. We ended up doing a process()
callback in Samza, which I think is just much, much less readable. One of the
nice things about Kafka Streams is I think we really got this right. The
API is split into two layers. The first is a kind of infrastructure layer
based on modeling data flow DAGs; in some sense all stream processing boils
down to this, though it is not necessarily the most readable way to express it.
This layer is documented here (
http://docs.confluent.io/2.1.0-alpha1/streams/developer-guide.html#streams-developer-guide-processor-api).
Then on top of that you can layer any kind of DSL or language you like. The
KStreams layer is our take on a readable DSL.
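
The layering can be sketched in a few lines of plain Python. This is a toy
model, not the Kafka Streams API: the Node and Stream classes, the source()
helper and the map/filter/to method names are assumptions chosen for the
example. The lower layer is a DAG of nodes with a process() callback; the
upper layer is a fluent DSL that does nothing except build that DAG.

```python
class Node:
    """Low-level layer: a node in a data-flow DAG with a process() callback."""

    def __init__(self, fn):
        self.fn = fn          # returns an iterable of outputs for one input
        self.children = []

    def process(self, value):
        for out in self.fn(value):
            for child in self.children:
                child.process(out)


class Stream:
    """High-level layer: a small fluent DSL that only wires up Node objects."""

    def __init__(self, node):
        self.node = node

    def _append(self, fn):
        child = Node(fn)
        self.node.children.append(child)
        return Stream(child)

    def map(self, f):
        return self._append(lambda v: [f(v)])

    def filter(self, pred):
        return self._append(lambda v: [v] if pred(v) else [])

    def to(self, sink):
        def write(v):
            sink.append(v)
            return []         # sink nodes emit nothing downstream
        self._append(write)


def source():
    """Create a pass-through root node plus the DSL handle wrapped around it."""
    root = Node(lambda v: [v])
    return root, Stream(root)


root, stream = source()
out = []
stream.filter(lambda n: n % 2 == 0).map(lambda n: n * 10).to(out)

for n in range(6):
    root.process(n)   # push records through the DAG one at a time
```

The same DAG could be built by hand with Node objects alone; the DSL just
makes the pipeline read top to bottom.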

As for RxJava, it is super cool. We looked at it a little bit as a
potential alternative to doing a custom DSL in KStreams. There
is enough that is unique to distributed stream processing, including the
whole table/stream distinction, the details of the partitioning model, and
when data is committed, etc., that we felt trying to glue something on top
would end up being a bit limiting. That said, I think there is a
reactive-streams integration for Kafka, though I have no experience with it:
  https://github.com/akka/reactive-kafka

Cheers,

-Jay

On Thu, Mar 10, 2016 at 3:26 PM, David Buschman <da...@timeli.io>
wrote:

> Very interesting, looks a lot like many operations from Spark were brought
> across.
>
> Any plans to integrate with the reactive-stream protocol for
> interoperability with libraries akka-stream and RxJava?
>
> Thanks,
>     DaVe.
>
> David Buschman
> dave@timeli.io
>
>
>
> > On Mar 10, 2016, at 2:26 PM, Jay Kreps <ja...@confluent.io> wrote:
> >
> > Hey all,
> >
> > Lot's of people have probably seen the ongoing work on Kafka Streams
> > happening. There is no real way to design a system like this in a vacuum,
> > so we put up a blog, some snapshot docs, and something you can download
> and
> > use easily to get feedback:
> >
> >
> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
> >
> > We'd love comments or thoughts from anyone...
> >
> > -Jay
>
>

Re: Kafka Streams

Posted by David Buschman <da...@timeli.io>.
Very interesting, looks a lot like many operations from Spark were brought across. 

Any plans to integrate with the reactive-streams protocol for interoperability with libraries such as akka-stream and RxJava?

Thanks,
    DaVe.

David Buschman
dave@timeli.io



> On Mar 10, 2016, at 2:26 PM, Jay Kreps <ja...@confluent.io> wrote:
> 
> Hey all,
> 
> Lot's of people have probably seen the ongoing work on Kafka Streams
> happening. There is no real way to design a system like this in a vacuum,
> so we put up a blog, some snapshot docs, and something you can download and
> use easily to get feedback:
> 
> http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
> 
> We'd love comments or thoughts from anyone...
> 
> -Jay