Posted to dev@kafka.apache.org by David Arthur <mu...@gmail.com> on 2012/11/08 16:32:38 UTC

Kafka stream processing framework?

There is a line item on the project ideas for "improved stream processing libraries". I was wondering if anyone has done any work on this. I know you can hook Kafka into things like Storm and S4(?), but I'm not looking for a CEP/dataflow thing, just distributed stream processing

-David

Re: Kafka stream processing framework?

Posted by Milind Parikh <mi...@gmail.com>.
I have a very early version of streaming in Kafka. I will not be able
to respond for the next nine days because I will have no internet connectivity.
But the early version (0.0.1) streams events to the browser as events become
available.

www.github.com/milindparikh/streamkl

Regards
Milind



On Fri, Nov 9, 2012 at 2:07 AM, Prashanth Menon
<pr...@gmail.com> wrote:

> Hi David,
>
> The pattern you mention is a very common one and while Kafka may be a good
> fit, it's impossible to know without more information.  Mind you, I'm a
> committer ...
>
> - Do you need to re-read or replay messages?
> - Are all your consumers always online?
> - At what rate are messages coming in?
> - Do you need to process all your messages in-order?
>
> What most will suggest is to go with RabbitMQ or ActiveMQ with a queue +
> workers where IDs are partitioned across the set of workers.  This is
> simple and I suspect should satisfy your requirements.  If the
> "distribution" aspect is especially what you need, you'll have to wait for
> the 0.8 Kafka release.  IIRC, RabbitMQ has clustering capabilities (you'll
> have to fuss around setting up an NFS so durable messages are persisted in
> the cluster) and ActiveMQ can operate in P2P and brokered mode.
>
> Storm, as you mentioned, is more of a stream *processing* system that
> allows you to filter, process, pipe and connect a "firehose".
> Interestingly enough, you can use Kafka as a "firehose" that feeds into
> Storm, but this isn't what you're looking for (but it's quite interesting
> nonetheless).
>
> Hope that helps - others are welcome to chime in, too :)
>
> - Prashanth
>
> On Thu, Nov 8, 2012 at 11:38 AM, David Arthur <mu...@gmail.com> wrote:
>
> > Prashanth,
> >
> > Storm seems to be more focused on data flow between Storm processors
> > (spout? bolt? I forget). My particular use case follows this pattern:
> >
> > * read id from Kafka queue
> > * fetch object from database
> > * modify the object
> > * write back to database
> >
> > Would Storm be a good fit for this? It doesn't seem to fit in with the
> > whole bolt/spout pattern. It's more like a distributed task queue.
> >
> > Thoughts?
> >
> > On Nov 8, 2012, at 10:45 AM, Prashanth Menon wrote:
> >
> > > Yup, I believe Storm has a KafkaSpout that you can use.  Is there something
> > > specific you were interested in?
> > >
> > > On Thu, Nov 8, 2012 at 10:32 AM, David Arthur <mu...@gmail.com> wrote:
> > >
> > >> There is a line item on the project ideas for "improved stream processing
> > >> libraries". I was wondering if anyone has done any work on this. I know you
> > >> can hook Kafka into things like Storm and S4(?), but I'm not looking for a
> > >> CEP/dataflow thing, just distributed stream processing
> > >>
> > >> -David
> >
> >
>

Re: Kafka stream processing framework?

Posted by Prashanth Menon <pr...@gmail.com>.
Hi David,

The pattern you mention is a very common one and while Kafka may be a good
fit, it's impossible to know without more information.  Mind you, I'm a
committer ...

- Do you need to re-read or replay messages?
- Are all your consumers always online?
- At what rate are messages coming in?
- Do you need to process all your messages in-order?

What most will suggest is to go with RabbitMQ or ActiveMQ with a queue +
workers where IDs are partitioned across the set of workers.  This is
simple and I suspect should satisfy your requirements.  If the
"distribution" aspect is especially what you need, you'll have to wait for
the 0.8 Kafka release.  IIRC, RabbitMQ has clustering capabilities (you'll
have to fuss around setting up an NFS so durable messages are persisted in
the cluster) and ActiveMQ can operate in P2P and brokered mode.
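
One way to read "IDs are partitioned across the set of workers" is a stable
hash of each ID taken modulo the worker count. The following is a minimal
Python sketch under that assumption, with string IDs and a fixed number of
workers. The names worker_for and assign are hypothetical and are not part of
Kafka, RabbitMQ, or ActiveMQ.

```python
import zlib
from collections import defaultdict


def worker_for(object_id: str, num_workers: int) -> int:
    """Map an ID to a worker index using a stable hash (CRC32).

    CRC32 is used instead of the built-in hash() because hash() on
    strings is randomized per process, so different processes would
    disagree about which worker owns an ID.
    """
    return zlib.crc32(object_id.encode("utf-8")) % num_workers


def assign(ids, num_workers):
    """Group IDs by owning worker, keeping arrival order within each group."""
    buckets = defaultdict(list)
    for object_id in ids:
        buckets[worker_for(object_id, num_workers)].append(object_id)
    return dict(buckets)
```

Because the mapping depends only on the ID, every message for a given object
lands on the same worker, so two workers never modify the same object at once.
The trade-off is that changing the worker count remaps most IDs.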

Storm, as you mentioned, is more of a stream *processing* system that
allows you to filter, process, pipe and connect a "firehose".
Interestingly enough, you can use Kafka as a "firehose" that feeds into
Storm, but this isn't what you're looking for (but it's quite interesting
nonetheless).

Hope that helps - others are welcome to chime in, too :)

- Prashanth

On Thu, Nov 8, 2012 at 11:38 AM, David Arthur <mu...@gmail.com> wrote:

> Prashanth,
>
> Storm seems to be more focused on data flow between Storm processors
> (spout? bolt? I forget). My particular use case follows this pattern:
>
> * read id from Kafka queue
> * fetch object from database
> * modify the object
> * write back to database
>
> Would Storm be a good fit for this? It doesn't seem to fit in with the
> whole bolt/spout pattern. It's more like a distributed task queue.
>
> Thoughts?
>
> On Nov 8, 2012, at 10:45 AM, Prashanth Menon wrote:
>
> > Yup, I believe Storm has a KafkaSpout that you can use.  Is there something
> > specific you were interested in?
> >
> > On Thu, Nov 8, 2012 at 10:32 AM, David Arthur <mu...@gmail.com> wrote:
> >
> >> There is a line item on the project ideas for "improved stream processing
> >> libraries". I was wondering if anyone has done any work on this. I know you
> >> can hook Kafka into things like Storm and S4(?), but I'm not looking for a
> >> CEP/dataflow thing, just distributed stream processing
> >>
> >> -David
>
>

Re: Kafka stream processing framework?

Posted by David Arthur <mu...@gmail.com>.
Prashanth,

Storm seems to be more focused on data flow between Storm processors (spout? bolt? I forget). My particular use case follows this pattern:

* read id from Kafka queue
* fetch object from database
* modify the object
* write back to database

Would Storm be a good fit for this? It doesn't seem to fit in with the whole bolt/spout pattern. It's more like a distributed task queue.
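
The four steps above can be sketched as a small worker pool. This is only an illustration in Python: a queue.Queue stands in for the Kafka queue, a plain dict stands in for the database, and run_workers and modify are hypothetical names. It assumes each ID is queued at most once; if the same ID could appear twice, two workers could race on the read-modify-write.

```python
import queue
import threading


def run_workers(ids, database, modify, num_workers=4):
    """Drain ids with a pool of threads, applying modify to each object.

    database is a dict used as a stand-in for a real database, and the
    in-memory queue is a stand-in for a Kafka topic.
    """
    work = queue.Queue()
    for object_id in ids:
        work.put(object_id)

    def worker():
        while True:
            try:
                object_id = work.get_nowait()  # read id from the queue
            except queue.Empty:
                return  # nothing left to do, so this worker exits
            obj = database[object_id]          # fetch object from database
            updated = modify(obj)              # modify the object
            database[object_id] = updated      # write back to database

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return database
```

For example, run_workers(["a", "b"], {"a": 1, "b": 2}, lambda v: v + 1) increments both stored values. A real deployment would replace the in-memory queue with a consumer and would need to decide what happens when a worker fails between the fetch and the write.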

Thoughts?

On Nov 8, 2012, at 10:45 AM, Prashanth Menon wrote:

> Yup, I believe Storm has a KafkaSpout that you can use.  Is there something
> specific you were interested in?
> 
> On Thu, Nov 8, 2012 at 10:32 AM, David Arthur <mu...@gmail.com> wrote:
> 
>> There is a line item on the project ideas for "improved stream processing
>> libraries". I was wondering if anyone has done any work on this. I know you
>> can hook Kafka into things like Storm and S4(?), but I'm not looking for a
>> CEP/dataflow thing, just distributed stream processing
>> 
>> -David


Re: Kafka stream processing framework?

Posted by Prashanth Menon <pr...@gmail.com>.
Yup, I believe Storm has a KafkaSpout that you can use.  Is there something
specific you were interested in?

On Thu, Nov 8, 2012 at 10:32 AM, David Arthur <mu...@gmail.com> wrote:

> There is a line item on the project ideas for "improved stream processing
> libraries". I was wondering if anyone has done any work on this. I know you
> can hook Kafka into things like Storm and S4(?), but I'm not looking for a
> CEP/dataflow thing, just distributed stream processing
>
> -David