Posted to users@kafka.apache.org by Russell Jurney <ru...@gmail.com> on 2012/05/23 07:23:58 UTC

Kafka events to S3?

Is there a simple way to dump Kafka events to S3 yet?

Russell Jurney http://datasyndrome.com

Re: Kafka events to S3?

Posted by Jay Kreps <ja...@gmail.com>.
That is true, but you want to pack lots of messages into each S3 file, so
you need some kind of separator or delimiter.

-Jay
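
For example, a minimal framing sketch in Java (the class and method names
are hypothetical, and it assumes each payload is newline-free text such as
single-line JSON):

    import java.nio.charset.StandardCharsets;
    import java.util.List;

    // Pack many Kafka messages into one S3 object body, newline-delimited,
    // so any line-oriented reader can split the object back into records.
    public class NewlineFraming {
        static byte[] frame(List<String> messages) {
            StringBuilder sb = new StringBuilder();
            for (String message : messages) {
                sb.append(message).append('\n'); // assumes no raw '\n' in payloads
            }
            return sb.toString().getBytes(StandardCharsets.UTF_8);
        }
    }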

On Wed, May 23, 2012 at 11:34 AM, Russell Jurney <ru...@gmail.com> wrote:

> I've always hoped that since Kafka is agnostic about message payload
> format (right?), the written format might be too... but maybe that is
> a bit oversimplified.
>
> Russell Jurney http://datasyndrome.com

Re: Kafka events to S3?

Posted by Russell Jurney <ru...@gmail.com>.
I've always hoped that since Kafka is agnostic about message payload
format (right?), the written format might be too... but maybe that is
a bit oversimplified.

Russell Jurney http://datasyndrome.com

On May 23, 2012, at 11:19 AM, S Ahmed <sa...@gmail.com> wrote:

>> Kafka handles
>> scaling the consumption while making sure each consumer gets a subset of
>> data.
> Is there a writeup on the algorithm used to do that? Sounds interesting :)
>
> Agreed, this sounds like more of a contrib.

Re: Kafka events to S3?

Posted by S Ahmed <sa...@gmail.com>.
> Kafka handles
> scaling the consumption while making sure each consumer gets a subset of
> data.
Is there a writeup on the algorithm used to do that? Sounds interesting :)

Agreed, this sounds like more of a contrib.
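
For context, the mechanism behind that claim is consumer groups: consumers
that share a group id are each assigned a disjoint subset of the topic's
partitions, and the assignment is rebalanced as members join or leave
(coordinated through ZooKeeper in this era of Kafka). A minimal sketch
against the modern Java client, which postdates this thread, just to make
the idea concrete (broker address, topic, and group names are made up):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class GroupedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // Every process started with the same group.id joins one consumer
            // group, and each member gets a disjoint set of partitions.
            props.put("group.id", "s3-archiver");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events"));
                while (true) {
                    // poll() only returns records from this member's partitions,
                    // so running N copies splits the topic N ways.
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> r : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                    }
                }
            }
        }
    }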

On Wed, May 23, 2012 at 1:49 PM, Jay Kreps <ja...@gmail.com> wrote:

> Basically it would just be a consumer that wrote to S3. Kafka handles
> scaling the consumption while making sure each consumer gets a subset of
> data. Probably we could make some command line tool. You would need some
> way to let the user control the format of the S3 data in a pluggable
> fashion. It could be a contrib package, or even just a separate github
> mini-project since it just works off the public api and would really just
> be used by people who want to get stuff into S3.
>
> -Jay

Re: Kafka events to S3?

Posted by Jay Kreps <ja...@gmail.com>.
Basically it would just be a consumer that wrote to S3. Kafka handles
scaling the consumption while making sure each consumer gets a subset of
data. Probably we could make some command line tool. You would need some
way to let the user control the format of the S3 data in a pluggable
fashion. It could be a contrib package, or even just a separate github
mini-project since it just works off the public api and would really just
be used by people who want to get stuff into S3.

-Jay
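
To make that concrete, here is a minimal sketch of such a consumer, written
against the modern Java client and AWS SDK v2 (both of which postdate this
thread); the bucket and key naming are made up, and a real tool would batch
by size and time, retry failures, and expose the record format as a
pluggable serializer:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import software.amazon.awssdk.core.sync.RequestBody;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.PutObjectRequest;

    public class KafkaToS3 {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "s3-archiver");     // scale out by starting more copies
            props.put("enable.auto.commit", "false"); // commit only after the S3 write
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 S3Client s3 = S3Client.create()) {
                consumer.subscribe(List.of("events"));
                long batchId = 0;
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(5));
                    if (records.isEmpty()) continue;

                    // Newline-delimit the whole batch into a single S3 object.
                    StringBuilder body = new StringBuilder();
                    for (ConsumerRecord<String, String> r : records) {
                        body.append(r.value()).append('\n');
                    }
                    s3.putObject(
                        PutObjectRequest.builder()
                            .bucket("my-kafka-archive")       // hypothetical bucket
                            .key("events/batch-" + batchId++) // hypothetical key scheme
                            .build(),
                        RequestBody.fromString(body.toString()));

                    // Commit offsets only after the upload succeeds, so a crash
                    // re-reads the batch (at-least-once) rather than losing it.
                    consumer.commitSync();
                }
            }
        }
    }

Committing offsets only after putObject succeeds gives at-least-once
delivery into S3: a crash between the upload and the commit re-uploads the
last batch instead of dropping it.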

On Wed, May 23, 2012 at 8:21 AM, S Ahmed <sa...@gmail.com> wrote:

> What would be needed to do this?
>
> Just thinking off the top of my head:
>
> 1. Create a ZooKeeper store to keep track of the last message offset
> persisted to S3, and which messages each consumer is processing.
>
> 2. Pull messages off, group them however you want (per message, per 10
> messages, etc.), and spin off an ExecutorService to push each batch to S3
> and update the ZooKeeper offset.
>
> I'm new to Kafka, but I would have to investigate how multiple consumers
> can pull messages and push to S3 without pulling the same messages, and
> how to set up a ZooKeeper store to track progress specifically for what
> has been pushed to S3.

Re: Kafka events to S3?

Posted by S Ahmed <sa...@gmail.com>.
What would be needed to do this?

Just thinking off the top of my head:

1. Create a ZooKeeper store to keep track of the last message offset
persisted to S3, and which messages each consumer is processing.

2. Pull messages off, group them however you want (per message, per 10
messages, etc.), and spin off an ExecutorService to push each batch to S3
and update the ZooKeeper offset.

I'm new to Kafka, but I would have to investigate how multiple consumers
can pull messages and push to S3 without pulling the same messages, and
how to set up a ZooKeeper store to track progress specifically for what
has been pushed to S3.
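
As a starting point for steps 1 and 2, here is a minimal sketch of a
ZooKeeper-backed offset store plus an ExecutorService handoff, using the
plain ZooKeeper client. The znode path and the S3 upload hook are
hypothetical:

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Step 1: a tiny ZooKeeper-backed store for "last offset safely in S3".
    public class S3OffsetTracker {
        private final ZooKeeper zk;
        private final String path; // e.g. /s3-archiver/offsets/<topic>-<partition>

        S3OffsetTracker(ZooKeeper zk, String path) {
            this.zk = zk;
            this.path = path;
        }

        void commit(long offset) throws KeeperException, InterruptedException {
            byte[] data = Long.toString(offset).getBytes(StandardCharsets.UTF_8);
            if (zk.exists(path, false) == null) {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
            } else {
                zk.setData(path, data, -1); // -1 matches any znode version
            }
        }

        long lastCommitted() throws KeeperException, InterruptedException {
            if (zk.exists(path, false) == null) return -1L; // nothing stored yet
            byte[] data = zk.getData(path, false, null);
            return Long.parseLong(new String(data, StandardCharsets.UTF_8));
        }

        // Step 2: hand a batch to a thread pool, recording the high-water mark
        // once the (hypothetical) S3 upload succeeds.
        static void uploadAsync(ExecutorService pool, S3OffsetTracker tracker,
                                List<String> batch, long lastOffset) {
            pool.submit(() -> {
                // uploadBatchToS3(batch) would go here (hypothetical S3 call).
                tracker.commit(lastOffset);
                return null;
            });
        }
    }

One wrinkle with a thread pool: batches can complete out of order, so the
stored offset should only advance to the end of the lowest contiguous
completed batch, or a crash could silently skip a batch that was still in
flight.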


On Wed, May 23, 2012 at 1:35 AM, Russell Jurney <ru...@gmail.com> wrote:

> Yeah, no kidding. I keep waiting on one :)
>
> Russell Jurney http://datasyndrome.com

Re: Kafka events to S3?

Posted by Russell Jurney <ru...@gmail.com>.
Yeah, no kidding. I keep waiting on one :)

Russell Jurney http://datasyndrome.com

On May 22, 2012, at 10:31 PM, Jay Kreps <ja...@gmail.com> wrote:

> No. Patches accepted.
>
> -Jay

Re: Kafka events to S3?

Posted by Jay Kreps <ja...@gmail.com>.
No. Patches accepted.

-Jay

On Tue, May 22, 2012 at 10:23 PM, Russell Jurney <ru...@gmail.com> wrote:

> Is there a simple way to dump Kafka events to S3 yet?
>
> Russell Jurney http://datasyndrome.com
>