Posted to users@kafka.apache.org by Gautam Singaraju <ga...@gmail.com> on 2012/03/20 17:07:45 UTC

Kafka in AWS?

We have been considering Kafka for a new Data Platform. Has someone
used Kafka in AWS? If so, could you please share your experiences with us?

Thank you!
---
Gautam

Re: Kafka in AWS?

Posted by Neha Narkhede <ne...@gmail.com>.
Vaibhav,

Thanks for explaining your use case. I think I see the requirement
here. It seems like you need the data in S3 since you use Elastic
MapReduce to process your data. I guess that's the reason the Hadoop
input/output formats that Kafka provides are not directly useful.

I have some ideas on how this can be done. Will write them up on a wiki soon.

Thanks,
Neha

On Tue, Mar 20, 2012 at 10:21 PM, Vaibhav Puranik <vp...@gmail.com> wrote:
> Neha,
>
> My requirement is not related to Russell's, but I thought it would be
> helpful to describe what we need at GumGum <http://gumgum.com/>.
> I wasn't sure whether this is in Kafka's domain, since Kafka gives you a
> topic to pull data from and then it's up to you to do whatever with it.
>
> But since we are talking about it, here is what we do every day (currently
> without Kafka):
>
> We are an ad network. We write all of our impression and click data to
> various log files and upload them to S3. At night we run many MapReduce jobs
> to aggregate this data in various ways.
> We have an 'Autoscaled' cluster in AWS. Our webservers keep going up and
> down based on the load on the system.
>
> Whenever a server shuts down we tend to lose data. Many times the file upload
> is not completed before the server shuts down. That is why we are
> looking at implementing Kafka to send events in real time to S3 without
> losing them.
>
> If there were a 'sink' that transfers data to S3, our job would be a lot
> easier. But again, I am not sure whether Kafka is supposed to provide that
> or not.
>
> Regards,
> Vaibhav

Re: Kafka in AWS?

Posted by Vaibhav Puranik <vp...@gmail.com>.
Neha,

My requirement is not related to Russell's, but I thought it would be
helpful to describe what we need at GumGum <http://gumgum.com/>.
I wasn't sure whether this is in Kafka's domain, since Kafka gives you a
topic to pull data from and then it's up to you to do whatever with it.

But since we are talking about it, here is what we do every day (currently
without Kafka):

We are an ad network. We write all of our impression and click data to
various log files and upload them to S3. At night we run many MapReduce jobs
to aggregate this data in various ways.
We have an 'Autoscaled' cluster in AWS. Our webservers keep going up and
down based on the load on the system.

Whenever a server shuts down we tend to lose data. Many times the file upload
is not completed before the server shuts down. That is why we are
looking at implementing Kafka to send events in real time to S3 without
losing them.

If there were a 'sink' that transfers data to S3, our job would be a lot
easier. But again, I am not sure whether Kafka is supposed to provide that
or not.

Regards,
Vaibhav


On Tue, Mar 20, 2012 at 10:03 PM, Neha Narkhede <ne...@gmail.com>wrote:

> Russell,
>
> By "sink events into S3", do you mean you want some plugin that
> will suck data out of your Kafka brokers and upload it to S3? Would you mind
> describing use cases that require sending data to Kafka, then uploading
> it to S3, and then using it by querying S3?
>
> Thanks,
> Neha

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
bump

On Wed, Mar 21, 2012 at 10:01 PM, Vaibhav Puranik <vp...@gmail.com>wrote:

> Let me ask my boss what I can share. Let's talk off the mailing list.
>
> Regards,
> Vaibhav

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Kafka in AWS?

Posted by Vaibhav Puranik <vp...@gmail.com>.
Let me ask my boss what I can share. Let's talk off the mailing list.

Regards,
Vaibhav

On Wed, Mar 21, 2012 at 1:44 PM, Russell Jurney <ru...@gmail.com>wrote:

> You have code that puts records in bigger blocks on s3? Plz to share? :)
>
> Russell Jurney http://datasyndrome.com

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
You have code that puts records in bigger blocks on s3? Plz to share? :)

Russell Jurney http://datasyndrome.com

On Mar 21, 2012, at 1:37 PM, Vaibhav Puranik <vp...@gmail.com> wrote:

> We also have s3 files organized by date in the following fashion.
> 
> yyyy/MM/dd/hh
> 
> Our messages are in JSON.
> 
> Regards,
> Vaibhav

Re: Kafka in AWS?

Posted by Vaibhav Puranik <vp...@gmail.com>.
We also have S3 files organized by date in the following fashion:

yyyy/MM/dd/hh

Our messages are in JSON.

Regards,
Vaibhav

On Wed, Mar 21, 2012 at 1:33 PM, Russell Jurney <ru...@gmail.com>wrote:

> I want the S3 files to be organized by type and date. Folders for types,
> subfolders for date down to the hour: year/month/day/hour. All payloads of
> a given type get written together.
>
> It would be ideal if there was no integration with the end format, but in
> practice I'm not sure if all the serialization protocols mentioned can be
> written in this way.
>
> Russell Jurney http://datasyndrome.com

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
I want the S3 files to be organized by type and date. Folders for types, subfolders for date down to the hour: year/month/day/hour. All payloads of a given type get written together.

It would be ideal if there was no integration with the end format, but in practice I'm not sure if all the serialization protocols mentioned can be written in this way.

Russell Jurney http://datasyndrome.com
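The layout discussed in this thread (a folder per type or topic, then year/month/day/hour) can be expressed as a small key-building function. The following is only a sketch: the function name `s3_key`, the `part-NNNNN` file naming, the default `json` extension, and the choice of UTC are all assumptions made for illustration, not something anyone in the thread specified.

```python
from datetime import datetime, timezone


def s3_key(topic: str, ts: datetime, part: int, ext: str = "json") -> str:
    """Build an object key shaped like <topic>/yyyy/MM/dd/hh/part-NNNNN.<ext>.

    `ts` should be timezone-aware; it is normalised to UTC (an assumption)
    so that every producer agrees on which hourly folder an event lands in.
    """
    ts = ts.astimezone(timezone.utc)
    # Hypothetical naming scheme: zero-padded part number within the hour.
    return f"{topic}/{ts:%Y/%m/%d/%H}/part-{part:05d}.{ext}"
```

With this scheme, listing a prefix such as `clicks/2012/03/21/` would return one day's worth of objects for a single topic, which is convenient as a Hadoop job input path.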

On Mar 21, 2012, at 12:50 PM, Tim Lossen <ti...@lossen.de> wrote:

> another good option would be messagepack -- flexible & schemaless like json, but binary.
> 
> Sent from my iPhone

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
I don't want avro to have an opinion on the serialization format.

Russell Jurney http://datasyndrome.com

On Mar 21, 2012, at 12:50 PM, Tim Lossen <ti...@lossen.de> wrote:

> another good option would be messagepack -- flexible & schemaless like json, but binary.
> 
> Sent from my iPhone

Re: Kafka in AWS?

Posted by Tim Lossen <ti...@lossen.de>.
Another good option would be MessagePack -- flexible & schemaless like JSON, but binary.

Sent from my iPhone

On 21 Mar 2012, at 20:46, Russell Jurney <ru...@gmail.com> wrote:

> I'm going to use thrift, avro or protobuf for serialization.
> 
> Russell Jurney http://datasyndrome.com

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
I'm going to use Thrift, Avro or Protobuf for serialization.

Russell Jurney http://datasyndrome.com

On Mar 21, 2012, at 11:59 AM, Vaibhav Puranik <vp...@gmail.com> wrote:

> I would use the payload. I want the message to be exactly as it is. We want
> to name the files as per topic.
> (That's how we differentiate right now).
> 
> Regards,
> Vaibhav

Re: Kafka in AWS?

Posted by Vaibhav Puranik <vp...@gmail.com>.
I would use the payload. I want the message to be exactly as it is. We want
to name the files as per topic.
(That's how we differentiate right now).

Regards,
Vaibhav

On Wed, Mar 21, 2012 at 11:01 AM, Niek Sanders <ni...@gmail.com>wrote:

> So what would you like the S3 files to actually look like?
>
> One Kafka message body per line?  Should the message topic be tossed
> in there too?
>
> A tricky aspect is that the Kafka message body is an opaque byte
> array.  For my own case I'm using JSON for the payload so it makes my
> requirements simpler.
>
> - Niek

Re: Kafka in AWS?

Posted by Niek Sanders <ni...@gmail.com>.
So what would you like the S3 files to actually look like?

One Kafka message body per line?  Should the message topic be tossed
in there too?

A tricky aspect is that the Kafka message body is an opaque byte
array.  For my own case I'm using JSON for the payload so it makes my
requirements simpler.

- Niek
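Because the message body is an opaque byte array, a "one message per line" file only works if a payload can never contain a newline. One possible way around that, sketched below, is to base64-encode each payload before writing it as a line. The function names `encode_lines` and `decode_lines` are hypothetical, and this is just one framing option among several (a length prefix would be another).

```python
import base64
from typing import Iterable, Iterator


def encode_lines(payloads: Iterable[bytes]) -> bytes:
    """Write one payload per line, base64-encoded so that newlines or
    other binary bytes inside a payload cannot break the line framing."""
    return b"".join(base64.b64encode(p) + b"\n" for p in payloads)


def decode_lines(blob: bytes) -> Iterator[bytes]:
    """Inverse of encode_lines: yield the original payloads in order."""
    for line in blob.splitlines():
        yield base64.b64decode(line)
```

The trade-off is size: base64 inflates each payload by roughly a third. If the payloads are JSON serialized without embedded newlines, writing them raw, one per line, keeps the files directly readable by line-oriented Hadoop input formats.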



On Tue, Mar 20, 2012 at 10:07 PM, Russell Jurney
<ru...@gmail.com> wrote:
> I want events in S3 to process them in Hadoop. I'd like to emit them in my app, and have them magically show up in 64MB chunks on S3. Like most everyone else.
>
> Russell Jurney http://datasyndrome.com
>

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
I want events in S3 to process them in Hadoop. I'd like to emit them in my app, and have them magically show up in 64MB chunks on S3. Like most everyone else.

Russell Jurney http://datasyndrome.com
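Since S3 objects cannot be appended to, getting events to "show up in 64MB chunks" means the consumer has to buffer messages and upload each batch as a new object. The sketch below shows only that buffering step. The class name `BatchingSink` and the `upload` callback are hypothetical; the callback stands in for whatever actually performs the S3 PUT and is not part of Kafka or any AWS SDK.

```python
from typing import Callable, List


class BatchingSink:
    """Buffers message payloads and hands them to `upload` as one blob."""

    def __init__(self, upload: Callable[[bytes], None],
                 max_bytes: int = 64 * 1024 * 1024):
        self.upload = upload          # hypothetical callback, e.g. an S3 PUT wrapper
        self.max_bytes = max_bytes    # target batch size (64 MB by default)
        self._chunks: List[bytes] = []
        self._size = 0

    def add(self, payload: bytes) -> None:
        # One payload per line; assumes payloads contain no raw newlines.
        line = payload + b"\n"
        # Flush first if this line would push the batch over the limit.
        if self._chunks and self._size + len(line) > self.max_bytes:
            self.flush()
        self._chunks.append(line)
        self._size += len(line)

    def flush(self) -> None:
        # Upload whatever is buffered as a single object, then reset.
        if not self._chunks:
            return
        self.upload(b"".join(self._chunks))
        self._chunks = []
        self._size = 0
```

A real consumer would probably also flush on a timer, so that quiet topics do not hold data indefinitely, and would record its Kafka offsets only after a flush succeeds, so that a crash replays messages instead of losing them.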

On Mar 20, 2012, at 10:03 PM, Neha Narkhede <ne...@gmail.com> wrote:

> Russell,
> 
> By "sink events into S3", do you mean you want some plugin that
> will suck data out of your Kafka brokers and upload it to S3? Would you mind
> describing use cases that require sending data to Kafka, then uploading
> it to S3, and then using it by querying S3?
> 
> Thanks,
> Neha
> On Mar 20, 2012 4:51 PM, "Russell Jurney" <ru...@gmail.com> wrote:
> 
>> I think as soon as someone commits code that reliably sinks events to S3,
>> Kafka adoption will skyrocket.  There is no good solution to this yet.
>> MANY people want one.
>> 
>> Russ
>> 
>> On Tue, Mar 20, 2012 at 3:32 PM, Felix GV <fe...@mate1inc.com> wrote:
>> 
>>> The primary use case for Kafka is to use it on AWS...???
>>> 
>>> Sorry if I put words you didn't intend in your mouth :P ... I just
>>> thought that sounded funny ;)
>>> 
>>> Sorry for being off-topic. Carry on :/ !
>>> 
>>> --
>>> Felix
>>> 
>>> 
>>> 
>>> On Tue, Mar 20, 2012 at 6:23 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>>
>>>> Yeah, that is the part I am hoping someone will contribute :)  I know I
>>>> can write that myself.  I also know it will be buggy and that I will have
>>>> lots of trouble.
>>>>
>>>> If you contribute this code, it would be a huge boon to Kafka.  It is imo
>>>> the primary use case for Kafka atm... if only the code gets into git.
>>>>
>>>> On Tue, Mar 20, 2012 at 3:04 PM, Niek Sanders <niek.sanders@gmail.com> wrote:
>>>> 
>>>>> Russell,
>>>>> 
>>>>> I'm actually in the process of writing Java code to go from Kafka
>>>>> messages to S3.  I might be able to rip out my application-specific
>>>>> parts and share something later tonight.
>>>>> 
>>>>> The biggest hassle is that you can't append to existing S3 files.  So
>>>>> unless you're planning on uploading each message as a separate S3
>>>>> object, this means you need message aggregation smarts on the Kafka
>>>>> consumer / S3 uploader side of things.
>>>>> 
>>>>> Best,
>>>>> Niek
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney
>>>>> <ru...@gmail.com> wrote:
>>>>>> I wish someone would publish some source that writes events to S3.
>>>>>> 
>>>>>> Russell Jurney
>>>>>> twitter.com/rjurney
>>>>>> russell.jurney@gmail.com
>>>>>> datasyndrome.com
>>>>>> 
>>>>>> On Mar 20, 2012, at 11:20 AM, Dave Fayram <df...@gmail.com> wrote:
>>>>>>
>>>>>>> We've been successfully using Kafka on AWS as well, and JMX-wise we
>>>>>>> just use an SSH tunnel.
>>>>>>>
>>>>>>> In general, we've been very happy with the performance on AWS, which
>>>>>>> some people have reservations about due to the I/O situation on most
>>>>>>> Amazon boxes.
>>>>>>>
>>>>>>> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
>>>>>>> <ga...@gmail.com> wrote:
>>>>>>>> We have been considering Kafka for a new Data Platform. Has someone
>>>>>>>> used Kafka in AWS? If so, could you please share your experiences
>>>>>>>> with us?
>>>>>>>> 
>>>>>>>> Thank you!
>>>>>>>> ---
>>>>>>>> Gautam
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> --
>>>>>>> Dave Fayram
>>>>>>> dfayram@gmail.com
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>> datasyndrome.com
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>> datasyndrome.com
>> 

Re: Kafka in AWS?

Posted by Neha Narkhede <ne...@gmail.com>.
Russell,

By "sink events into S3", do you mean you want some plugin that will
pull data out of your Kafka brokers and upload it to S3? Would you mind
describing the use cases that require sending data to Kafka, uploading
it to S3, and then querying it from S3?

Thanks,
Neha
On Mar 20, 2012 4:51 PM, "Russell Jurney" <ru...@gmail.com> wrote:

> I think as soon as someone commits code that reliably sinks events to S3,
> Kafka adoption will skyrocket.  There is no good solution to this yet.
>  MANY people want one.
>
> Russ
>
> On Tue, Mar 20, 2012 at 3:32 PM, Felix GV <fe...@mate1inc.com> wrote:
>
> > The primary use case for Kafka is to use it on AWS...???
> >
> > Sorry if I put words you didn't intend in your mouth :P ... I just
> thought
> > that sounded funny ;)
> >
> > Sorry for being off-topic. Carry on :/ !
> >
> > --
> > Felix
> >
> >
> >
> > On Tue, Mar 20, 2012 at 6:23 PM, Russell Jurney <
> russell.jurney@gmail.com
> > >wrote:
> >
> > > Yeah, that is the part I am hoping someone will contribute :)  I know I
> > can
> > > write that myself.  I also know it will be buggy and that I will have
> > lots
> > > of trouble.
> > >
> > > If you contribute this code, it would be a huge boon to Kafka.  It is
> imo
> > > the primary use case for Kafka atm... if only the code gets into git.
> > >
> > > On Tue, Mar 20, 2012 at 3:04 PM, Niek Sanders <niek.sanders@gmail.com
> > > >wrote:
> > >
> > > > Russell,
> > > >
> > > > I'm actually in the process of writing a Java code to go from Kafka
> > > > messages to S3.  I might be able to rip-out my application-specific
> > > > parts and share something later tonight.
> > > >
> > > > The biggest hassle is that you can't append to existing S3 files.  So
> > > > unless you're planning on uploading each message as a separate S3
> > > > object, this means you need message aggregation smarts on the Kafka
> > > > consumer / S3 uploader side of things.
> > > >
> > > > Best,
> > > > Niek
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney
> > > > <ru...@gmail.com> wrote:
> > > > > I wish someone would publish some source that writes events to S3.
> > > > >
> > > > > Russell Jurney
> > > > > twitter.com/rjurney
> > > > > russell.jurney@gmail.com
> > > > > datasyndrome.com
> > > > >
> > > > > On Mar 20, 2012, at 11:20 AM, Dave Fayram <df...@gmail.com>
> wrote:
> > > > >
> > > > >> We've been successfully using Kafka on AWS as well, and JMX wise
> we
> > > > >> just use an SSH tunnel.
> > > > >>
> > > > >> In general, we've been very happy with the performance on AWS,
> which
> > > > >> some people have reservations about due to the I/O situation on
> most
> > > > >> Amazon boxes.
> > > > >>
> > > > >> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
> > > > >> <ga...@gmail.com> wrote:
> > > > >>> We are have been considering Kafka for a new Data Platform. Has
> > > someone
> > > > >>> used Kafka in AWS? If so, could you please share your experiences
> > > with
> > > > us?
> > > > >>>
> > > > >>> Thank you!
> > > > >>> ---
> > > > >>> Gautam
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> --
> > > > >> Dave Fayram
> > > > >> dfayram@gmail.com
> > > >
> > >
> > >
> > >
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > > datasyndrome.com
> > >
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> datasyndrome.com
>

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
I think as soon as someone commits code that reliably sinks events to S3,
Kafka adoption will skyrocket.  There is no good solution to this yet.
MANY people want one.

Russ

On Tue, Mar 20, 2012 at 3:32 PM, Felix GV <fe...@mate1inc.com> wrote:

> The primary use case for Kafka is to use it on AWS...???
>
> Sorry if I put words you didn't intend in your mouth :P ... I just thought
> that sounded funny ;)
>
> Sorry for being off-topic. Carry on :/ !
>
> --
> Felix
>
>
>
> On Tue, Mar 20, 2012 at 6:23 PM, Russell Jurney <russell.jurney@gmail.com
> >wrote:
>
> > Yeah, that is the part I am hoping someone will contribute :)  I know I
> can
> > write that myself.  I also know it will be buggy and that I will have
> lots
> > of trouble.
> >
> > If you contribute this code, it would be a huge boon to Kafka.  It is imo
> > the primary use case for Kafka atm... if only the code gets into git.
> >
> > On Tue, Mar 20, 2012 at 3:04 PM, Niek Sanders <niek.sanders@gmail.com
> > >wrote:
> >
> > > Russell,
> > >
> > > I'm actually in the process of writing a Java code to go from Kafka
> > > messages to S3.  I might be able to rip-out my application-specific
> > > parts and share something later tonight.
> > >
> > > The biggest hassle is that you can't append to existing S3 files.  So
> > > unless you're planning on uploading each message as a separate S3
> > > object, this means you need message aggregation smarts on the Kafka
> > > consumer / S3 uploader side of things.
> > >
> > > Best,
> > > Niek
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney
> > > <ru...@gmail.com> wrote:
> > > > I wish someone would publish some source that writes events to S3.
> > > >
> > > > Russell Jurney
> > > > twitter.com/rjurney
> > > > russell.jurney@gmail.com
> > > > datasyndrome.com
> > > >
> > > > On Mar 20, 2012, at 11:20 AM, Dave Fayram <df...@gmail.com> wrote:
> > > >
> > > >> We've been successfully using Kafka on AWS as well, and JMX wise we
> > > >> just use an SSH tunnel.
> > > >>
> > > >> In general, we've been very happy with the performance on AWS, which
> > > >> some people have reservations about due to the I/O situation on most
> > > >> Amazon boxes.
> > > >>
> > > >> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
> > > >> <ga...@gmail.com> wrote:
> > > >>> We are have been considering Kafka for a new Data Platform. Has
> > someone
> > > >>> used Kafka in AWS? If so, could you please share your experiences
> > with
> > > us?
> > > >>>
> > > >>> Thank you!
> > > >>> ---
> > > >>> Gautam
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> --
> > > >> Dave Fayram
> > > >> dfayram@gmail.com
> > >
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > datasyndrome.com
> >
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Kafka in AWS?

Posted by Felix GV <fe...@mate1inc.com>.
The primary use case for Kafka is to use it on AWS...???

Sorry if I put words you didn't intend in your mouth :P ... I just thought
that sounded funny ;)

Sorry for being off-topic. Carry on :/ !

--
Felix



On Tue, Mar 20, 2012 at 6:23 PM, Russell Jurney <ru...@gmail.com> wrote:

> Yeah, that is the part I am hoping someone will contribute :)  I know I can
> write that myself.  I also know it will be buggy and that I will have lots
> of trouble.
>
> If you contribute this code, it would be a huge boon to Kafka.  It is imo
> the primary use case for Kafka atm... if only the code gets into git.
>
> On Tue, Mar 20, 2012 at 3:04 PM, Niek Sanders <niek.sanders@gmail.com
> >wrote:
>
> > Russell,
> >
> > I'm actually in the process of writing a Java code to go from Kafka
> > messages to S3.  I might be able to rip-out my application-specific
> > parts and share something later tonight.
> >
> > The biggest hassle is that you can't append to existing S3 files.  So
> > unless you're planning on uploading each message as a separate S3
> > object, this means you need message aggregation smarts on the Kafka
> > consumer / S3 uploader side of things.
> >
> > Best,
> > Niek
> >
> >
> >
> >
> >
> >
> > On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney
> > <ru...@gmail.com> wrote:
> > > I wish someone would publish some source that writes events to S3.
> > >
> > > Russell Jurney
> > > twitter.com/rjurney
> > > russell.jurney@gmail.com
> > > datasyndrome.com
> > >
> > > On Mar 20, 2012, at 11:20 AM, Dave Fayram <df...@gmail.com> wrote:
> > >
> > >> We've been successfully using Kafka on AWS as well, and JMX wise we
> > >> just use an SSH tunnel.
> > >>
> > >> In general, we've been very happy with the performance on AWS, which
> > >> some people have reservations about due to the I/O situation on most
> > >> Amazon boxes.
> > >>
> > >> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
> > >> <ga...@gmail.com> wrote:
> > >>> We are have been considering Kafka for a new Data Platform. Has
> someone
> > >>> used Kafka in AWS? If so, could you please share your experiences
> with
> > us?
> > >>>
> > >>> Thank you!
> > >>> ---
> > >>> Gautam
> > >>
> > >>
> > >>
> > >> --
> > >> --
> > >> Dave Fayram
> > >> dfayram@gmail.com
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> datasyndrome.com
>

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
Yeah, that is the part I am hoping someone will contribute :)  I know I can
write that myself.  I also know it will be buggy and that I will have lots
of trouble.

If you contribute this code, it would be a huge boon to Kafka.  It is imo
the primary use case for Kafka atm... if only the code gets into git.

On Tue, Mar 20, 2012 at 3:04 PM, Niek Sanders <ni...@gmail.com> wrote:

> Russell,
>
> I'm actually in the process of writing a Java code to go from Kafka
> messages to S3.  I might be able to rip-out my application-specific
> parts and share something later tonight.
>
> The biggest hassle is that you can't append to existing S3 files.  So
> unless you're planning on uploading each message as a separate S3
> object, this means you need message aggregation smarts on the Kafka
> consumer / S3 uploader side of things.
>
> Best,
> Niek
>
>
>
>
>
>
> On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney
> <ru...@gmail.com> wrote:
> > I wish someone would publish some source that writes events to S3.
> >
> > Russell Jurney
> > twitter.com/rjurney
> > russell.jurney@gmail.com
> > datasyndrome.com
> >
> > On Mar 20, 2012, at 11:20 AM, Dave Fayram <df...@gmail.com> wrote:
> >
> >> We've been successfully using Kafka on AWS as well, and JMX wise we
> >> just use an SSH tunnel.
> >>
> >> In general, we've been very happy with the performance on AWS, which
> >> some people have reservations about due to the I/O situation on most
> >> Amazon boxes.
> >>
> >> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
> >> <ga...@gmail.com> wrote:
> >>> We are have been considering Kafka for a new Data Platform. Has someone
> >>> used Kafka in AWS? If so, could you please share your experiences with
> us?
> >>>
> >>> Thank you!
> >>> ---
> >>> Gautam
> >>
> >>
> >>
> >> --
> >> --
> >> Dave Fayram
> >> dfayram@gmail.com
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Kafka in AWS?

Posted by Niek Sanders <ni...@gmail.com>.
Russell,

I'm actually in the process of writing Java code to go from Kafka
messages to S3.  I might be able to rip out my application-specific
parts and share something later tonight.

The biggest hassle is that you can't append to existing S3 files.  So
unless you're planning on uploading each message as a separate S3
object, this means you need message aggregation smarts on the Kafka
consumer / S3 uploader side of things.
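
The aggregation idea can be sketched roughly as follows. This is a
minimal illustration, not the code mentioned above: ObjectStore is a
hypothetical stand-in for a real S3 client, and the newline-delimited
body, the size threshold, and the "prefix/firstOffset-lastOffset" key
scheme are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for an S3 client: stores one whole object per call. */
interface ObjectStore {
    void put(String key, byte[] body);
}

/** Buffers messages and uploads each batch as one newline-delimited object. */
class BatchingUploader {
    private final ObjectStore store;
    private final String keyPrefix;
    private final int maxBatchBytes;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private long firstOffset = -1; // offset of the first buffered message
    private long lastOffset = -1;  // offset of the most recent buffered message

    BatchingUploader(ObjectStore store, String keyPrefix, int maxBatchBytes) {
        this.store = store;
        this.keyPrefix = keyPrefix;
        this.maxBatchBytes = maxBatchBytes;
    }

    /** Adds one message and flushes once the buffer reaches the threshold. */
    void add(long offset, byte[] message) {
        if (firstOffset < 0) {
            firstOffset = offset;
        }
        lastOffset = offset;
        buffer.write(message, 0, message.length);
        buffer.write('\n');
        if (buffer.size() >= maxBatchBytes) {
            flush();
        }
    }

    /** Uploads the buffered messages as a single object, then resets the buffer. */
    void flush() {
        if (buffer.size() == 0) {
            return; // nothing buffered, so no empty object is written
        }
        // Naming the object after its offset range means a retried upload of
        // the same batch overwrites the same key instead of adding a duplicate.
        String key = keyPrefix + "/" + firstOffset + "-" + lastOffset;
        store.put(key, buffer.toByteArray());
        buffer.reset();
        firstOffset = -1;
        lastOffset = -1;
    }
}
```

A real uploader would probably also flush on a timer and on shutdown,
and would record the consumed offset only after the upload succeeds, so
that a crash replays messages rather than dropping them.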

Best,
Niek






On Tue, Mar 20, 2012 at 12:56 PM, Russell Jurney
<ru...@gmail.com> wrote:
> I wish someone would publish some source that writes events to S3.
>
> Russell Jurney
> twitter.com/rjurney
> russell.jurney@gmail.com
> datasyndrome.com
>
> On Mar 20, 2012, at 11:20 AM, Dave Fayram <df...@gmail.com> wrote:
>
>> We've been successfully using Kafka on AWS as well, and JMX wise we
>> just use an SSH tunnel.
>>
>> In general, we've been very happy with the performance on AWS, which
>> some people have reservations about due to the I/O situation on most
>> Amazon boxes.
>>
>> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
>> <ga...@gmail.com> wrote:
>>> We are have been considering Kafka for a new Data Platform. Has someone
>>> used Kafka in AWS? If so, could you please share your experiences with us?
>>>
>>> Thank you!
>>> ---
>>> Gautam
>>
>>
>>
>> --
>> --
>> Dave Fayram
>> dfayram@gmail.com

Re: Kafka in AWS?

Posted by Russell Jurney <ru...@gmail.com>.
I wish someone would publish some source that writes events to S3.

Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com

On Mar 20, 2012, at 11:20 AM, Dave Fayram <df...@gmail.com> wrote:

> We've been successfully using Kafka on AWS as well, and JMX wise we
> just use an SSH tunnel.
>
> In general, we've been very happy with the performance on AWS, which
> some people have reservations about due to the I/O situation on most
> Amazon boxes.
>
> On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
> <ga...@gmail.com> wrote:
>> We are have been considering Kafka for a new Data Platform. Has someone
>> used Kafka in AWS? If so, could you please share your experiences with us?
>>
>> Thank you!
>> ---
>> Gautam
>
>
>
> --
> --
> Dave Fayram
> dfayram@gmail.com

Re: Kafka in AWS?

Posted by Dave Fayram <df...@gmail.com>.
We've been successfully using Kafka on AWS as well, and for JMX we
just use an SSH tunnel.

In general, we've been very happy with the performance on AWS, even
though some people have reservations about it because of the I/O
situation on most Amazon boxes.

On Tue, Mar 20, 2012 at 9:07 AM, Gautam Singaraju
<ga...@gmail.com> wrote:
> We are have been considering Kafka for a new Data Platform. Has someone
> used Kafka in AWS? If so, could you please share your experiences with us?
>
> Thank you!
> ---
> Gautam



-- 
--
Dave Fayram
dfayram@gmail.com

Re: Kafka in AWS?

Posted by Gautam Singaraju <ga...@gmail.com>.
Thank you all! We will let you know of our experiences soon.
---
Gautam


2012/3/20 Patricio Echagüe <pa...@gmail.com>

> We are able to connect to jmx via console by creating a simple SSH tunnel.
>
> In addition to that, jmxtrans installed in the same machine sends jmx info
> from kafka to ganglia.
>
> Sent from my Android
> On Mar 20, 2012 9:25 AM, "Jun Rao" <ju...@gmail.com> wrote:
>
> > Evan,
> >
> > We do have mx4j support. See kafka-78.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Mar 20, 2012 at 9:15 AM, Evan Chan <ev...@ooyala.com> wrote:
> >
> > > We deploy Kafka in AWS.  The only negative so far I've found is that
> > native
> > > JMX is difficult or impossible to use with AWS, because it opens up
> > > secondary ports that you don't have control over.   However, I've heard
> > > there are alternative JMX implementations that allow HTTP and other
> > > alternative protocols which may be more AWS friendly.
> > >
> > > On Tue, Mar 20, 2012 at 9:10 AM, Elben Shira <el...@gmail.com>
> > wrote:
> > >
> > > > There's some on the mailing list archives:
> > > >
> > > >
> > > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201202.mbox/%3CCADWPM3jzgMZmc57HYb55PX=GeAt6d6wzbvowvrMEM4Dw3ttu2g@mail.gmail.com%3E
> > > >
> > > >
> > > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201203.mbox/%3CCAFHvO5s2wHSiegUWGPsfV-eN45SK%3Djfh8ObsU1gHZczNdGg-gg%40mail.gmail.com%3E
> > > >
> > > > Elben
> > > >
> > > >
> > > > On Tue, Mar 20, 2012 at 11:07 AM, Gautam Singaraju <
> > > > gautam.singaraju@gmail.com> wrote:
> > > >
> > > > > We are have been considering Kafka for a new Data Platform. Has
> > someone
> > > > > used Kafka in AWS? If so, could you please share your experiences
> > with
> > > > us?
> > > > >
> > > > > Thank you!
> > > > > ---
> > > > > Gautam
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > --
> > > *Evan Chan*
> > > Senior Software Engineer |
> > > ev@ooyala.com | (650) 996-4600
> > > www.ooyala.com | blog <http://www.ooyala.com/blog> |
> > > @ooyala<http://www.twitter.com/ooyala>
> > >
> >
>

Re: Kafka in AWS?

Posted by Patricio Echagüe <pa...@gmail.com>.
We are able to connect to JMX from a console by creating a simple SSH tunnel.

In addition to that, jmxtrans installed on the same machine sends JMX info
from Kafka to Ganglia.
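
A rough sketch of the client side of such a tunnel, in Java. It assumes
a tunnel along the lines of "ssh -L 9999:localhost:9999 user@broker-host"
is already open; the port 9999, the user, and the host are placeholders,
and TunneledJmxClient is a hypothetical class name. The RMI connection
behind the registry lookup uses its own host and port, so the broker JVM
usually has to be set up so that those are reachable through the tunnel
as well.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class TunneledJmxClient {
    /** Builds the usual RMI-based JMX URL for a port on localhost (the tunnel's local end). */
    static JMXServiceURL localUrl(int port) {
        try {
            return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Connects through the tunnel and returns how many MBeans the remote JVM exposes. */
    static int countMBeans(int port) throws IOException {
        try (JMXConnector connector = JMXConnectorFactory.connect(localUrl(port))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.getMBeanCount();
        }
    }
}
```

With the tunnel up, TunneledJmxClient.countMBeans(9999) should return a
positive count; a connection error at that point usually means the
second RMI port is not being forwarded.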

Sent from my Android
On Mar 20, 2012 9:25 AM, "Jun Rao" <ju...@gmail.com> wrote:

> Evan,
>
> We do have mx4j support. See kafka-78.
>
> Thanks,
>
> Jun
>
> On Tue, Mar 20, 2012 at 9:15 AM, Evan Chan <ev...@ooyala.com> wrote:
>
> > We deploy Kafka in AWS.  The only negative so far I've found is that
> native
> > JMX is difficult or impossible to use with AWS, because it opens up
> > secondary ports that you don't have control over.   However, I've heard
> > there are alternative JMX implementations that allow HTTP and other
> > alternative protocols which may be more AWS friendly.
> >
> > On Tue, Mar 20, 2012 at 9:10 AM, Elben Shira <el...@gmail.com>
> wrote:
> >
> > > There's some on the mailing list archives:
> > >
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201202.mbox/%3CCADWPM3jzgMZmc57HYb55PX=GeAt6d6wzbvowvrMEM4Dw3ttu2g@mail.gmail.com%3E
> > >
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201203.mbox/%3CCAFHvO5s2wHSiegUWGPsfV-eN45SK%3Djfh8ObsU1gHZczNdGg-gg%40mail.gmail.com%3E
> > >
> > > Elben
> > >
> > >
> > > On Tue, Mar 20, 2012 at 11:07 AM, Gautam Singaraju <
> > > gautam.singaraju@gmail.com> wrote:
> > >
> > > > We are have been considering Kafka for a new Data Platform. Has
> someone
> > > > used Kafka in AWS? If so, could you please share your experiences
> with
> > > us?
> > > >
> > > > Thank you!
> > > > ---
> > > > Gautam
> > > >
> > >
> >
> >
> >
> > --
> > --
> > *Evan Chan*
> > Senior Software Engineer |
> > ev@ooyala.com | (650) 996-4600
> > www.ooyala.com | blog <http://www.ooyala.com/blog> |
> > @ooyala<http://www.twitter.com/ooyala>
> >
>

Re: Kafka in AWS?

Posted by Jun Rao <ju...@gmail.com>.
Evan,

We do have mx4j support. See kafka-78.

Thanks,

Jun

On Tue, Mar 20, 2012 at 9:15 AM, Evan Chan <ev...@ooyala.com> wrote:

> We deploy Kafka in AWS.  The only negative so far I've found is that native
> JMX is difficult or impossible to use with AWS, because it opens up
> secondary ports that you don't have control over.   However, I've heard
> there are alternative JMX implementations that allow HTTP and other
> alternative protocols which may be more AWS friendly.
>
> On Tue, Mar 20, 2012 at 9:10 AM, Elben Shira <el...@gmail.com> wrote:
>
> > There's some on the mailing list archives:
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201202.mbox/%3CCADWPM3jzgMZmc57HYb55PX=GeAt6d6wzbvowvrMEM4Dw3ttu2g@mail.gmail.com%3E
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201203.mbox/%3CCAFHvO5s2wHSiegUWGPsfV-eN45SK%3Djfh8ObsU1gHZczNdGg-gg%40mail.gmail.com%3E
> >
> > Elben
> >
> >
> > On Tue, Mar 20, 2012 at 11:07 AM, Gautam Singaraju <
> > gautam.singaraju@gmail.com> wrote:
> >
> > > We are have been considering Kafka for a new Data Platform. Has someone
> > > used Kafka in AWS? If so, could you please share your experiences with
> > us?
> > >
> > > Thank you!
> > > ---
> > > Gautam
> > >
> >
>
>
>
> --
> --
> *Evan Chan*
> Senior Software Engineer |
> ev@ooyala.com | (650) 996-4600
> www.ooyala.com | blog <http://www.ooyala.com/blog> |
> @ooyala<http://www.twitter.com/ooyala>
>

Re: Kafka in AWS?

Posted by Evan Chan <ev...@ooyala.com>.
We deploy Kafka in AWS.  The only negative I've found so far is that native
JMX is difficult or impossible to use with AWS, because it opens up
secondary ports that you don't have control over.  However, I've heard
there are alternative JMX implementations that support HTTP and other
protocols, which may be more AWS-friendly.

On Tue, Mar 20, 2012 at 9:10 AM, Elben Shira <el...@gmail.com> wrote:

> There's some on the mailing list archives:
>
>
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201202.mbox/%3CCADWPM3jzgMZmc57HYb55PX=GeAt6d6wzbvowvrMEM4Dw3ttu2g@mail.gmail.com%3E
>
>
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201203.mbox/%3CCAFHvO5s2wHSiegUWGPsfV-eN45SK%3Djfh8ObsU1gHZczNdGg-gg%40mail.gmail.com%3E
>
> Elben
>
>
> On Tue, Mar 20, 2012 at 11:07 AM, Gautam Singaraju <
> gautam.singaraju@gmail.com> wrote:
>
> > We are have been considering Kafka for a new Data Platform. Has someone
> > used Kafka in AWS? If so, could you please share your experiences with
> us?
> >
> > Thank you!
> > ---
> > Gautam
> >
>



-- 
--
*Evan Chan*
Senior Software Engineer |
ev@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>

Re: Kafka in AWS?

Posted by Elben Shira <el...@gmail.com>.
There are some earlier discussions in the mailing list archives:

http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201202.mbox/%3CCADWPM3jzgMZmc57HYb55PX=GeAt6d6wzbvowvrMEM4Dw3ttu2g@mail.gmail.com%3E

http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201203.mbox/%3CCAFHvO5s2wHSiegUWGPsfV-eN45SK%3Djfh8ObsU1gHZczNdGg-gg%40mail.gmail.com%3E

Elben


On Tue, Mar 20, 2012 at 11:07 AM, Gautam Singaraju <
gautam.singaraju@gmail.com> wrote:

> We are have been considering Kafka for a new Data Platform. Has someone
> used Kafka in AWS? If so, could you please share your experiences with us?
>
> Thank you!
> ---
> Gautam
>