Posted to users@kafka.apache.org by Joe Stein <cr...@gmail.com> on 2011/12/29 16:09:53 UTC

Replication Follow-up

Hello, hope everyone's holiday break was/is going well? (If you are lucky
enough to have one).

My busy season (mobile advertising) will start to ease up next week and I
will get to have some down time.

We use Kafka in production now and are looking to up its usage, but we really
need/want replication, and outside of work it is an interesting problem to
work on.

So, I wanted to follow up on the replication work and ask what I can do to
help.  I was not sure where this had left off and whether the work had already
started.

If not, would it be good to have a conference call or something to go
through it and divvy things up?  If it has started, let me know where I can
jump in please.

Thanks!!!

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/

Re: Replication Follow-up

Posted by Jay Kreps <ja...@gmail.com>.

Yeah it might be good to announce the replication discussion time in a
separate thread (when we have a time) so people can find it.

-Jay

On Tue, Jan 3, 2012 at 8:24 AM, Andrew Psaltis <Andrew.Psaltis@webtrends.com> wrote:

> Jun,
> I would be interested in participating in the replication skype call and
> helping out where possible.
>
> Thanks,
> Andrew

RE: Replication Follow-up

Posted by Andrew Psaltis <An...@Webtrends.com>.

Jun,
I would be interested in participating in the replication skype call and helping out where possible.

Thanks,
Andrew

-----Original Message-----
From: Jun Rao [mailto:junrao@gmail.com] 
Sent: Thursday, December 29, 2011 10:05 AM
To: kafka-users@incubator.apache.org
Subject: Re: Replication Follow-up

Hi, Joe,

Thanks for your interest. We haven't started coding replication yet. There
is a bit more discussion on dependency in kafka-50 and some details in
kafka-47. Also, as part of the replication work, we have been discussing
wire format changes (
https://cwiki.apache.org/confluence/display/KAFKA/New+Wire+Format+Proposal).

To get things started, my thinking is to do kafka-47 first, followed by
another jira that makes use of the new data structures in ZK, but only
supports replication factor 1 (basically making partitions logical and
global within the cluster).

We plan to start coding the replication feature in the new year. We can
set up a Skype call in the first week of 2012. Is anyone else interested?

Thanks,

Jun

Re: Replication Follow-up

Posted by Jun Rao <ju...@gmail.com>.

Hi, Joe,

Thanks for your interest. We haven't started coding replication yet. There
is a bit more discussion on dependency in kafka-50 and some details in
kafka-47. Also, as part of the replication work, we have been discussing
wire format changes (
https://cwiki.apache.org/confluence/display/KAFKA/New+Wire+Format+Proposal).

To get things started, my thinking is to do kafka-47 first, followed by
another jira that makes use of the new data structures in ZK, but only
supports replication factor 1 (basically making partitions logical and
global within the cluster).

We plan to start coding the replication feature in the new year. We can
set up a Skype call in the first week of 2012. Is anyone else interested?

Thanks,

Jun

Re: Replication Follow-up

Posted by Matt Abrams <ab...@clearspring.com>.

Joe -

That sounds good.  I'm highly interested in your use of MapR for HA
NFS.  We've  also considered using it.  I'll work on getting the code
for the repeater cleaned up and push it to github soon.  Maybe at that
point we can have a chat about your use cases for the repeater and I
could get some feedback on how MapR is working out for you?

Jun - please include me on the skype call regarding replication.

Thanks,

Matt

On Thu, Dec 29, 2011 at 10:35 AM, Joe Stein <cr...@gmail.com> wrote:
> I started using MapR last week in pilot mode and I have been considering
> using its HA NFS as the underlying file system for my Kafka brokers (on a
> 10Gb network) until replication is live.  For my use case it is all about
> data reliability and high availability, so having Kafka guarantee this is
> paramount.
>
> Your solution sounds interesting for another use case we have, where we (in
> the consumer) get an error (for lots of reasons) and then have to
> reprocess what we consumed.  Right now we don't do this very well, and it is
> something I have on the to-do list.  What you have might be useful for
> that use case, except I would not send the data to another Kafka server, just
> back into the same one, probably on another topic but not always, and I would
> be using ZooKeeper too.  So maybe we can collaborate on the code
> for separate use cases?

Re: Replication Follow-up

Posted by Joe Stein <cr...@gmail.com>.

I started using MapR last week in pilot mode and I have been considering
using its HA NFS as the underlying file system for my Kafka brokers (on a
10Gb network) until replication is live.  For my use case it is all about
data reliability and high availability, so having Kafka guarantee this is
paramount.

Your solution sounds interesting for another use case we have, where we (in
the consumer) get an error (for lots of reasons) and then have to
reprocess what we consumed.  Right now we don't do this very well, and it is
something I have on the to-do list.  What you have might be useful for
that use case, except I would not send the data to another Kafka server, just
back into the same one, probably on another topic but not always, and I would
be using ZooKeeper too.  So maybe we can collaborate on the code
for separate use cases?

On Thu, Dec 29, 2011 at 10:24 AM, Matt Abrams <ab...@clearspring.com> wrote:

> Joe -
>
> I am interested in replication as well.  We are going live with our
> implementation in the next few days.
>
> In our setup events are fed into one of two datacenters (depending on
> which is closest to the client).   We wanted to be able to process all
> of the data in a single DC regardless of which DC the event originally
> arrived in.  So we wrote a small project to make this possible.
>
> The project uses a fairly simple process that consumes data from one
> Kafka broker and then repeats the data to another Kafka broker (in our
> case the 'other' broker is in the remote data center).  ZooKeeper is
> used to identify peers and detect failures, new nodes, etc.  This
> isn't really replication but can be used to repeat all data from one
> broker onto another and works for our use case.  If there is general
> interest in this project we can work on open sourcing it.
>
> Matt

-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/

Re: Replication Follow-up

Posted by Matt Abrams <ab...@clearspring.com>.

Joe -

I am interested in replication as well.  We are going live with our
implementation in the next few days.

In our setup events are fed into one of two datacenters (depending on
which is closest to the client).   We wanted to be able to process all
of the data in a single DC regardless of which DC the event originally
arrived in.  So we wrote a small project to make this possible.

The project uses a fairly simple process that consumes data from one
Kafka broker and then repeats the data to another Kafka broker (in our
case the 'other' broker is in the remote data center).  ZooKeeper is
used to identify peers and detect failures, new nodes, etc.  This
isn't really replication but can be used to repeat all data from one
broker onto another and works for our use case.  If there is general
interest in this project we can work on open sourcing it.

Matt
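
For readers following along, the repeater idea in the message above (consume
from one broker, re-publish to another) can be sketched roughly as below. This
is only an illustration and not Matt's code. It does not use any Kafka client
API: `InMemoryBroker` and `repeat_once` are hypothetical names, the brokers are
plain in-memory logs, and the ZooKeeper-based peer discovery and failure
detection he mentions are left out. The one design point it tries to show is
that the repeater tracks the next offset to read, so it can resume where it
left off.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBroker:
    """Hypothetical stand-in for a Kafka broker: one append-only log per topic."""
    logs: dict = field(default_factory=dict)

    def append(self, topic, message):
        # Add a message to the end of the topic's log.
        self.logs.setdefault(topic, []).append(message)

    def fetch(self, topic, offset, max_messages=100):
        # Return up to max_messages starting at the given log position.
        return self.logs.get(topic, [])[offset:offset + max_messages]


def repeat_once(source, destination, topic, offset):
    """Copy messages at or after `offset` from source to destination.

    Returns the next offset to fetch from, so the caller can store it
    somewhere durable (an assumption here) and resume after a restart.
    """
    batch = source.fetch(topic, offset)
    for message in batch:
        destination.append(topic, message)
    return offset + len(batch)


# Two brokers standing in for the local and remote data centers.
local_dc = InMemoryBroker()
remote_dc = InMemoryBroker()
for event in ["click-1", "click-2", "click-3"]:
    local_dc.append("events", event)

# First pass copies the backlog; second pass picks up only the new message.
next_offset = repeat_once(local_dc, remote_dc, "events", 0)
local_dc.append("events", "click-4")
next_offset = repeat_once(local_dc, remote_dc, "events", next_offset)
```

As the original message says, this is repetition rather than replication: the
remote copy lags the source and nothing here handles broker failover.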
