Posted to users@kafka.apache.org by Roman Garcia <rg...@dridco.com> on 2011/09/09 20:57:40 UTC

Re: HA / failover

Sorry Jun that it took me so long to reply.
There's still one thing I don't get:

>> There is one offset per topic/partition. If a partition is not available
because a broker is down, its offset in the consumer won't grow anymore.

So, because I want HA, I set up 2 brokers to serve the same topic/partition,
right?

Using a zk-producer, will msgs be sent to only one of those 2 brokers, or
will it balance randomly?

If one of those 2 brokers is down, will the producer start sending messages
to the one that's alive?
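For reference, here's roughly what my zk-based producer config looks like
(0.7-era property names as I understand them from the docs -- correct me if
any are off):

```properties
# producer.properties -- assumed 0.7-era property names
zk.connect=localhost:2181,localhost:2182,localhost:2183
serializer.class=kafka.serializer.StringEncoder
producer.type=sync
```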
 
Example:
Start zk x 3, kafka x 2 (first run), 1 zk-producer, 1 zk-consumer

Produce msgs 1 & 2
Consume msg 1
kafka A fails -> consumer now reads kafka B
Produce msgs 3 & 4
Consume msgs 3,4
Kafka A is started
Consumer sees it, but won't ask for msg 2

Makes sense?
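To double-check that I'm reading my own example right, here's a toy Python
model of that sequence (no real Kafka involved, just the bookkeeping I
*think* happens; all names are made up):

```python
# Toy model of the failover example above: one offset per (broker, partition),
# a zk-producer that sends to whichever broker is alive, and a consumer that
# can only read from live brokers.

brokers = {"A": [], "B": []}       # broker -> log of messages
alive = {"A": True, "B": True}
offsets = {"A": 0, "B": 0}         # consumer offset per broker (same topic/partition)
consumed = []

def produce(msg):
    # zk-producer: pick some live broker (here: first alive, alphabetically)
    target = next(b for b in sorted(brokers) if alive[b])
    brokers[target].append(msg)

def consume():
    # fetch everything available from live brokers; offsets on dead brokers freeze
    for b in sorted(brokers):
        if alive[b]:
            while offsets[b] < len(brokers[b]):
                consumed.append(brokers[b][offsets[b]])
                offsets[b] += 1

produce("msg1"); produce("msg2")            # both land on broker A
offsets["A"] = 1; consumed.append("msg1")   # consume only msg1 for now
alive["A"] = False                          # kafka A fails
produce("msg3"); produce("msg4")            # producer fails over to broker B
consume()                                   # reads msg3, msg4 from B; msg2 stuck on A
assert consumed == ["msg1", "msg3", "msg4"]
alive["A"] = True                           # kafka A comes back
consume()                                   # frozen offset on A resumes; msg2 shows up late
assert consumed == ["msg1", "msg3", "msg4", "msg2"]
```

So in this model msg 2 does eventually get consumed, just out of order, once
A returns. Is that the expected behavior?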

PS: I'm trying to understand how LinkedIn manages HA with Sensei + Kafka... sorry!


----- Original message -----
From: Jun Rao [mailto:junrao@gmail.com]
Sent: Tuesday, August 30, 2011 03:46 PM
To: kafka-users@incubator.apache.org <ka...@incubator.apache.org>
Subject: Re: HA / failover

See my inlined reply below.

Thanks,

Jun


On Tue, Aug 30, 2011 at 8:36 AM, Roman Garcia <rg...@dridco.com> wrote:

> >> Roman,
> Without replication, Kafka can lose messages permanently if the
> underlying storage system is damaged. Setting that aside, there are 2
> ways that you can achieve HA now. In either case, you need to set up a
> Kafka cluster with at least 2 brokers.
>
> Thanks for the clarification Jun. But even then, with replication, you
> could still lose messages, right?
>
>
If you do synchronous replication with replication factor >1 and there is
only 1 failure, you won't lose any messages.
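As a toy model (not Kafka's actual implementation, just the invariant): a
synchronous write is acked only after every replica has it, so any single
failure leaves a complete copy.

```python
# Toy model of synchronous replication with factor 2: a produce request is
# acked only after ALL replicas have appended the message, so the failure of
# any single replica loses nothing that was acked.
replicas = [[], []]                 # two copies of the same partition log

def sync_write(msg):
    for log in replicas:            # every replica must take the write...
        log.append(msg)
    return "ack"                    # ...before the producer sees an ack

sync_write("m1")
sync_write("m2")

failed = 0                          # any one replica dies
surviving = [log for i, log in enumerate(replicas) if i != failed]
assert surviving == [["m1", "m2"]]  # all acked messages survive
```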


> >> [...] Unconsumed messages on that broker will not be available for
> consumption until the broker comes up again.
>
> How does a Consumer fetch those "old" messages, given that it did
> already fetch "new" messages at a higher offset? What am I missing?
>

There is one offset per topic/partition. If a partition is not available
because a broker is down, its offset in the consumer won't grow anymore.
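As a sketch of that bookkeeping (illustrative only, not actual consumer code):

```python
# Sketch: the consumer tracks one offset per (topic, partition). When the
# broker serving a partition is down, no fetches happen against it, so that
# partition's offset simply stops advancing.
offsets = {("topicA", 0): 5, ("topicA", 1): 7}            # consumed so far
available = {("topicA", 0): False, ("topicA", 1): True}   # partition 0's broker is down

def fetch_round(new_msgs=1):
    for tp in offsets:
        if available[tp]:          # only reachable partitions advance
            offsets[tp] += new_msgs

fetch_round()
fetch_round()
assert offsets[("topicA", 0)] == 5   # frozen while its broker is down
assert offsets[("topicA", 1)] == 9   # kept growing
```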


>
> >> The second approach is to use the built-in ZK-based software load
> balancer in Kafka (by setting zk.connect in the producer config). In
> this case, we rely on ZK to detect broker failures.
>
> This is the approach I've tried. I did use zk.connect.
> I started all locally:
> - 2 Kafka brokers (broker id=0 & 1, single partition)
> - 3 zookeeper nodes (all of these on a single box) with different
> election ports and different fs paths/ids.
> - 5 producer threads sending <1k msgs
>
> Then I killed one of the Kafka brokers, and all my producer threads
> died.
>
>
That could be a bug. Are you using trunk? Any errors/exceptions in the log?


> What am I doing wrong?
>
>
> Thanks!
> Roman
>
>
> -----Original Message-----
> From: Jun Rao [mailto:junrao@gmail.com]
> Sent: Tuesday, August 30, 2011 11:44 AM
> To: kafka-users@incubator.apache.org
> Subject: Re: HA / failover
>
> Roman,
>
> Without replication, Kafka can lose messages permanently if the
> underlying storage system is damaged. Setting that aside, there are 2
> ways that you can achieve HA now. In either case, you need to set up a
> Kafka cluster with at least 2 brokers.
>
> The first approach is to put the hosts of all Kafka brokers in a VIP and
> rely on a hardware load balancer to do health check and routing. In that
> case, all producers send data through the VIP. If one of the brokers is
> down temporarily, the load balancer will direct the produce requests to
> the rest of the brokers. Unconsumed messages on that broker will not be
> available for consumption until the broker comes up again.
>
>  The second approach is to use the built-in ZK-based software load
> balancer in Kafka (by setting zk.connect in the producer config). In
> this case, we rely on ZK to detect broker failures.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 30, 2011 at 7:18 AM, Roman Garcia <rg...@dridco.com>
> wrote:
>
> > Hi, I'm trying to figure out what my prod environment should look like,
>
> > and still I don't seem to understand how to achieve HA / FO
> conditions.
> >
> > I realize this is going to be fully supported once there is
> > replication, right?
> >
> > But what about right now? How do you guys achieve this?
> >
> > I understand at least LinkedIn has a Kafka cluster deployed.
> >
> > - How do you guys ensure no messages get lost before flush to disk
> happens?
> >
> > - How did you manage to always have a broker available, and to redirect
> > producers to it during a failure?
> > I've tried using the Producer class with "sync" type and ZooKeeper, and
> > killing one of two brokers, but I got an exception. Should I handle it
> > and retry then?
> >
> > So, to sum up, any pointer on how I should set up a prod env will be
> > appreciated! Any doc I might have missed or a simple short example
> > would help.
> > Thanks!
> > Roman
> >
>

Re: HA / failover

Posted by Jun Rao <ju...@gmail.com>.
Roman,

Your understanding is right. If you use the ZK producer and have 2 brokers, when
1 broker fails, the producer will send new messages to the surviving broker.
It's just that unconsumed data in the failed broker won't be available until
it comes back.

Jun

On Fri, Sep 9, 2011 at 11:57 AM, Roman Garcia <rg...@dridco.com> wrote:

> Sorry Jun that it took me so long to reply.
> There's still one thing I don't get:
>
> >> There is one offset per topic/partition. If a partition is not available
> because a broker is down, its offset in the consumer won't grow anymore.
>
> So, because I want HA, I set up 2 brokers to serve the same topic/partition,
> right?
>
> Using a zk-producer, will msgs be sent to only one of those 2 brokers, or
> will it balance randomly?
>
> If one of those 2 brokers is down, will the producer start sending messages
> to the one that's alive?
>
> Example:
> Start zk x 3, kafka x 2 (first run), 1 zk-producer, 1 zk-consumer
>
> Produce msgs 1 & 2
> Consume msg 1
> kafka A fails -> consumer now reads kafka B
> Produce msgs 3 & 4
> Consume msgs 3,4
> Kafka A is started
> Consumer sees it, but won't ask for msg 2
>
> Makes sense?
>
> PS: I'm trying to understand how LinkedIn manages HA with Sensei +
> Kafka... sorry!