Posted to users@kafka.apache.org by Weide Zhang <we...@gmail.com> on 2014/08/02 00:20:39 UTC

kafka consumer fail over

Hi,

I have a use case for a master-slave cluster where the logic inside the
master needs to consume data from Kafka and publish some aggregated data
back to Kafka. When the master dies, the slave needs to pick up the latest
committed offset from the master and continue consuming the data from
Kafka and doing the push.

My question is: what is the easiest Kafka consumer design for this
scenario to work? I was thinking about using SimpleConsumer and doing
manual consumer offset syncing between master and slave. That seems to
solve the problem, but I was wondering if it can also be achieved with the
high-level consumer client?

Thanks,

Weide
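
As a rough sketch of the manual offset syncing idea described above: the
master writes its last processed offset to a shared store such as
ZooKeeper, and the slave reads it back when it takes over. The path, class
name, and single-partition layout below are hypothetical, and the parent
znodes are assumed to exist already:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Hypothetical shared offset store for the master/slave pair.
    public class SharedOffsetStore {
        private static final String PATH = "/myapp/offsets/mytopic-0"; // hypothetical path

        private final ZooKeeper zk;

        public SharedOffsetStore(ZooKeeper zk) {
            this.zk = zk;
        }

        // The master calls this after processing a batch.
        public void commit(long offset) throws Exception {
            byte[] data = Long.toString(offset).getBytes("UTF-8");
            if (zk.exists(PATH, false) == null) {
                zk.create(PATH, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } else {
                zk.setData(PATH, data, -1); // -1 matches any version
            }
        }

        // The slave calls this on takeover to find where to resume.
        public long fetch() throws Exception {
            byte[] data = zk.getData(PATH, false, new Stat());
            return Long.parseLong(new String(data, "UTF-8"));
        }
    }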

Re: kafka consumer fail over

Posted by Weide Zhang <we...@gmail.com>.
Thanks a lot Guozhang and Daniel.

Weide



Re: kafka consumer fail over

Posted by Guozhang Wang <wa...@gmail.com>.
Weide,

Like Daniel said, the rebalance logic is deterministic (round robin), so
if you have a total of n partitions and each machine (master or slave)
also has n threads, then all partitions will go to the master. When the
master fails and restarts, the partitions will automatically go back to
the old master.
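
As a toy illustration of that determinism: thread ids are sorted and
partitions are dealt out in order, so the machine whose ids sort first
covers everything. This mimics the idea, not Kafka's actual internal
assignment code, and the names and counts are made up:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class AssignmentSketch {
        public static void main(String[] args) {
            int numPartitions = 4;
            // Thread ids as the rebalance sees them; lexicographic order
            // decides who wins. Master A's ids sort before slave B's here.
            List<String> threads = new ArrayList<String>(Arrays.asList(
                "masterA-1", "masterA-2", "masterA-3", "masterA-4",
                "slaveB-1", "slaveB-2", "slaveB-3", "slaveB-4"));
            Collections.sort(threads);
            for (int p = 0; p < numPartitions; p++) {
                // One partition per thread in sorted order: with 4 partitions
                // and 4 master threads, the master gets everything.
                System.out.println("partition " + p + " -> " + threads.get(p));
            }
        }
    }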


-- 
-- Guozhang

Re: kafka consumer fail over

Posted by Daniel Compton <de...@danielcompton.net>.
Hi Weide

The consumer rebalancing algorithm is deterministic. In your failure scenario, when A comes back up again, the consumer threads will rebalance. This will give you the initial consumer configuration at the start of the test. 

I'm unsure whether the partitions are balanced round robin, or if they will all go to A and then overflow to B.

If all of the messages need to be processed by a single machine, an alternative architecture would be to have a standby server that waits until master A fails and then connects as a consumer. This could be accomplished by watching ZooKeeper and getting a notification when A's ephemeral node is removed.

The high level consumer does seem to be the way to go as long as your application can handle duplicate processing. 

Daniel.
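
A minimal sketch of that standby approach, assuming the master registers a
hypothetical ephemeral znode at /myapp/master while it is alive; the
standby watches that node and takes over when it disappears:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class StandbyWatcher implements Watcher {
        private static final String MASTER_PATH = "/myapp/master"; // hypothetical

        private final ZooKeeper zk;

        public StandbyWatcher(ZooKeeper zk) {
            this.zk = zk;
        }

        // Register a one-shot watch on the master's ephemeral node.
        public void standBy() throws Exception {
            if (zk.exists(MASTER_PATH, this) == null) {
                becomeMaster(); // master already gone
            }
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDeleted) {
                becomeMaster();
            } else {
                // ZooKeeper watches fire once; re-register on other events.
                try {
                    standBy();
                } catch (Exception e) {
                    // retry or fail over to manual recovery in real code
                }
            }
        }

        private void becomeMaster() {
            // Connect as a consumer and start processing here.
        }
    }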


Re: kafka consumer fail over

Posted by Weide Zhang <we...@gmail.com>.
Hi Guozhang,

If I use the high-level consumer, how do I ensure all data goes to the
master even if the slave is up and running? Is it just by forcing the
master to have enough consumer threads to cover the maximum number of
partitions of a topic, since the high-level consumer has no notion of
which consumers are masters and which are slaves?

For example, master A initiates enough threads that it can cover all the
partitions. Slave B is on standby with the same consumer group and the
same number of threads, but since master A has enough threads to cover
all the partitions, slave B won't get any data.

Suddenly master A goes down, slave B becomes the new master, and it starts
to get data based on the high-level consumer rebalance design.

After that, old master A comes up and becomes the slave. Will A get data?
Or will A not get data, because B has enough threads to cover all
partitions in the rebalancing logic?

Thanks,

Weide



Re: kafka consumer fail over

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Weide,

That should be doable via the high-level consumer; you can take a look at
this page:

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example

Guozhang
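
For reference, here is a condensed sketch along the lines of that example,
using the old 0.8 high-level consumer API. The topic, group id, connection
settings, and thread count are placeholders:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class GroupConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "my-aggregator");           // placeholder
            props.put("zookeeper.session.timeout.ms", "4000");
            props.put("auto.commit.interval.ms", "1000");

            ConsumerConnector connector =
                kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ask for enough streams (threads) to cover every partition.
            int numThreads = 4; // match the topic's partition count
            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put("mytopic", numThreads); // placeholder topic

            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

            ExecutorService executor = Executors.newFixedThreadPool(numThreads);
            for (final KafkaStream<byte[], byte[]> stream : streams.get("mytopic")) {
                executor.submit(new Runnable() {
                    public void run() {
                        ConsumerIterator<byte[], byte[]> it = stream.iterator();
                        while (it.hasNext()) {
                            MessageAndMetadata<byte[], byte[]> msg = it.next();
                            System.out.println("offset " + msg.offset()
                                + ": " + new String(msg.message()));
                        }
                    }
                });
            }
        }
    }

Running the same code on both the master and the standby with the same
group.id gives the failover behavior discussed elsewhere in this thread,
at the cost of possible duplicate processing around a rebalance.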


-- 
-- Guozhang