Posted to users@kafka.apache.org by Prabhjot Bharaj <pr...@gmail.com> on 2015/11/26 12:13:23 UTC

producer-consumer issues during deployments

Hi,

We arrange our Kafka machines in groups and deploy to them in phases.

For Kafka, we’ll have to map groups to phases. During each phase of the
release, all the machines in that group can go down.

When this happens, there are a couple of cases:-

   1. All replicas reside in a group of machines which will all go
   down in this phase
      - Effect on Producer:
         - What happens to the produce requests (can the producer
         dynamically keep writing to the remaining partitions?)
         - What happens to the already queued requests which were being
         sent to the earlier replicas – they will fail (we’ll have to
         use the producer callback feature to take care of retrying,
         provided the above step works fine)
      - Effect on Consumer:
         - Can the consumers consume from a smaller number of partitions?
         - Does the consumer 'consume' API give any callback/failure when
         all replicas of a partition go down?

If you have come across any of the above cases, please share how you
solved the problem. Or does everything work well with Kafka during
deployments, with the cases described above either invalid or handled
by Kafka and its clients internally?

Thanks,
Prabhjot

Re: producer-consumer issues during deployments

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi,

Thanks for your reply.
We have 4 phases of deploys, and in each phase we can take down a few
machines. These releases happen every 2 weeks because, on all machines,
a bunch of other microservices run alongside the core system (Kafka in
this case).

My only concern is that during runtime, i.e. between 2 releases, the
replica distribution per topic can shift because of restarts or
occasional machine failures/reboots.

Because of that, the steps that you've mentioned would become an
operational nightmare for us.

What I'm looking for is a more automated solution, e.g. even if all the
replicas for a partition are down, the producers (running on around 50
machines) should switch to the other available partitions until this
partition becomes available again.
Also, on the consumer side, consumers should not fail but keep consuming
from the available partitions until this partition comes back up.

Is this possible with the new producer and the new consumer or the
high-level consumer?

Thanks,
Prabhjot
On Nov 27, 2015 12:00 AM, "Ben Stopford" <be...@confluent.io> wrote:

> Hi Prabhjot
>
> I may have slightly misunderstood your question so apologies if that’s the
> case. The general approach to releases is to use a rolling upgrade where
> you take one machine offline at a time, restart it, wait for it to come
> online (you can monitor this via JMX) then move onto the next. If you’re
> taking multiple machines offline at the same time you need to be careful
> about where the replicas for those machines reside. You can examine these
> individually for each topic via kafka-topics.sh.
>
> Regarding your questions the following points may be of use:
>
> - Only one replica (the leader) will be available for writing at any one
> time in Kafka. If you offline machines then Kafka will switch over to use
> replicas on other machines if they are available.
> - The behaviour of produce requests will depend on the acknowledgment
> setting the producer provides, the setting for minimum in sync replicas and
> how many replicas remain standing after the failure. There are a few things
> going on here but they’re explained quite well here <
> http://kafka.apache.org/090/documentation.html#design_ha>.
> - Consumers consume from the leader also so if the leader for a partition
> is online then you will be able to consume from it. If the leader is on a
> machine that goes offline then consumption will pause whilst leadership
> switches over to a replica.
>
> All the best
> B
>
> > On 26 Nov 2015, at 17:58, Prabhjot Bharaj <pr...@gmail.com> wrote:
> >
> > Hi,
> >
> > Request your expertise on these doubts of mine
> >
> > Thanks,
> > Prabhjot
> >
> > On Thu, Nov 26, 2015 at 4:43 PM, Prabhjot Bharaj <pr...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> We arrange our kafka machines in groups and deploy these phases.
> >>
> >> For kafka, we’ll have to map groups with phases. During each phase of
> >> the release, all the machines in that group can go down.
> >>
> >> When this happens, there are a couple of cases:-
> >>
> >>   1. All replicas are residing in a group of machines which will all go
> >>   down in this phase
> >>      - Affect on Producer –
> >>         - What happens to the produce requests (whether produce can
> >>         dynamically keep writing to the remaining partitions now)
> >>         - What happens to the already queued requests which were being
> >>         sent to the earlier replicas – they will fail (we’ll have to
> >>         use producer callback feature to take care of retrying in case
> >>         the above step works fine)
> >>      - Affect on Consumer -
> >>         - Can the consumers consume from a lesser number of partitions?
> >>         - Does the consumer 'consume' api gives any callback/failure
> >>         when all replicas of a partition go down?
> >>
> >> If you have come across any of the above cases, please provide how you
> >> solved the problem ? or whether everything works just well with Kafka
> >> during deployments and my cases described above are all invalid or
> >> handled by kafka and its clients internally ?
> >>
> >> Thanks,
> >> Prabhjot
> >>
> >
> >
> >
> > --
> > ---------------------------------------------------------
> > "There are only 10 types of people in the world: Those who understand
> > binary, and those who don't"
>
>

Re: producer-consumer issues during deployments

Posted by Ben Stopford <be...@confluent.io>.
Hi Prabhjot

I may have slightly misunderstood your question, so apologies if that’s the case. The general approach to releases is to use a rolling upgrade where you take one machine offline at a time, restart it, wait for it to come online (you can monitor this via JMX), then move on to the next. If you’re taking multiple machines offline at the same time you need to be careful about where the replicas for those machines reside. You can examine these individually for each topic via kafka-topics.sh.
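
As a rough illustration of the replica-placement check described above, the sketch below flags partitions whose replicas all sit on brokers that a single deploy phase takes down. The function name, the topic name "events" and the broker ids are made up for the example; in practice the replica lists would be copied from the "Replicas" field that kafka-topics.sh --describe prints for each partition.

```python
from typing import Dict, List, Set


def partitions_at_risk(assignment: Dict[str, List[int]],
                       phase_brokers: Set[int]) -> List[str]:
    """Return the partitions whose every replica is on a broker in phase_brokers."""
    return sorted(
        partition
        for partition, replicas in assignment.items()
        if replicas and set(replicas) <= phase_brokers
    )


# Hypothetical layout: "topic-partition" -> ids of the brokers holding a replica.
assignment = {
    "events-0": [1, 2],
    "events-1": [2, 3],
    "events-2": [3, 4],
}

# Hypothetical phase that restarts brokers 1 and 2 together.
phase_1 = {1, 2}

at_risk = partitions_at_risk(assignment, phase_1)
print(at_risk)
```

Any partition this reports would have no replica left while that phase runs, so either the phase grouping or the replica assignment would need to change before the deploy.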

Regarding your questions the following points may be of use:

- Only one replica (the leader) will be available for writing at any one time in Kafka. If you take machines offline then Kafka will switch over to use replicas on other machines if they are available.
- The behaviour of produce requests will depend on the acknowledgment setting the producer provides, the setting for minimum in-sync replicas, and how many replicas remain standing after the failure. There are a few things going on here, but they’re explained quite well here <http://kafka.apache.org/090/documentation.html#design_ha>.
- Consumers also consume from the leader, so if the leader for a partition is online then you will be able to consume from it. If the leader is on a machine that goes offline then consumption will pause whilst leadership switches over to a replica.
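
To make the interaction between the acknowledgment setting, min.insync.replicas and the surviving replicas a little more concrete, here is a toy model of how a single partition would answer a produce request. It is only a sketch: the function and its return strings are invented for illustration, it does not call any Kafka client, and it assumes that only in-sync replicas can become leader (that is, unclean leader election is ignored).

```python
def produce_outcome(acks: str, min_insync_replicas: int, live_in_sync: int) -> str:
    """Toy model of how one partition answers a produce request.

    acks                -- producer setting: "0", "1" or "all" ("-1" means "all")
    min_insync_replicas -- the min.insync.replicas setting for the topic
    live_in_sync        -- in-sync replicas still running, leader included
    """
    if live_in_sync == 0:
        # No in-sync replica is left to act as leader, so nothing can be written.
        return "no leader"
    if acks in ("all", "-1") and live_in_sync < min_insync_replicas:
        # The leader refuses the write because too few replicas are in sync.
        return "rejected"
    # acks=0 and acks=1 only need a live leader.
    return "accepted"


# Walk through a few failure scenarios with min.insync.replicas = 2.
for acks, live in [("all", 3), ("all", 1), ("1", 1), ("1", 0)]:
    print(f"acks={acks} live_in_sync={live} ->", produce_outcome(acks, 2, live))
```

The point of the model is that taking down a whole group can leave a partition in any of the three states, depending on how many of its in-sync replicas were in that group.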

All the best
B

> On 26 Nov 2015, at 17:58, Prabhjot Bharaj <pr...@gmail.com> wrote:
> 
> Hi,
> 
> Request your expertise on these doubts of mine
> 
> Thanks,
> Prabhjot
> 
> On Thu, Nov 26, 2015 at 4:43 PM, Prabhjot Bharaj <pr...@gmail.com>
> wrote:
> 
>> Hi,
>> 
>> We arrange our kafka machines in groups and deploy these phases.
>> 
>> For kafka, we’ll have to map groups with phases. During each phase of the
>> release, all the machines in that group can go down.
>> 
>> When this happens, there are a couple of cases:-
>> 
>>   1. All replicas are residing in a group of machines which will all go
>>   down in this phase
>>      - Affect on Producer –
>>         - What happens to the produce requests (whether produce can
>>         dynamically keep writing to the remaining partitions now)
>>         - What happens to the already queued requests which were being
>>         sent to the earlier replicas – they will fail (we’ll have to use producer
>>         callback feature to take care of retrying in case the above step
>>         works fine)
>>      - Affect on Consumer -
>>         - Can the consumers consume from a lesser number of partitions?
>>         - Does the consumer 'consume' api gives any callback/failure
>>         when all replicas of a partition go down?
>> 
>> If you have come across any of the above cases, please provide how you
>> solved the problem ? or whether everything works just well with Kafka
>> during deployments and my cases described above are all invalid or handled
>> by kafka and its clients internally ?
>> 
>> Thanks,
>> Prabhjot
>> 
> 
> 
> 
> -- 
> ---------------------------------------------------------
> "There are only 10 types of people in the world: Those who understand
> binary, and those who don't"


Re: producer-consumer issues during deployments

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi,

Request your expertise on these doubts of mine

Thanks,
Prabhjot

On Thu, Nov 26, 2015 at 4:43 PM, Prabhjot Bharaj <pr...@gmail.com>
wrote:

> Hi,
>
> We arrange our kafka machines in groups and deploy these phases.
>
> For kafka, we’ll have to map groups with phases. During each phase of the
> release, all the machines in that group can go down.
>
> When this happens, there are a couple of cases:-
>
>    1. All replicas are residing in a group of machines which will all go
>    down in this phase
>       - Affect on Producer –
>          - What happens to the produce requests (whether produce can
>          dynamically keep writing to the remaining partitions now)
>          - What happens to the already queued requests which were being
>          sent to the earlier replicas – they will fail (we’ll have to use producer
>          callback feature to take care of retrying in case the above step
>          works fine)
>       - Affect on Consumer -
>          - Can the consumers consume from a lesser number of partitions?
>          - Does the consumer 'consume' api gives any callback/failure
>          when all replicas of a partition go down?
>
> If you have come across any of the above cases, please provide how you
> solved the problem ? or whether everything works just well with Kafka
> during deployments and my cases described above are all invalid or handled
> by kafka and its clients internally ?
>
> Thanks,
> Prabhjot
>



-- 
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"
