Posted to users@kafka.apache.org by Sachin Kale <sa...@gmail.com> on 2019/11/11 11:54:44 UTC

Detecting cluster down in consumer

Hi,

We are working on a prototype where we write to two Kafka clusters
(primary-secondary) and read from one of them (based on which one is
primary) to increase availability. A flag determines which cluster is
primary; the other becomes secondary. On detecting that the primary
cluster is down, the secondary is promoted to primary.

How do we detect cluster downtime failures in the Kafka consumer? I tried
different things, but poll() masks all exceptions and simply returns 0
records.
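For illustration, one way to surface this silent failure is a staleness
watchdog (a hypothetical, dependency-free sketch; all names are invented):
since an idle cluster and a dead cluster both look like empty polls, track
the last sign of life and flag the cluster as suspect after a configurable
window:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical watchdog: call markHealthy() after any sign of life from
// the cluster (non-empty poll, successful commit, metadata refresh), and
// call isSuspect() from the poll loop to decide whether to fail over.
// Timestamps are passed in explicitly so the class is clock-agnostic.
final class ClusterWatchdog {
    private final Duration stalenessWindow;
    private volatile Instant lastHealthy;

    ClusterWatchdog(Duration stalenessWindow, Instant now) {
        this.stalenessWindow = stalenessWindow;
        this.lastHealthy = now;
    }

    void markHealthy(Instant now) {
        lastHealthy = now;
    }

    boolean isSuspect(Instant now) {
        return Duration.between(lastHealthy, now)
                       .compareTo(stalenessWindow) > 0;
    }
}
```

Because an empty topic is indistinguishable from a dead cluster at the
poll() level, a watchdog like this would be paired with an out-of-band
check (e.g. AdminClient.describeCluster() with a short timeout) before
actually promoting the secondary.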


-Sachin-

Re: Detecting cluster down in consumer

Posted by Ryanne Dolan <ry...@gmail.com>.
Sachin, assuming you are using something like MM2, I recommend the
following approaches:

1) have an external system monitor the clusters and trigger a failover by
terminating the existing consumer group and launching a replacement. This
can be done manually or can be automated if your infrastructure is
sufficiently advanced. MM2's checkpoints make it possible to do this
without losing progress or skipping records.

2) add failover logic around your KafkaConsumers to detect failure and
reconfigure.

3) run consumer groups in both clusters, i.e. "active/active", with each
configured to process records originating in its local cluster only. Set
up health checks and a load balancer such that producers send to the
healthiest cluster. In this approach, no intervention is required to fail
over or fail back. Under normal operation, your secondary consumer group
doesn't process anything, but it will step in and process new records
whenever the secondary cluster becomes active.
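Options 2 and 3 above can be sketched with two small dependency-free
helpers (all names here are hypothetical; the topic-naming convention
assumes MM2's default replication policy):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical helper for option 2: given bootstrap-server strings in
// preference order (primary first) and a health probe, pick the first
// healthy cluster. The caller closes and recreates its KafkaConsumer
// with the returned servers whenever the selection changes.
final class ClusterSelector {
    private final List<String> clusters;
    private final Predicate<String> isHealthy;

    ClusterSelector(List<String> clusters, Predicate<String> isHealthy) {
        this.clusters = clusters;
        this.isHealthy = isHealthy;
    }

    String select() {
        // Fall back to the primary if nothing answers the probe.
        return clusters.stream().filter(isHealthy).findFirst()
                       .orElse(clusters.get(0));
    }
}

// Hypothetical helper for option 3: MM2's default replication policy
// names replicated topics "<sourceAlias>.<topic>" (e.g. "primary.orders"),
// while locally produced topics keep their plain, dot-free names. A real
// deployment would also exclude Kafka's internal topics; sketch only.
final class LocalTopics {
    private static final Pattern LOCAL_ONLY = Pattern.compile("^[^.]+$");

    static boolean isLocal(String topic) {
        return LOCAL_ONLY.matcher(topic).matches();
    }
}
```

A consumer group implementing option 3 could pass an equivalent pattern to
KafkaConsumer.subscribe(Pattern) so it only sees records produced in its
local cluster; for option 2, the poll loop would call select() on failure
and rebuild the consumer when the answer changes.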


Ryanne


On Mon, Nov 11, 2019, 5:55 AM Sachin Kale <sa...@gmail.com> wrote:


Re: Detecting cluster down in consumer

Posted by "M. Manna" <ma...@gmail.com>.
Hi,

On Mon, 11 Nov 2019 at 11:55, Sachin Kale <sa...@gmail.com> wrote:


These links suggest how to approach it:

https://www.slideshare.net/gwenshap/multicluster-and-failover-for-apache-kafka-kafka-summit-sf-17

https://www.confluent.io/blog/3-ways-prepare-disaster-recovery-multi-datacenter-apache-kafka-deployments


If you are in the container world (e.g. K8s, YARN, or Mesos), a liveness
probe can help you determine whether there has been a failover. On
traditional cloud infrastructure, it's simply a heartbeat mechanism that
tells you whether the services are usable.
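As a sketch of the liveness-probe idea (a hypothetical Kubernetes config
fragment; the path and port are invented, and it assumes the consumer
application exposes an HTTP health endpoint that fails once the consumer
considers its cluster unreachable):

```yaml
livenessProbe:
  httpGet:
    path: /healthz      # hypothetical health endpoint exposed by the consumer
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3   # restart the pod after ~30s of failed checks
```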
An example would be to set up monitoring alerts using SolarWinds (or
similar monitoring agents) and to use Cruise Control or Kafka-Monitor for
alerting.

Maybe others can also suggest something which I cannot think of right now.


Thanks,