Posted to users@kafka.apache.org by Avinash Herle <av...@gmail.com> on 2018/02/14 17:38:51 UTC

Kafka cluster instability

Hi,

I'm using Kafka version 0.11.0.2. My cluster has 4 nodes running Kafka, 3 of
which also run Zookeeper. I have a few producer processes that publish to
Kafka and multiple consumer processes, a streaming engine (Spark) that
ingests from Kafka and also publishes data back to Kafka, and a distributed
data store (Druid) that reads all messages from Kafka and stores them in its
DB. Druid also uses the same Zookeeper cluster that Kafka uses for cluster
state management.

*Kafka Configs:*
1) No replication being used
2) Number of network threads 30
3) Number of IO threads 8
4) Machines have 64GB RAM and 16 cores
5) 3 topics with 64 partitions per topic
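
For a sense of scale, the figures above can be turned into a rough per-broker
count. This is only a back-of-the-envelope sketch: it assumes partitions are
spread evenly across the 4 brokers, and the function names are made up for
illustration.

```python
# Figures taken from the config list above.
BROKERS = 4
TOPICS = 3
PARTITIONS_PER_TOPIC = 64


def replicas_per_broker(replication_factor: int) -> float:
    """Average number of partition replicas hosted by each broker.

    Assumption: replicas are distributed evenly across all brokers.
    """
    total_replicas = TOPICS * PARTITIONS_PER_TOPIC * replication_factor
    return total_replicas / BROKERS


def partitions_offline_if_one_broker_fails(replication_factor: int) -> int:
    """Partitions left without any replica when a single broker is unavailable.

    With replication factor 1 each partition lives on exactly one broker, so
    every partition on the failed broker goes offline. With a higher factor
    this sketch assumes another in-sync replica can take over.
    """
    if replication_factor > 1:
        return 0
    return (TOPICS * PARTITIONS_PER_TOPIC) // BROKERS


if __name__ == "__main__":
    print(replicas_per_broker(1))
    print(partitions_offline_if_one_broker_fails(1))
```

Under these assumptions, a single unavailable broker takes a quarter of all
partitions offline when no replication is used.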

*Questions:*

1) *Partitions going offline*
I frequently see partitions going offline, which increases the scheduling
delay of the Spark application and makes the input rate jittery. I also
tried enabling replication to see if it helped, but it made little
difference. What could be causing this? Lack of resources, or a cluster
misconfiguration? Could a large number of receiver processes be the cause?
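
One way to pin down which partitions are offline is to scan the output of
`kafka-topics.sh --describe` for partitions that have no leader. The sketch
below is a minimal parser, not an official tool. It assumes the describe
output has lines of the form "Topic: ... Partition: ... Leader: ..." and that
a leaderless partition is reported as `-1` or `none`; the sample text and the
topic name `events` are invented for illustration.

```python
import re

# Matches per-partition lines of the assumed describe output format.
LINE_RE = re.compile(r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)")

# Assumption: these are the values printed when a partition has no leader.
NO_LEADER = {"-1", "none"}


def offline_partitions(describe_output: str) -> list:
    """Return (topic, partition) pairs whose leader is reported as missing."""
    offline = []
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if match and match.group(3) in NO_LEADER:
            offline.append((match.group(1), int(match.group(2))))
    return offline


# Hypothetical sample output for a topic with two partitions.
SAMPLE = (
    "Topic:events\tPartitionCount:2\tReplicationFactor:1\tConfigs:\n"
    "\tTopic: events\tPartition: 0\tLeader: 1\tReplicas: 1\tIsr: 1\n"
    "\tTopic: events\tPartition: 1\tLeader: -1\tReplicas: 2\tIsr: \n"
)

if __name__ == "__main__":
    print(offline_partitions(SAMPLE))
```

Running such a check periodically and noting which broker IDs appear in the
Replicas column of the offline partitions may show whether the problem follows
one particular broker.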

*2) Colocation of Zookeeper and Kafka:*
As mentioned above, I'm running 3 nodes with Zookeeper and Kafka colocated.
Both components run inside Docker containers. I found a few blogs that
suggested not colocating them for performance reasons. Is it necessary to
run them on dedicated machines?

*3) Using same Zookeeper cluster across different components*
In my cluster, I use the same Zookeeper cluster for state management of the
Kafka cluster and the Druid cluster. Could this cause instability of the
overall system?

I hope I've covered all the necessary information. Please let me know if
more details about my cluster are needed.

Thanks in advance,
Avinash
-- 

Excuse brevity and typos. Sent from mobile device.

Re: Kafka cluster instability

Posted by Ted Yu <yu...@gmail.com>.
For #2 and #3, you would get better stability if Zookeeper and Kafka had
dedicated machines.

Have you profiled the nodes where multiple processes run (Zookeeper / Kafka /
Druid)? What did disk and network I/O look like?
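
As a rough starting point for the network side, the byte counters in
`/proc/net/dev` can be sampled twice and turned into a rate. This is a
minimal, Linux-only sketch with hypothetical helper names; it assumes the
usual layout of that file (two header lines, then one line per interface
with receive bytes as the first counter and transmit bytes as the ninth).

```python
def parse_net_dev(text: str) -> dict:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 9:
            continue
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before: dict, after: dict, seconds: float) -> dict:
    """Bytes per second (rx, tx) for each interface present in both samples."""
    return {
        iface: (
            (after[iface][0] - before[iface][0]) / seconds,
            (after[iface][1] - before[iface][1]) / seconds,
        )
        for iface in after
        if iface in before
    }


# Invented sample content, shaped like /proc/net/dev, for illustration only.
HEADER = (
    "Inter-|   Receive                                   |  Transmit\n"
    " face |bytes packets errs drop fifo frame compressed multicast"
    "|bytes packets errs drop fifo colls carrier compressed\n"
)
SAMPLE_T0 = HEADER + "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0\n"
SAMPLE_T1 = HEADER + "  eth0: 2000 20 0 0 0 0 0 0 4000 40 0 0 0 0 0 0\n"

if __name__ == "__main__":
    rates = throughput(parse_net_dev(SAMPLE_T0), parse_net_dev(SAMPLE_T1), 10)
    print(rates)
```

On a real node, the two samples would come from reading `/proc/net/dev` a few
seconds apart on each host that runs Kafka, Zookeeper, and Druid together.
Disk I/O would need a separate check.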

Cheers
