Posted to users@kafka.apache.org by David Garcia <da...@spiceworks.com> on 2016/10/19 21:33:31 UTC

Please help with AWS configuration

Hello everyone.  I’m having a hell of a time figuring out a Kafka performance issue in AWS. Any help is greatly appreciated!

Here is our AWS configuration:


- ZooKeeper cluster (3.4.6): 3 nodes on m4.xlarge instances (default configuration), EBS volumes (sd1)

- Kafka cluster (0.10.0): 3 nodes on m4.2xlarge instances (config: https://gist.github.com/anduill/710bb0619a80019016ac85bb5c060440), EBS volumes (sd1)

Usage:

Our usage of the cluster is fairly modest (at least I think so). At peak hours, each broker receives about 1.4 MB/sec. Our primary input topic has 54 partitions with replication set to 3 (acks=all). A consumer reads this topic and spreads the messages across 8 other topics, each with 8 partitions and replication set to 2 (acks=all). Downstream, 4 other consumers read these topics; one of them consumes the 8 previous topics, transforms the messages, and sends the new messages to 8 further topics (acks=1). In all, we end up with about 206 partitions and an average replication factor of 2.26.
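
To make the acks settings above concrete, here is a minimal producer sketch in Java against the kafka-clients API. The bootstrap addresses and topic names are placeholders rather than our real ones; the point is only to show where acks=all versus acks=1 is configured and why the former is sensitive to ISR health:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class AcksExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker addresses -- substitute the real cluster.
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // acks=all: the leader waits for the full ISR to acknowledge, so the request
            // sits in the broker's produce purgatory while any follower is lagging.
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("primary-input", "key", "value")); // topic name is illustrative
            }

            // acks=1: only the leader must acknowledge, so ISR churn affects durability
            // rather than producer latency.
            props.put(ProducerConfig.ACKS_CONFIG, "1");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("downstream-topic", "key", "value")); // topic name is illustrative
            }
        }
    }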

Our Problem:

Our cluster will hum along just fine when, suddenly, one or more brokers start experiencing severe ISR shrinking/expanding. This causes under-replicated partitions, and the producer purgatory size starts to grow rapidly on the affected brokers, which in turn causes downstream producers to fall behind in some cases.
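
For reference, this is roughly how the relevant broker metrics can be watched over JMX (a minimal sketch: the hostname and JMX port are placeholders, and it assumes the brokers are started with remote JMX enabled). The object names are the standard ReplicaManager and produce-purgatory MBeans:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class IsrHealthCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder JMX endpoint (set JMX_PORT on the broker to enable remote JMX).
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();

                // Gauge: partitions led by this broker whose ISR is smaller than the replication factor.
                Object underReplicated = mbs.getAttribute(
                    new ObjectName("kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"), "Value");
                // Meter: how often the ISR is shrinking on this broker.
                Object isrShrinkRate = mbs.getAttribute(
                    new ObjectName("kafka.server:type=ReplicaManager,name=IsrShrinksPerSec"), "OneMinuteRate");
                // Gauge: produce requests parked waiting for acks=all replication to complete.
                Object producePurgatory = mbs.getAttribute(
                    new ObjectName("kafka.server:type=DelayedOperationPurgatory,delayedOperation=Produce,name=PurgatorySize"),
                    "Value");

                System.out.println("UnderReplicatedPartitions  = " + underReplicated);
                System.out.println("IsrShrinksPerSec (1m rate) = " + isrShrinkRate);
                System.out.println("Produce PurgatorySize      = " + producePurgatory);
            } finally {
                connector.close();
            }
        }
    }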

In the Kafka configuration above, we have a couple of non-default settings, but nothing seems to stand out. Is there anything obvious I’m missing (or need to add/adjust)? Or is there a bug I should be aware of that would cause these issues?
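
For what it’s worth, on a newer client/broker combination the non-default broker settings can also be dumped programmatically with AdminClient. This is only a forward-looking sketch: AdminClient and the DescribeConfigs API were added after 0.10, so it will not work against the 0.10.0 cluster described above, and the broker id and bootstrap address below are placeholders:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class DumpNonDefaultBrokerConfigs {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder address
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1"); // placeholder broker id
                Map<ConfigResource, Config> configs =
                    admin.describeConfigs(Collections.singleton(broker)).all().get();
                // Print only the settings that differ from the broker defaults.
                for (ConfigEntry entry : configs.get(broker).entries()) {
                    if (!entry.isDefault()) {
                        System.out.println(entry.name() + " = " + entry.value());
                    }
                }
            }
        }
    }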

-David

Re: Please help with AWS configuration

Posted by David Garcia <da...@spiceworks.com>.
Sorry, had a typo in my gist.  Here is the correct location:

https://gist.github.com/anduill/710bb0619a80019016ac85bb5c060440
