You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Ofir Manor <of...@equalum.io> on 2018/05/29 21:28:39 UTC

producing with acks=all (2 replicas) is 100x slower and fails on timeouts with Kafka 1.0.1

Hi all,
I'm running into a weird slowness when using acks=all on Kafka 1.0.1.
I reproduced it on a 3-node cluster (each 4 cores/14GB RAM), using a topic
with replication factor 2.
I used the built-in kafka-producer-perf-test.sh tool with 1KB messages.
With all defaults, it can send 100K-200K messages / sec.
With acks=all, it timesout quickly even when I only ask 1K-2K messages/sec.

Is that expected?  Any idea how to investigate / tune / workaround?


Here is a test case:

*1.* Create a new topic
$ *kafka-topics.sh --create   *--zookeeper $ZK_CLUSTER_HOSTS --partitions 6
--replication-factor 2 --topic my-topic
Created topic "my-topic"

$ *kafka-topics.sh --describe *--zookeeper $ZK_CLUSTER_HOSTS --topic
my-topic
Topic:my-topic  PartitionCount:6        ReplicationFactor:2     Configs:
        Topic: my-topic Partition: 0    Leader: 2       Replicas: 2,1
 Isr: 2,1
        Topic: my-topic Partition: 1    Leader: 3       Replicas: 3,2
 Isr: 3,2
        Topic: my-topic Partition: 2    Leader: 1       Replicas: 1,3
 Isr: 1,3
        Topic: my-topic Partition: 3    Leader: 2       Replicas: 2,3
 Isr: 2,3
        Topic: my-topic Partition: 4    Leader: 3       Replicas: 3,1
 Isr: 3,1
        Topic: my-topic Partition: 5    Leader: 1       Replicas: 1,2
 Isr: 1,2


*2.* Producing events in default config supports 100K events per sec (1KB
each) without issue:
$ *kafka-producer-perf-test.sh *--topic my_topic --num-records 10000000
--record-size 1024 --throughput 100000 --producer-props
bootstrap.servers=$KAFKA_CLUSTER_HOSTS
492612 records sent, 98522.4 records/sec (96.21 MB/sec), 120.9 ms avg
latency, 735.0 max latency.
493623 records sent, 98724.6 records/sec (96.41 MB/sec), 9.0 ms avg
latency, 161.0 max latency.
513640 records sent, 102728.0 records/sec (100.32 MB/sec), 34.4 ms avg
latency, 710.0 max latency.
485664 records sent, 97132.8 records/sec (94.86 MB/sec), 132.6 ms avg
latency, 1015.0 max latency.
506546 records sent, 101248.5 records/sec (98.88 MB/sec), 69.2 ms avg
latency, 786.0 max latency.
508195 records sent, 101639.0 records/sec (99.26 MB/sec), 15.3 ms avg
latency, 374.0 max latency.
387037 records sent, 74688.7 records/sec (72.94 MB/sec), 177.6 ms avg
latency, 1926.0 max latency.
375598 records sent, 69349.7 records/sec (67.72 MB/sec), 330.9 ms avg
latency, 3152.0 max latency.
648168 records sent, 129633.6 records/sec (126.60 MB/sec), 356.7 ms avg
latency, 3177.0 max latency.
339564 records sent, 67912.8 records/sec (66.32 MB/sec), 457.5 ms avg
latency, 2193.0 max latency.
498226 records sent, 99645.2 records/sec (97.31 MB/sec), 305.6 ms avg
latency, 2256.0 max latency.
811259 records sent, 162251.8 records/sec (158.45 MB/sec), 143.7 ms avg
latency, 636.0 max latency.
448713 records sent, 69428.0 records/sec (67.80 MB/sec), 139.5 ms avg
latency, 2320.0 max latency.
697652 records sent, 139530.4 records/sec (136.26 MB/sec), 203.4 ms avg
latency, 2333.0 max latency.
484168 records sent, 96833.6 records/sec (94.56 MB/sec), 57.6 ms avg
latency, 601.0 max latency.
427440 records sent, 85488.0 records/sec (83.48 MB/sec), 225.9 ms avg
latency, 1337.0 max latency.
513414 records sent, 90565.2 records/sec (88.44 MB/sec), 113.0 ms avg
latency, 1574.0 max latency.
641844 records sent, 128368.8 records/sec (125.36 MB/sec), 235.1 ms avg
latency, 2930.0 max latency.
479190 records sent, 95818.8 records/sec (93.57 MB/sec), 69.9 ms avg
latency, 864.0 max latency.
10000000 records sent, 99712.827058 records/sec (97.38 MB/sec), 165.35 ms
avg latency, 3177.00 ms max latency, 4 ms 50th, 782 ms 95th, 2102 ms 99th,
3148 ms 99.9th.


*3.* Producing events in default config supports only 2K events per sec
with acks="all" quickly breaks apart - it doesn't keep up the pace and
fails on timeouts.
Sometimes it recovers from it, sometimes not. Here is an example where it
recovers a bit:
$* kafka-producer-perf-test.sh *--topic my_topic --num-records 10000000
--record-size 1024 --throughput 2000 --producer-props
bootstrap.servers=$KAFKA_CLUSTER_HOSTS acks="all"
9997 records sent, 1999.4 records/sec (1.95 MB/sec), 4.2 ms avg latency,
162.0 max latency.
9799 records sent, 1959.4 records/sec (1.91 MB/sec), 5.3 ms avg latency,
79.0 max latency.
6670 records sent, 1334.0 records/sec (1.30 MB/sec), 2.3 ms avg latency,
21.0 max latency.
3624 records sent, 724.7 records/sec (0.71 MB/sec), 2.1 ms avg latency,
25.0 max latency.
3334 records sent, 666.7 records/sec (0.65 MB/sec), 7.4 ms avg latency,
72.0 max latency.
3334 records sent, 666.7 records/sec (0.65 MB/sec), 2.0 ms avg latency,
19.0 max latency.
3334 records sent, 666.8 records/sec (0.65 MB/sec), 2.4 ms avg latency,
43.0 max latency.
org.apache.kafka.common.errors.TimeoutException: The request timed out.
7606 records sent, 1521.2 records/sec (1.49 MB/sec), 17253.4 ms avg
latency, 30002.0 max latency.
22602 records sent, 4520.4 records/sec (4.41 MB/sec), 7719.0 ms avg
latency, 22874.0 max latency.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-0: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
org.apache.kafka.common.errors.TimeoutException: Expiring 15 record(s) for
my_topic-3: 30001 ms has passed since batch creation plus linger time
29714 records sent, 5942.8 records/sec (5.80 MB/sec), 10338.3 ms avg
latency, 30006.0 max latency.
10002 records sent, 2000.0 records/sec (1.95 MB/sec), 3.8 ms avg latency,
74.0 max latency.
7848 records sent, 1569.3 records/sec (1.53 MB/sec), 5.1 ms avg latency,
90.0 max latency.
6666 records sent, 1333.2 records/sec (1.30 MB/sec), 2.3 ms avg latency,
21.0 max latency.
6666 records sent, 1333.2 records/sec (1.30 MB/sec), 6.3 ms avg latency,
100.0 max latency.
6670 records sent, 1334.0 records/sec (1.30 MB/sec), 2.1 ms avg latency,
15.0 max latency.
6666 records sent, 1333.2 records/sec (1.30 MB/sec), 2.6 ms avg latency,
67.0 max latency.
^C



Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io