Posted to users@kafka.apache.org by Wesley Chow <we...@chartbeat.com> on 2016/07/05 20:14:55 UTC

high rate of small packets between brokers

I’ve been investigating some possible network performance issues we’re having with our Kafka brokers, and noticed that traffic sent between brokers tends to show frequent bursts of very small packets:

16:09:52.299863 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39399: Flags [P.], seq 127908:127925, ack 4143, win 32488, length 17
16:09:52.299870 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39399: Flags [P.], seq 127925:127943, ack 4143, win 32488, length 18
16:09:52.299876 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39399: Flags [P.], seq 127943:127967, ack 4143, win 32488, length 24
16:09:52.299889 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39399: Flags [P.], seq 127967:127985, ack 4143, win 32488, length 18
16:09:52.299892 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39399: Flags [P.], seq 127985:127999, ack 4143, win 32488, length 14
16:09:52.299895 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39399: Flags [P.], seq 127999:128017, ack 4143, win 32488, length 18
16:09:52.299897 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39399: Flags [P.], seq 128017:128031, ack 4143, win 32488, length 14
16:09:52.299900 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39399: Flags [P.], seq 128031:128049, ack 4143, win 32488, length 18
16:09:52.300612 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39400: Flags [P.], seq 279162:279178, ack 6700, win 32488, length 16
16:09:52.300645 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39400: Flags [P.], seq 279178:279189, ack 6700, win 32488, length 11
16:09:52.300655 IP stream02.chartbeat.net.9092 > stream03.chartbeat.net.39400: Flags [P.], seq 279189:279207, ack 6700, win 32488, length 18
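
In case it helps anyone reproduce this: the dump above is just tcpdump output, and something along these lines should give equivalent output (the interface name is a placeholder for whatever your brokers actually use):

# capture broker-to-broker traffic on the Kafka listener port (9092 here)
sudo tcpdump -i eth0 'tcp port 9092'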

I don’t know whether this is really an issue in itself, but I thought I’d check with the group. The MTU on the interfaces is set to 9001, and regular consumers don’t see the same bursts of small PSH packets. Our replica config is:

replica.lag.time.max.ms=10000
replica.lag.max.messages=4000
replica.socket.timeout.ms=301000
replica.socket.receive.buffer.bytes=641024
replica.fetch.max.bytes=10241024
replica.fetch.wait.max.ms=500
replica.fetch.min.bytes=1
num.replica.fetchers=16
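
For what it’s worth, to see what those settings actually translate to at the socket level, the fetcher connections can be inspected from the follower with ss; a rough sketch (adjust the port filter to your listener):

# show TCP internals (-i) and socket buffer/memory info (-m) for connections to the leader's 9092
ss -timn '( dport = :9092 )'

The skmem output there should show the real receive buffer size negotiated on those connections, which is one way to sanity-check replica.socket.receive.buffer.bytes.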

Any thoughts on whether or not this is an issue, and if so how we should correct it? I’m wondering about the replica.fetch.*.bytes settings — it’s unclear to me from the docs what those do exactly.

Thanks,
Wes