Posted to users@kafka.apache.org by Bharath Srinivasan <bh...@gmail.com> on 2016/08/25 21:19:41 UTC
Kafka 0.8.2.2 - CLOSE_WAITS on broker
Hello:
We are running a data pipeline application stack using Kafka 0.8.2.2 in
production. We have frequently been seeing connections pile up in CLOSE_WAIT
on our Kafka brokers, and they fill up the file handles pretty quickly. By
the time the open file count reaches around 40K, the node becomes
unresponsive and we see huge GC pauses. The only way out has been to
restart the node. When the nodes are working fine, the number of open files
per node stays around 6K at peak load and 3K on average.
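A minimal sketch for tracking the open-file count described above (about 3K on average, 6K at peak, around 40K before a node hangs), assuming the broker runs on Linux with /proc available. The helper names `open_fd_count` and `max_open_files` are hypothetical: the first counts entries under /proc/<pid>/fd, and the second parses the "Max open files" row of /proc/<pid>/limits, so the count can be compared against the process limit before it is exhausted.

```python
import os


def open_fd_count(pid):
    """Return the number of open file descriptors of a process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def _parse_limit(value):
    # /proc/<pid>/limits prints "unlimited" instead of a number when no limit is set.
    return None if value == "unlimited" else int(value)


def max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits text.

    None stands for "unlimited"; the function returns None if the row is absent.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            fields = line.split()
            return _parse_limit(fields[3]), _parse_limit(fields[4])
    return None
```

For example, with the broker PID from the lsof output later in the thread, `open_fd_count(4692)` and `max_open_files(open("/proc/4692/limits").read())` could be sampled periodically. Reading another process's fd directory typically requires running as the same user (here, kafka) or as root.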
Configurations:
- 5-broker cluster (single-node spec: 24 cores, 250 GB RAM, 256 GB SSD)
- 20 topics and 1100 partitions across all topics
- Replication factor of 3
- Java-based KafkaProducer and high-level consumers (ZookeeperConsumerConnector)
- GC params { -Xmx32G -Xms4G -server -XX:MetaspaceSize=96m -XX:+UseG1GC
  -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
  -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50
  -XX:MaxMetaspaceFreeRatio=80 }
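To put the 3K-6K baseline in perspective, a rough back-of-the-envelope estimate of segment files per broker, assuming replicas are spread evenly over the five brokers and that the broker keeps one descriptor open per log segment: 1100 partitions × 3 replicas / 5 brokers = 660 replicas per broker. The function names and the `segments_per_replica` parameter are hypothetical, and the model ignores sockets and other files.

```python
def replicas_per_broker(partitions, replication_factor, brokers):
    """Average number of partition replicas per broker, assuming an even spread."""
    return partitions * replication_factor / brokers


def estimated_segment_fds(partitions, replication_factor, brokers, segments_per_replica):
    """Rough per-broker count of open .log segment descriptors.

    Hypothetical model: one open descriptor per log segment; connections,
    memory-mapped index files and other files are not included.
    """
    return replicas_per_broker(partitions, replication_factor, brokers) * segments_per_replica


# Figures from the configuration above: 1100 partitions, RF 3, 5 brokers.
print(replicas_per_broker(1100, 3, 5))        # 660.0
print(estimated_segment_fds(1100, 3, 5, 2))   # 1320.0
```

Even with a few segments per replica, this stays in the low thousands, which is in line with the ~3K average and far below the ~46K CLOSE_WAIT sockets reported later in the thread.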
Any pointers here? Appreciate your help.
Thanks,
Bharath
Re: Kafka 0.8.2.2 - CLOSE_WAITS on broker
Posted by Bharath Srinivasan <bh...@gmail.com>.
Java / OS info:
----------
java.specification.version = 1.8
java.vendor = Oracle Corporation
java.version = 1.8.0_45
Oracle Linux Server release 6.7
kernel version 2.6.32-573.18.1.el6.x86_64
Redacted lsof output
---------------------
~46K CLOSE_WAIT connections
---------------------------
java 4692 kafka 2618u IPv6 264581081 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host1:33089 (CLOSE_WAIT)
java 4692 kafka 2619u IPv6 264581082 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host2:37371 (CLOSE_WAIT)
java 4692 kafka 2621u IPv6 264600187 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host3:40788 (CLOSE_WAIT)
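A small sketch for finding out which clients the CLOSE_WAIT sockets belong to, assuming the lsof output is saved with one entry per line (the entries above were wrapped by email). `sockets_by_remote_host` is a hypothetical helper: it takes the NAME column ("local:port->remote:port (STATE)"), keeps entries in the requested state, and counts them per remote host. Running lsof with -n and -P prints numeric addresses and ports instead of names; XmlIpcRegSvc appears to be the services-file name for port 9092, Kafka's default listener port.

```python
from collections import Counter


def sockets_by_remote_host(lsof_text, state="CLOSE_WAIT"):
    """Count lsof TCP entries in the given state, grouped by remote host.

    Expects one lsof entry per line whose last two columns are
    "local:port->remote:port" and "(STATE)".
    """
    counts = Counter()
    suffix = f"({state})"
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1] != suffix:
            continue
        name = fields[-2]
        if "->" not in name:
            continue  # listening sockets have no remote end
        remote = name.split("->", 1)[1]
        remote_host = remote.rsplit(":", 1)[0]  # drop the remote port
        counts[remote_host] += 1
    return counts
```

For example, `sockets_by_remote_host(open("lsof.txt").read()).most_common(10)` (the file name is hypothetical) would list the clients with the most half-closed connections, which helps tell whether one producer or consumer host dominates.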
475 ESTABLISHED connections
----------------------------
java 4692 kafka *427u IPv6 282382725 0t0 TCP XX-XXXX-kafka01:54099->XX-XXXX-host1:eforward (ESTABLISHED)
java 4692 kafka *639u IPv6 282426735 0t0 TCP XX-XXXX-kafka01:36157->XX-XXXX-kafka01:59964 (ESTABLISHED)
java 4692 kafka *860u IPv6 282480072 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host2:50547 (ESTABLISHED)
java 4692 kafka *507u IPv6 282481853 0t0 TCP XX-XXXX-kafka01:XmlIpcRegSvc->XX-XXXX-host3:45096 (ESTABLISHED)
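The CLOSE_WAIT and ESTABLISHED totals can also be tracked without running lsof, by reading the kernel's socket table. A minimal sketch, assuming Linux, where /proc/net/tcp6 lists IPv6 TCP sockets (the lsof entries above are IPv6) with the local address as "<hex ip>:<hex port>" and the state as a hex code (08 is CLOSE_WAIT, 01 is ESTABLISHED). The function name is hypothetical, and port 9092 in the usage note is an assumption about the broker's listener port. The table covers every process in the network namespace, so filtering on the local port is what ties the counts to the broker.

```python
from collections import Counter

# Hex state codes used in /proc/net/tcp and /proc/net/tcp6
# (they mirror the kernel's include/net/tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text, local_port=None):
    """Count sockets per TCP state in /proc/net/tcp or /proc/net/tcp6 text.

    If local_port is given, only sockets whose local port matches are counted.
    """
    counts = Counter()
    lines = proc_net_tcp_text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        # local_address is "<hex ip>:<hex port>"
        port = int(local_address.rsplit(":", 1)[1], 16)
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state.upper(), state)] += 1
    return counts
```

On a broker this could be called as `count_tcp_states(open("/proc/net/tcp6").read(), local_port=9092)` and sampled over time to see when CLOSE_WAIT starts climbing relative to ESTABLISHED.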
~3K
----------------------------
java 4692 kafka 2367u REG 253,3 104857335 141033710 /XXX/kafka/LOG/__consumer_offsets-10/00000000000035177234.log
~1.5K
----------------------------
java 4692 kafka mem REG 253,3 10485760 141297356 /XXX/kafka/LOG/TOPIC-1-9/00000000000000028243.index
~1.5K
----------------------------
java 4692 kafka 818u REG 253,3 2548089 141297556 /XXX/kafka/LOG/TOPIC-1-2-76/00000000000000146894.log
java 4692 kafka 819u REG 253,3 0 141165545 /XXX/kafka/LOG/TOPIC-2-2-11/00000000000000000000.log
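A sketch for summarizing the file sections above, assuming the same one-entry-per-line lsof text: it counts regular files by extension and by whether lsof reports them through a descriptor (FD column such as 818u) or as a memory mapping (FD column mem, as for the .index file shown). The helper name is hypothetical, and paths containing spaces would need a more careful parser. Going by the counts above, these files add up to roughly 6K entries, so most of the ~46K growth appears to come from the CLOSE_WAIT sockets rather than log segments.

```python
import os
from collections import Counter


def segment_files_by_type(lsof_text):
    """Count lsof REG (regular file) entries by extension and by how they are held.

    lsof prints "mem" in the FD column for memory-mapped files and a descriptor
    number with an access mode (e.g. "818u") for files opened through a descriptor.
    """
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Expected columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        if len(fields) < 9 or fields[4] != "REG":
            continue
        fd, path = fields[3], fields[-1]
        held_as = "mmap" if fd == "mem" else "fd"
        extension = os.path.splitext(path)[1] or "(none)"
        counts[(extension, held_as)] += 1
    return counts
```

The result keys are (extension, held_as) pairs, e.g. (".log", "fd") and (".index", "mmap"), which map directly onto the three file sections of the redacted output.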
On Fri, Aug 26, 2016 at 6:37 AM, Jaikiran Pai <ja...@gmail.com> wrote:
> Which Java vendor and version are you using at runtime? Also, what OS is
> this? Can you get the lsof output (on Linux) and paste it somewhere
> (like a gist) so we can see which descriptors are open?
>
> -Jaikiran
Re: Kafka 0.8.2.2 - CLOSE_WAITS on broker
Posted by Jaikiran Pai <ja...@gmail.com>.
Which Java vendor and version are you using at runtime? Also, what OS is
this? Can you get the lsof output (on Linux) and paste it somewhere
(like a gist) so we can see which descriptors are open?
-Jaikiran