Posted to jira@kafka.apache.org by "Manikumar (JIRA)" <ji...@apache.org> on 2018/04/17 19:22:00 UTC

[jira] [Commented] (KAFKA-6479) Broker file descriptor leak after consumer request timeout

    [ https://issues.apache.org/jira/browse/KAFKA-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441366#comment-16441366 ] 

Manikumar commented on KAFKA-6479:
----------------------------------

One option is to adjust the "connections.max.idle.ms" config value (default: 10 minutes). The server closes connections that have been idle for longer than this value.
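
As a rough illustration of the effect of lowering this value, here is a minimal Java sketch. The class name IdleTimeoutSketch, its helper methods, and the 60000 ms override are hypothetical and not part of Kafka. The estimate also assumes that the broker's idle-connection reaper does close the leaked sockets, which this thread does not confirm, and that a stuck consumer opens one new connection per request timeout (12 seconds in the reporter's example).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class IdleTimeoutSketch {
    // Broker default for connections.max.idle.ms: 10 minutes.
    static final long DEFAULT_IDLE_MS = TimeUnit.MINUTES.toMillis(10);

    // Reads connections.max.idle.ms from broker properties, falling back to the default.
    static long idleTimeoutMs(Properties brokerProps) {
        String value = brokerProps.getProperty(
                "connections.max.idle.ms", Long.toString(DEFAULT_IDLE_MS));
        return Long.parseLong(value.trim());
    }

    // Rough upper bound on sockets left open by one stuck consumer, ASSUMING
    // one new connection per retry interval and that the broker closes each
    // one once it has been idle for idleMs. Rounded up (ceiling division).
    static long maxLingeringSockets(long idleMs, long retryMs) {
        return (idleMs + retryMs - 1) / retryMs;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical server.properties override: 1 minute instead of 10.
        Properties props = new Properties();
        props.load(new StringReader("connections.max.idle.ms=60000\n"));

        long idleMs = idleTimeoutMs(props);
        long retryMs = 12000L; // request.timeout.ms from the reporter's example
        System.out.println("idle timeout " + idleMs + " ms, at most about "
                + maxLingeringSockets(idleMs, retryMs)
                + " lingering sockets per stuck consumer");
    }
}
```

Under these assumptions, the default of 10 minutes allows about 50 such sockets per stuck consumer, while a 1 minute timeout allows about 5. A lower value also closes legitimately idle client connections sooner, so clients will reconnect more often.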

> Broker file descriptor leak after consumer request timeout
> ----------------------------------------------------------
>
>                 Key: KAFKA-6479
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6479
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.0.0
>            Reporter: Ryan Leslie
>            Priority: Major
>
> When a consumer request times out, i.e. takes longer than request.timeout.ms, and the client disconnects from the coordinator, the coordinator may leak file descriptors. The following code produces this behavior:
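> {code:java}
> import java.util.Collections;
> import java.util.List;
> import java.util.Properties;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
>
> // BROKERS is the bootstrap server list, defined elsewhere.
> Properties config = new Properties();
> config.put("bootstrap.servers", BROKERS);
> config.put("group.id", "leak-test");
> config.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
> config.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
> // Effectively never evict a consumer for failing to call poll().
> config.put("max.poll.interval.ms", Integer.MAX_VALUE);
> config.put("request.timeout.ms", 12000);
> // Two consumers in the same group, subscribed to the same topic.
> KafkaConsumer<String, String> consumer1 = new KafkaConsumer<>(config);
> KafkaConsumer<String, String> consumer2 = new KafkaConsumer<>(config);
> List<String> topics = Collections.singletonList("leak-test");
> consumer1.subscribe(topics);
> consumer2.subscribe(topics);
> // consumer1 joins the group and then goes inactive; consumer2's join triggers a rebalance.
> consumer1.poll(100);
> consumer2.poll(100);
> {code}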
> When the above executes, consumer 2 will attempt to rebalance indefinitely because it is blocked by the inactive consumer 1. Every 12 seconds it gives up on the JOIN_GROUP request, disconnects, and logs a _Marking the coordinator dead_ message. Unless the consumer exits or times out, this will cause a socket in CLOSE_WAIT to leak in the coordinator, and the broker will eventually run out of file descriptors and crash.
> Aside from faulty code as in the example above, or an intentional DoS, any client bug causing a consumer to block, e.g. KAFKA-6397, could also result in this leak.


