Posted to dev@kafka.apache.org by "Alex the Rocker (JIRA)" <ji...@apache.org> on 2015/06/04 23:56:38 UTC

[jira] [Comment Edited] (KAFKA-2096) Enable keepalive socket option for broker to prevent socket leak

    [ https://issues.apache.org/jira/browse/KAFKA-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573661#comment-14573661 ] 

Alex the Rocker edited comment on KAFKA-2096 at 6/4/15 9:56 PM:
----------------------------------------------------------------

We are seeing the same issue with Kafka 0.8.1.1. Would it be possible to include the fix in 0.8.1.2? Using conntrack-tools, we observe brokers with a very large number (up to 14000) of UNREPLIED sessions.

Question: do we need to open a new JIRA to report the same issue against 0.8.1.1?


was (Author: alex.m3tal):
We are seeing the same issue with Kafka 0.8.1.1. Would it be possible to include the fix in 0.8.1.2? Using conntrack-tools, we observe brokers with a very large number (up to 14000) of UNREPLIED sessions.



> Enable keepalive socket option for broker to prevent socket leak
> ----------------------------------------------------------------
>
>                 Key: KAFKA-2096
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2096
>             Project: Kafka
>          Issue Type: Improvement
>          Components: network
>    Affects Versions: 0.8.2.1
>            Reporter: Allen Wang
>            Assignee: Allen Wang
>            Priority: Critical
>             Fix For: 0.8.3
>
>         Attachments: patch.diff
>
>
> We run a Kafka 0.8.2.1 cluster in AWS with a large number of producers (> 10000). The number of producer instances also scales up and down significantly on a daily basis.
> The issue we found is that after 10 days, the open file descriptor count will approach the limit of 32K. An investigation of these open file descriptors shows that a significant portion of these are from client instances that are terminated during scaling down. Somehow they still show as "ESTABLISHED" in netstat. We suspect that the AWS firewall between the client and broker causes this issue.
> We tried the "keepalive" socket option to reduce this socket leak on the broker, and it appears to be working. Specifically, we added this line to kafka.network.Acceptor.accept():
>       socketChannel.socket().setKeepAlive(true)
> During our experiment with this change, we confirmed that netstat entries for terminated client instances were probed as configured in the operating system. After the configured number of probes, the OS determined that the peer was no longer alive and the entry was removed, possibly after Kafka hit an error reading from the channel and closed it. Our experiment also shows that after a few days, the patched instance held a stable low point in its open file descriptor count, whereas on other instances the low point keeps increasing from day to day.
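
For readers who want to try the effect of the one-line change outside Kafka, here is a minimal standalone sketch in Java. It is not the attached patch and not Kafka's Acceptor code: the class name KeepAliveAcceptSketch and the helper acceptWithKeepAlive are hypothetical, and the loopback listener exists only to make the example runnable. It shows nothing more than setting SO_KEEPALIVE on a freshly accepted channel, as the description does.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class KeepAliveAcceptSketch {

    // Hypothetical helper: accept one connection and enable SO_KEEPALIVE on it,
    // mirroring the single line quoted in the issue description.
    static SocketChannel acceptWithKeepAlive(ServerSocketChannel server) throws IOException {
        SocketChannel channel = server.accept();
        channel.socket().setKeepAlive(true);
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Bind to an ephemeral loopback port so the sketch needs no configuration.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();

            try (SocketChannel client = SocketChannel.open(address);
                 SocketChannel accepted = acceptWithKeepAlive(server)) {
                // Report the option as the JVM sees it on the accepted socket.
                System.out.println("SO_KEEPALIVE on accepted socket: "
                        + accepted.socket().getKeepAlive());
            }
        }
    }
}
```

Setting the option only turns probing on. When the first probe is sent, how often probes repeat, and how many failures mark the peer dead are all operating system settings, which is why the description says the entries were probed "as configured in operating system".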


