Posted to dev@kafka.apache.org by "yazgoo (JIRA)" <ji...@apache.org> on 2015/06/03 13:41:50 UTC

[jira] [Commented] (KAFKA-2096) Enable keepalive socket option for broker to prevent socket leak

    [ https://issues.apache.org/jira/browse/KAFKA-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570697#comment-14570697 ] 

yazgoo commented on KAFKA-2096:
-------------------------------

This issue also seems to affect the 0.8.1 branch, since the socket initialisation in the accept() method is the same there.
Would it be possible to mark it for 0.8.1.2 as well?
I can submit a patch if need be.

> Enable keepalive socket option for broker to prevent socket leak
> ----------------------------------------------------------------
>
>                 Key: KAFKA-2096
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2096
>             Project: Kafka
>          Issue Type: Improvement
>          Components: network
>    Affects Versions: 0.8.2.1
>            Reporter: Allen Wang
>            Assignee: Allen Wang
>            Priority: Critical
>             Fix For: 0.8.3
>
>         Attachments: patch.diff
>
>
> We run a Kafka 0.8.2.1 cluster in AWS with a large number of producers (> 10000). The number of producer instances also scales up and down significantly on a daily basis.
> The issue we found is that after 10 days, the open file descriptor count approaches the limit of 32K. An investigation of these open file descriptors shows that a significant portion of them belong to client instances that were terminated during scale-down. Somehow they still show as "ESTABLISHED" in netstat. We suspect that the AWS firewall between the client and broker causes this issue.
> We attempted to use the "keepalive" socket option to reduce this socket leak on the broker, and it appears to be working. Specifically, we added this line to kafka.network.Acceptor.accept():
>       socketChannel.socket().setKeepAlive(true)
> During our experiment with this change, we confirmed that netstat entries for terminated client instances were probed as configured in the operating system. After the configured number of probes, the OS determined that the peer was no longer alive and the entry was removed, possibly after Kafka hit an error reading from the channel and closed it. Our experiment also shows that after a few days, the instance maintained a stable low point in its open file descriptor count, whereas on other instances the low point keeps increasing from day to day.
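
For reference, the change described above can be sketched outside of Kafka with plain java.nio. This is only an illustration, not the contents of patch.diff or the actual kafka.network.Acceptor code: the class name KeepAliveAcceptSketch and the helper acceptWithKeepAlive are hypothetical, and the loopback client exists only so that accept() has something to return. The probe timing itself is not set here; as noted in the description, it comes from the operating system's configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class KeepAliveAcceptSketch {

    // Hypothetical helper (not Kafka code): accept one connection and
    // enable SO_KEEPALIVE on it, mirroring the one-line change quoted above.
    static SocketChannel acceptWithKeepAlive(ServerSocketChannel server) throws IOException {
        SocketChannel socketChannel = server.accept();
        // With keepalive enabled, the OS probes an idle peer and eventually
        // reports the connection as dead if the probes go unanswered.
        socketChannel.socket().setKeepAlive(true);
        return socketChannel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Bind to an ephemeral loopback port so the sketch runs anywhere.
            server.bind(new InetSocketAddress("127.0.0.1", 0));

            // Loopback client used only to give accept() a pending connection.
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = acceptWithKeepAlive(server)) {
                System.out.println("keepalive enabled: " + accepted.socket().getKeepAlive());
            }
        }
    }
}
```

Enabling the option per accepted socket keeps the change local to the accept path; how quickly a dead peer is detected still depends on the keepalive settings of the host running the broker.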



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)