Posted to dev@kafka.apache.org by "Joseph Aliase (JIRA)" <ji...@apache.org> on 2017/04/20 20:51:04 UTC
[jira] [Comment Edited] (KAFKA-5007) Kafka Replica Fetcher Thread- Resource Leak
[ https://issues.apache.org/jira/browse/KAFKA-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970549#comment-15970549 ]
Joseph Aliase edited comment on KAFKA-5007 at 4/20/17 8:50 PM:
---------------------------------------------------------------
It seems that ticket https://issues.apache.org/jira/browse/KAFKA-4477 is similar to this one.
That ticket was closed because nobody reported the issue in 0.10.1.1, but the issue does exist in 0.10.1.1.
It's a major issue and I would like to work on it. Any guidance will be appreciated.
[~junrao]
was (Author: joseph.aliase07@gmail.com):
It seems that ticket https://issues.apache.org/jira/browse/KAFKA-4477 is similar to this one.
That ticket was closed because nobody reported the issue in 0.10.1.1, but the issue does exist in 0.10.1.1.
It's a major issue and I would like to work on it. I need guidance on where to start.
[~junrao]
> Kafka Replica Fetcher Thread- Resource Leak
> -------------------------------------------
>
> Key: KAFKA-5007
> URL: https://issues.apache.org/jira/browse/KAFKA-5007
> Project: Kafka
> Issue Type: Bug
> Components: core, network
> Affects Versions: 0.10.0.0, 0.10.1.1, 0.10.2.0
> Environment: CentOS 7
> Java 8
> Reporter: Joseph Aliase
> Priority: Critical
> Labels: reliability
>
> Kafka runs out of open file descriptors when the system's network interface is down.
> Issue description:
> We have a Kafka cluster of 5 nodes running version 0.10.1.1. The open file descriptor limit for the account running Kafka is set to 100000.
> During an upgrade, the network interface went down. The outage lasted 12 hours, and eventually all the brokers crashed with a java.io.IOException: Too many open files error.
> We repeated the test in a lower environment and observed that the open socket count kept increasing while the NIC was down.
> We have around 13 topics with a maximum partition count of 120, and the number of replica fetcher threads is set to 8.
> Using an internal monitoring tool, we observed that the number of open socket descriptors for the broker PID continued to increase while the NIC was down, eventually leading to the open file descriptor error.
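The key observation in the report is that the broker's open socket count keeps climbing toward the descriptor limit while the NIC is down. The internal monitoring tool is not named, so the following is only a minimal sketch of one way to take the same measurement, assuming a Linux host with /proc (such as the CentOS 7 environment above), Python 3.6+, and a script run as the broker's user or as root. It reads the "Max open files" limit from /proc/<pid>/limits and counts descriptors under /proc/<pid>/fd whose link target starts with "socket:". The function names (max_open_files, count_fds, report, watch) are hypothetical helpers, not part of Kafka or of any tool mentioned in the ticket.

```python
import os
import sys
import time


def _parse_limit(value):
    # /proc/<pid>/limits prints either a number or the word "unlimited".
    return None if value == "unlimited" else int(value)


def max_open_files(pid):
    """Return the (soft, hard) 'Max open files' limit of `pid` (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: 3-word name, soft limit, hard limit, units.
                soft, hard = line.split()[3:5]
                return _parse_limit(soft), _parse_limit(hard)
    raise ValueError(f"no 'Max open files' entry for pid {pid}")


def count_fds(pid):
    """Return (open descriptors, socket descriptors) for `pid`."""
    fd_dir = f"/proc/{pid}/fd"
    total = sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # Descriptor closed between listdir() and readlink(), e.g. the
            # directory handle that listdir() itself used; skip it.
            continue
        total += 1
        # Socket descriptors appear as links named 'socket:[<inode>]'.
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


def report(pid):
    """One-line summary: open descriptors vs. soft limit, plus socket count."""
    soft, _hard = max_open_files(pid)
    total, sockets = count_fds(pid)
    limit = "unlimited" if soft is None else str(soft)
    return f"pid={pid} open_fds={total}/{limit} sockets={sockets}"


def watch(pid, interval_s=10.0, samples=6):
    """Print a report every `interval_s` seconds, `samples` times."""
    for i in range(samples):
        print(report(pid), flush=True)
        if i + 1 < samples:
            time.sleep(interval_s)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Hypothetical usage: pass the broker's PID to sample it repeatedly.
        watch(int(sys.argv[1]))
    else:
        # No PID given: print a single report for this process.
        print(report(os.getpid()))
```

Run against the broker's PID while the interface is down. A sockets value that rises on every sample, with open_fds approaching the soft limit (100000 in the reported setup), would match the behaviour described above. Reading /proc directly avoids external dependencies, at the cost of being Linux-only.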
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)