Posted to users@kafka.apache.org by Nicolas Berthet <ni...@maaii.com> on 2013/10/04 11:15:52 UTC

RE: Too many open files

Hi Mark,

Sorry for the delay. We're not using a load balancer, if that's what you mean by LB.

After applying the change I mentioned last time (the netfilter thing), I haven't seen any improvement. We even restarted Kafka, but since the restart I've watched the connection count slowly creep up again.
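For the record, these are roughly the commands behind the keepalive and netfilter changes (a sketch; the timeout values here are illustrative, not exactly what we run):

```shell
# Tighten TCP keepalive so half-open sockets are probed and torn
# down sooner (CentOS 6 defaults are 7200 / 75 / 9).
sysctl -w net.ipv4.tcp_keepalive_time=600
sysctl -w net.ipv4.tcp_keepalive_intvl=60
sysctl -w net.ipv4.tcp_keepalive_probes=5

# The netfilter change: lower the conntrack timeout for ESTABLISHED
# connections from its 5-day (432000 s) default.
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600

# Add the same keys to /etc/sysctl.conf to persist across reboots.
```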
Best regards,

Nicolas Berthet 


-----Original Message-----
From: Mark [mailto:static.void.dev@gmail.com] 
Sent: Saturday, September 28, 2013 12:35 AM
To: users@kafka.apache.org
Subject: Re: Too many open files

No, this is all within the same DC. I think the problem has to do with the LB. We've updated our producers to point directly to a node for testing, and after running it all night I don't see any more connections than there are supposed to be.

Can I ask which LB you are using? We are using A10s.
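In case it helps with comparing notes, this is roughly how I've been counting established connections per client on a broker (a sketch; it assumes `netstat -tan` output and the default 9092 broker port, which netstat shows as XmlIpcRegSvc unless run numerically):

```shell
count_established() {
    # Expects `netstat -tan` output on stdin; prints
    # "<count> <remote-address>" per client, busiest first.
    awk '$6 == "ESTABLISHED" && $4 ~ /:9092$/ {
             addr = $5; sub(/:[0-9]+$/, "", addr); c[addr]++
         }
         END { for (h in c) print c[h], h }' | sort -rn
}

# Typical use on the broker:
#   netstat -tan | count_established
```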

On Sep 26, 2013, at 6:41 PM, Nicolas Berthet <ni...@maaii.com> wrote:

> Hi Mark,
> 
> I'm using CentOS 6.2. My file limit is something like 500k; the value is arbitrary.
> 
> One of the things I changed so far is the TCP keepalive parameters; it's had moderate success.
> 
> net.ipv4.tcp_keepalive_time
> net.ipv4.tcp_keepalive_intvl
> net.ipv4.tcp_keepalive_probes
> 
> I still notice an abnormal number of ESTABLISHED connections, I've 
> been doing some search and came over this page 
> (http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/)
> 
> I'll change "net.netfilter.nf_conntrack_tcp_timeout_established" as indicated there; it looks closest to a solution for my issue.
> 
> Are you also experiencing the issue in a cross-data-center context?
> 
> Best regards,
> 
> Nicolas Berthet
> 
> 
> -----Original Message-----
> From: Mark [mailto:static.void.dev@gmail.com]
> Sent: Friday, September 27, 2013 6:08 AM
> To: users@kafka.apache.org
> Subject: Re: Too many open files
> 
> What OS settings did you change? How high is your "huge" file limit?
> 
> 
> On Sep 25, 2013, at 10:06 PM, Nicolas Berthet <ni...@maaii.com> wrote:
> 
>> Jun,
>> 
>> I observed a similar kind of thing recently. (We didn't notice before
>> because our file limit is huge.)
>> 
>> I have a set of brokers in a datacenter, and producers in different data centers. 
>> 
>> At some point I got disconnections; from the producer's perspective there were something like 15 connections to the broker. On the broker side, on the other hand, I observed hundreds of connections from that producer in an ESTABLISHED state.
>> 
>> We had default settings for the socket timeout at the OS level, which we reduced hoping it would prevent the issue in the future. I'm not sure whether the issue is in the broker or the OS configuration, though. I'm still keeping the broker under observation for the time being.
>> 
>> Note that, for clients in the same datacenter, we didn't see this issue, the socket count matches on both ends.
>> 
>> Nicolas Berthet
>> 
>> -----Original Message-----
>> From: Jun Rao [mailto:junrao@gmail.com]
>> Sent: Thursday, September 26, 2013 12:39 PM
>> To: users@kafka.apache.org
>> Subject: Re: Too many open files
>> 
>> If a client is gone, the broker should automatically close those broken sockets. Are you using a hardware load balancer?
>> 
>> Thanks,
>> 
>> Jun
>> 
>> 
>> On Wed, Sep 25, 2013 at 4:48 PM, Mark <st...@gmail.com> wrote:
>> 
>>> FYI if I kill all producers I don't see the number of open files drop. 
>>> I still see all the ESTABLISHED connections.
>>> 
>>> Is there a broker setting to automatically kill any inactive TCP 
>>> connections?
>>> 
>>> 
>>> On Sep 25, 2013, at 4:30 PM, Mark <st...@gmail.com> wrote:
>>> 
>>>> Any other ideas?
>>>> 
>>>> On Sep 25, 2013, at 9:06 AM, Jun Rao <ju...@gmail.com> wrote:
>>>> 
>>>>> We haven't seen any socket leaks with the java producer. If you have
>>>>> lots of unexplained socket connections in established mode, one
>>>>> possible cause is that the client created new producer instances,
>>>>> but didn't close the old ones.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Jun
>>>>> 
>>>>> 
>>>>> On Wed, Sep 25, 2013 at 6:08 AM, Mark <st...@gmail.com> wrote:
>>>>> 
>>>>>> No. We are using the kafka-rb ruby gem producer.
>>>>>> https://github.com/acrosa/kafka-rb
>>>>>> 
>>>>>> Now that you've asked that question, I have to ask: is there a
>>>>>> problem with the java producer?
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On Sep 24, 2013, at 9:01 PM, Jun Rao <ju...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Are you using the java producer client?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Jun
>>>>>>> 
>>>>>>> 
>>>>>>>> On Tue, Sep 24, 2013 at 5:33 PM, Mark <st...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Our 0.7.2 Kafka cluster keeps crashing with:
>>>>>>>> 
>>>>>>>> 2013-09-24 17:21:47,513 - [kafka-acceptor:Acceptor@153] - Error in acceptor
>>>>>>>>     java.io.IOException: Too many open files
>>>>>>>> 
>>>>>>>> The obvious fix is to bump up the number of open files, but I'm
>>>>>>>> wondering if there is a leak on the Kafka side and/or our
>>>>>>>> application side. We currently have the ulimit set to a generous
>>>>>>>> 4096, but obviously we are hitting this ceiling. What's a
>>>>>>>> recommended value?
>>>>>>>> 
>>>>>>>> We are running Rails and our Unicorn workers are connecting to
>>>>>>>> our Kafka cluster via round-robin load balancing. We have about
>>>>>>>> 1500 workers, so that would be 1500 connections right there, but
>>>>>>>> they should be split across our 3 nodes. Instead netstat shows
>>>>>>>> thousands of connections that look like this:
>>>>>>>> 
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:22503    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:48398    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:29617    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:32444    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:34415    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:56901    ESTABLISHED
>>>>>>>> tcp        0      0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:45349    ESTABLISHED
>>>>>>>> 
>>>>>>>> Has anyone come across this problem before? Is this a 0.7.2 leak,
>>>>>>>> an LB misconfiguration... ?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>> 
>>>> 
>>> 
>>> 
>