Posted to users@kafka.apache.org by "Lung, Paul" <pl...@ebay.com> on 2014/06/25 07:18:29 UTC

Getting "java.io.IOException: Too many open files"

Hi All,


I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the following error messages on the same 3 brokers once in a while:


[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)

java.io.IOException: Too many open files

        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

        at kafka.network.Acceptor.accept(SocketServer.scala:200)

        at kafka.network.Acceptor.run(SocketServer.scala:154)

        at java.lang.Thread.run(Thread.java:679)

[2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor)

java.io.IOException: Too many open files

        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)

        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)

        at kafka.network.Acceptor.accept(SocketServer.scala:200)

        at kafka.network.Acceptor.run(SocketServer.scala:154)

        at java.lang.Thread.run(Thread.java:679)

When this happens, these 3 brokers essentially go out of sync when you do a "kafka-topics.sh --describe".

I tracked the number of open files by doing "watch -n 1 'sudo lsof | wc -l'", which counts all open file handles on the system. The numbers for the systems are basically in the 6000 range, with one system going to 9000. I presume the 9000 machine is the controller. Looking at the ulimit of the user, both the hard limit and the soft limit for open files are 100,000. Using sysctl, the max is fs.file-max = 9774928. So we seem to be way under the limit.
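(As an aside: a system-wide "lsof | wc -l" tends to overcount, since lsof lists duplicate entries per thread plus memory-mapped files, and the ulimit reported by a login shell is not necessarily the limit the broker process is actually running under if it was started by an init script. A sketch of checking both per process; the pgrep pattern for locating a broker pid is an assumption, and $$ is used below so the example is self-contained:)

```shell
# Count fds actually held by one process via /proc. $$ (this shell) is
# used so the example runs anywhere; for a broker, substitute its pid,
# e.g. pid=$(pgrep -f kafka.Kafka).
pid=$$
fd_count=$(ls "/proc/$pid/fd" | wc -l)
echo "process $pid holds $fd_count open fds"

# The open-file limit that actually applies to that process, which can
# differ from your login shell's ulimit:
grep 'Max open files' "/proc/$pid/limits"
```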

What am I missing here? Is there some JVM limit around 10K open files or something?

Paul Lung

Re: Getting "java.io.IOException: Too many open files"

Posted by Prakash Gowri Shankor <pr...@gmail.com>.
Without knowing the intricacies of Kafka, I think the default open file
descriptor limit is 1024 on Unix. This can be raised by setting a higher
ulimit value (typically 8192, but sometimes as high as 100000).
Before modifying the ulimit, I would recommend checking the number of
sockets stuck in the TIME_WAIT state. It looks like the broker has too
many open sockets, which could happen if a rogue client is connecting
and disconnecting repeatedly.
You might also have to reduce how long sockets linger in TIME_WAIT, to
30 seconds or lower.
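(To see whether connection churn is in fact the problem, a quick tally of TCP sockets by state helps. A sketch for Linux; netstat output columns vary slightly between platforms, and 9092 is assumed as the broker's listener port:)

```shell
# Tally TCP sockets by state; heavy connect/disconnect churn shows up
# as a large TIME_WAIT count. netstat -tan prints two header lines
# before the per-socket rows; the state is the sixth column.
netstat -tan | awk 'NR > 2 {count[$6]++} END {for (s in count) print count[s], s}'

# Narrow to connections on the broker port (local address is the
# fourth column; 9092 is an assumption):
netstat -tan | awk '$4 ~ /:9092$/ {count[$6]++} END {for (s in count) print count[s], s}'
```

Note that a large ESTABLISHED count against the broker port points at leaking clients, while a large TIME_WAIT count points at rapid reconnects.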




Re: Getting "java.io.IOException: Too many open files"

Posted by "Lung, Paul" <pl...@ebay.com>.
Hi Prakash,

How many open files do you expect a broker to be able to handle? This
broker seems to crash at around 4100 open files.

Thanks,
Paul Lung



Re: Getting "java.io.IOException: Too many open files"

Posted by "Lung, Paul" <pl...@ebay.com>.
Ok. What I just saw was that when the controller machine reaches around
4100+ open files, it crashes. Then I think the controller bounced between 2
other machines, taking them down too, and then circled back to the original
machine.

Paul Lung



Re: Getting "java.io.IOException: Too many open files"

Posted by "Lung, Paul" <pl...@ebay.com>.
The controller machine has 3500 or so, while the other machines have
around 1600.

Paul Lung



Re: Getting "java.io.IOException: Too many open files"

Posted by Prakash Gowri Shankor <pr...@gmail.com>.
How many files does each broker itself have open? You can find this with
'ls -l /proc/<processid>/fd'
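(Beyond the raw count, the fd symlink targets in /proc tell you what kind of descriptor each one is, which narrows down whether it is log segment files or sockets that are piling up. A sketch that builds on the command above; it uses the current shell's pid so it runs anywhere, so substitute the broker's pid:)

```shell
# Classify a process's fds by symlink target: "socket:[inode]" and
# "pipe:[inode]" entries collapse to "socket"/"pipe", and absolute
# paths (log segments, index files, jars) collapse to "file".
pid=$$
ls -l "/proc/$pid/fd" | awk 'NR > 1 {print $NF}' \
  | sed -e 's/:\[.*//' -e 's#^/.*#file#' \
  | sort | uniq -c | sort -rn
```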



