Posted to user@flink.apache.org by static-max <fl...@googlemail.com> on 2016/11/12 10:33:26 UTC

"Too many open files" in Job Manager

Hi,

I get a ton of these messages in my Job Manager's logfile. This makes Flink
unstable, as I can no longer list or cancel/stop jobs.
I run Flink on YARN under a default Hortonworks HDP 2.5 installation. HDP sets
the hard and soft open-file limits to 32768 for the user "yarn" that
runs the Flink JVMs, so that should not be the issue.
I also checked the number of open files for user "yarn" with "lsof -u yarn
| wc -l" and got ~4000 open files when the errors occurred in the logs,
so there should be room for more.
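One caveat worth checking: the nofile ulimit is enforced per process, not per user, and `lsof -u` also lists things that are not file descriptors (memory-mapped files, and on some systems one entry per thread), so its count can be misleading. A quick per-process sketch, where `$$` (the current shell) stands in for the JobManager JVM's PID, which you would find with e.g. jps or pgrep:

```shell
# The nofile limit applies per process, not per user, so check the
# JobManager JVM directly. $$ (this shell) stands in for the real
# JobManager PID, found with e.g. jps or pgrep (an assumption here).
PID=$$

# Soft and hard open-file limits actually in force for that process:
grep 'Max open files' /proc/$PID/limits

# Descriptors the process really holds; unlike `lsof -u`, this
# excludes memory-mapped files and per-thread duplicates.
ls /proc/$PID/fd | wc -l
```

If the fd count from /proc is far below the limit shown there while the errors still occur, the limit being hit may belong to a different process than the one being measured.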

Any idea how to solve this?

Thanks, Max

2016-11-12 10:23:04,422 WARN  org.jboss.netty.channel.socket.nio.AbstractNioSelector - Failed to accept a connection.
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2016-11-12 10:23:04,898 WARN  io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135)
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:69)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)

Re: "Too many open files" in Job Manager

Posted by Ufuk Celebi <uc...@apache.org>.
Hey Max!

Thanks for reporting this issue. Can you give more details about how you are running your job? If you are checkpointing to HDFS, could you report how many checkpoints you find in your configured checkpoint directory? Is everything properly cleaned up there?

– Ufuk
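If the checkpoint directory is reachable as a mounted filesystem, a count along these lines can help answer that question. This is only a sketch: the `chk-<id>` directory naming and the root path are assumptions about the setup, and on HDFS `hdfs dfs -ls <dir>` would do the same job.

```python
# Count leftover checkpoint directories under a Flink checkpoint root.
# Flink typically creates one chk-<id> directory per retained checkpoint,
# so a large and growing count suggests cleanup is not happening.
from pathlib import Path

def count_checkpoints(ckpt_root: str) -> int:
    """Return the number of chk-* directories at any depth below ckpt_root."""
    return sum(1 for p in Path(ckpt_root).glob("**/chk-*") if p.is_dir())
```

A steadily growing count across job restarts would point at stale checkpoint state rather than at the descriptor limit itself.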

On 12 November 2016 at 11:33:47, static-max (flashacid@googlemail.com) wrote:
> Hi,
>  
> I get a ton of these messages in my Job Manager's logfile. This makes Flink
> unstable, as I cannot list or cancel/stop the jobs.
> I run Flink in YARN under a default Horton HDP 2.5 installation. HDP sets
> the hard and soft limit of open files to 32768 for the user "yarn" that
> runs the Flink JVMs, so that should not be an issue.
> I also checked the number of open files for user "yarn" with "lsof -u yarn
> | wc -l" and I got ~ 4000 open files when the errors occured in the logs,
> so there should be room for more.
>  
> Any idea how to solve this?
>  
> Thanks, Max