Posted to dev@flume.apache.org by "Siddharth Ahuja (JIRA)" <ji...@apache.org> on 2016/05/08 14:33:12 UTC

[jira] [Updated] (FLUME-2905) NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks

     [ https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Ahuja updated FLUME-2905:
-----------------------------------
    Description: 
During Flume agent start-up, the Flume configuration containing the NetcatSource is parsed and the source's start() is called. If binding the channel's socket to a local address (which configures the socket to listen for connections) fails, the following exception is thrown, but the socket opened just before the bind is not closed.

{code}
2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows.
org.apache.flume.FlumeException: java.net.BindException: Address already in use
        at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
        at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
        at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
        ... 9 more
{code}

The source's start() is then called again, which opens another socket that is also never closed, and so on. This leaks file descriptors (sockets).

This can be reproduced as follows:
1. Set Netcat as the source in the Flume agent configuration.
2. Set the bind port for the Netcat source to a port that is already in use. In my case I used 50010, the port for the DataNode's XCeiver protocol, which was in use by the HDFS service.
3. Start the Flume agent and run "lsof -p <flume_process_id> | wc -l" repeatedly. The number of open file descriptors keeps growing due to the leaked sockets, which lsof reports with errors like "can't identify protocol".
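
One way the leak described above could be avoided is to close the channel whenever the bind fails, before the exception propagates to the caller. The following is a minimal sketch of that idea, not the actual NetcatSource code or the patch for this issue. The class name BindWithCleanup and the helper bindOrClose are hypothetical, and the demo occupies an ephemeral loopback port itself instead of relying on port 50010 being taken.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper class, for illustration only (not part of Flume).
final class BindWithCleanup {

    // Binds the channel's socket to the given address. If the bind fails,
    // the channel is closed before the exception is rethrown, so a failed
    // start does not leave an open file descriptor behind.
    static void bindOrClose(ServerSocketChannel channel, SocketAddress address)
            throws IOException {
        try {
            channel.socket().bind(address);
        } catch (IOException bindError) {
            try {
                channel.close();
            } catch (IOException closeError) {
                // Keep the original failure as the primary exception.
                bindError.addSuppressed(closeError);
            }
            throw bindError;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Occupy an ephemeral port so that the second bind fails.
        try (ServerSocketChannel first = ServerSocketChannel.open()) {
            first.socket().bind(new InetSocketAddress(loopback, 0));
            int port = first.socket().getLocalPort();

            ServerSocketChannel second = ServerSocketChannel.open();
            try {
                bindOrClose(second, new InetSocketAddress(loopback, port));
            } catch (IOException e) {
                System.out.println("bind failed: " + e.getMessage());
            }
            System.out.println("second channel still open? " + second.isOpen());
            second.close(); // no-op if bindOrClose already closed it
        }
    }
}
```

With cleanup of this kind in place, each retried start() would release the socket it opened, so the descriptor count seen in step 3 should stay flat rather than grow.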


  was:
During the flume agent start-up, the flume configuration containing the NetcatSource is parsed and the source's start() is called. If there is an issue while binding the channel's socket to a local address to configure the socket to listen for connections following exception is thrown but the socket open just before is not closed. 

2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows.
org.apache.flume.FlumeException: java.net.BindException: Address already in use
        at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
        at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
        at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
        ... 9 more

The source's start() is then called again leading to another socket being opened but not closed and so on. This leads to file descriptor (socket) leaks.

This can be easily reproduced as follows:
1. Set Netcat as the source in flume agent configuration.
2. Set the bind port for the netcat source to a port which is already in use. e.g. in my case I used 50010 which is the port for DataNode's XCeiver Protocol in use by the HDFS service.
3. Start flume agent and perform "lsof -p <flume_process_id> | wc -l". Notice the file descriptors keep on growing due to socket leaks with errors like: "can't identify protocol".



> NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2905
>                 URL: https://issues.apache.org/jira/browse/FLUME-2905
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.6.0
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>
> During Flume agent start-up, the Flume configuration containing the NetcatSource is parsed and the source's start() is called. If binding the channel's socket to a local address (which configures the socket to listen for connections) fails, the following exception is thrown, but the socket opened just before the bind is not closed.
> {code}
> 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows.
> org.apache.flume.FlumeException: java.net.BindException: Address already in use
>         at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
>         at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind0(Native Method)
>         at sun.nio.ch.Net.bind(Net.java:444)
>         at sun.nio.ch.Net.bind(Net.java:436)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>         at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
>         ... 9 more
> {code}
> The source's start() is then called again, which opens another socket that is also never closed, and so on. This leaks file descriptors (sockets).
> This can be reproduced as follows:
> 1. Set Netcat as the source in the Flume agent configuration.
> 2. Set the bind port for the Netcat source to a port that is already in use. In my case I used 50010, the port for the DataNode's XCeiver protocol, which was in use by the HDFS service.
> 3. Start the Flume agent and run "lsof -p <flume_process_id> | wc -l" repeatedly. The number of open file descriptors keeps growing due to the leaked sockets, which lsof reports with errors like "can't identify protocol".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)