Posted to dev@flume.apache.org by "Will McQueen (Created) (JIRA)" <ji...@apache.org> on 2012/03/29 03:29:29 UTC

[jira] [Created] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Flume agent reconfiguration enters permanent bad state
------------------------------------------------------

                 Key: FLUME-1079
                 URL: https://issues.apache.org/jira/browse/FLUME-1079
             Project: Flume
          Issue Type: Bug
          Components: Node
    Affects Versions: v1.2.0
            Reporter: Will McQueen
             Fix For: v1.2.0


Steps:
1) Start with this config in a1.properties:
# a = agent
# r = source
# c = channel
# k = sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# ===SOURCES===
a1.sources.r1.type = NETCAT
a1.sources.r1.channels = c1
a1.sources.r1.bind = localhost
a1.sources.r1.port = 1473
# ===CHANNELS===
a1.channels.c1.type = MEMORY
# ===SINKS===
a1.sinks.k1.type = NULL
a1.sinks.k1.channel = c1

2) Run the flume node:
bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1

3) Update the a1.properties file to add a new source on the same port. This should cause a port bind exception on r2, since r1 is already using port 1473:
# a = agent
# r = source
# c = channel
# k = sink
a1.sources = r1 r2
a1.channels = c1
a1.sinks = k1
# ===SOURCES===
a1.sources.r1.type = NETCAT
a1.sources.r1.channels = c1
a1.sources.r1.bind = localhost
a1.sources.r1.port = 1473
a1.sources.r2.type = AVRO
a1.sources.r2.channels = c1
a1.sources.r2.bind = localhost
a1.sources.r2.port = 1473
# ===CHANNELS===
a1.channels.c1.type = MEMORY
# ===SINKS===
a1.sinks.k1.type = NULL
a1.sinks.k1.channel = c1

...and saving the props file with the above config results in the following error (after waiting up to 30 seconds for the reconfiguration to be noticed):
2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
java.lang.NullPointerException
        at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
        at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
        at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
        at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
        at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

4) Now correct the config by changing r2's port to 1474:
# a = agent
# r = source
# c = channel
# k = sink
a1.sources = r1 r2
a1.channels = c1
a1.sinks = k1
# ===SOURCES===
a1.sources.r1.type = NETCAT
a1.sources.r1.channels = c1
a1.sources.r1.bind = localhost
a1.sources.r1.port = 1473
a1.sources.r2.type = AVRO
a1.sources.r2.channels = c1
a1.sources.r2.bind = localhost
a1.sources.r2.port = 1474
# ===CHANNELS===
a1.channels.c1.type = MEMORY
# ===SINKS===
a1.sinks.k1.type = NULL
a1.sinks.k1.channel = c1

...but this results in an illegal state:
java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
        at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
        at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
        at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

...which tells me that we've entered a permanent bad state that would require restarting the agent.

5) Start the avro-client. Had the previous steps succeeded, the avro-client would connect to the agent on port 1474, but instead the connection is refused:
bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt

2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
org.apache.flume.FlumeException: RPC connection error. Exception follows.
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
        at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
        at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
        at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
        at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
        at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
        at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
        at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
        at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
        at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
        ... 6 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
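Step 3 relies on the operating system refusing a second listener on a port that is already bound. The stand-alone Java sketch below (plain java.net.ServerSocket, not Flume code) reproduces only that trigger: one socket stands in for r1, and a second socket bound to the same loopback address and port stands in for r2. It uses an OS-assigned ephemeral port instead of 1473 so it cannot collide with a running agent.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortConflictDemo {

    /**
     * Opens one listener (standing in for r1), then tries to open a second
     * listener on the same loopback address and port (standing in for r2).
     * Returns true if the second bind is rejected with a BindException.
     */
    public static boolean secondBindFails() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for a free ephemeral port, so the demo never clashes with 1473.
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket()) {
                second.bind(new InetSocketAddress(loopback, port));
                return false; // unexpected: the OS allowed two listeners on one port
            } catch (BindException e) {
                System.out.println("Second bind on port " + port + " failed: " + e.getMessage());
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Port conflict detected: " + secondBindFails());
    }
}
```

The bind failure itself is expected and harmless; the bug in this report is what the agent does afterwards when it tries to stop and replace the half-started component.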




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Arvind Prabhakar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240962#comment-13240962 ] 

Arvind Prabhakar commented on FLUME-1079:
-----------------------------------------

@Hari - I think this is a serious issue because there is no way to recover from it without restarting the process. This means that a single mistake during reconfiguration can put the agent into this bad state, and a complete shutdown is then required to fix it.

@Will - does this interpretation match what you have observed?
                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Priority: Minor
>             Fix For: v1.2.0
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hari Shreedharan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240904#comment-13240904 ] 

Hari Shreedharan commented on FLUME-1079:
-----------------------------------------

Here is an analysis:
The config changes -> AbstractFileConfigurationProvider$FileWatcherRunnable calls doLoad, which in turn calls load, which tries to stop the components from the previous configuration.
Since the old components never loaded correctly, the call to stop throws a NullPointerException:
2012-03-28 17:52:36,503 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
java.lang.NullPointerException
   at org.apache.flume.source.AvroSource.stop(AvroSource.java:150)
   at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)

That exception causes load to exit early, so the new configuration is never loaded.
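Read together with the step 4 trace, this analysis suggests a plausible chain: the reload first stops the old sinks (DefaultLogicalNodeManager.java:61), then hits the NullPointerException while stopping the old sources (line 66), so the node manager keeps its old component list even though the supervisor has already dropped the sink runner. The next reload then trips the "Unaware of SinkRunner" check. The self-contained Java model below replays that sequence. Component, Supervisor and NodeManager are hypothetical stand-ins, not Flume classes; the sinks-before-sources order is an inference from the trace line numbers; and the "tolerant" mode (catch, log, keep going) is only one possible mitigation, not necessarily what the committed fix does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReconfigSketch {

    /** Hypothetical component; stop() can throw, as AvroSource.stop() did in the report. */
    static final class Component {
        final String name;
        final boolean stopThrows;

        Component(String name, boolean stopThrows) {
            this.name = name;
            this.stopThrows = stopThrows;
        }

        void stop() {
            if (stopThrows) {
                throw new NullPointerException(name + " was never started correctly");
            }
        }
    }

    /** Hypothetical supervisor: refuses to unsupervise a component it does not track. */
    static final class Supervisor {
        private final Set<Component> supervised = new HashSet<>();

        void supervise(Component c) {
            supervised.add(c);
        }

        void unsupervise(Component c) {
            if (!supervised.remove(c)) {
                throw new IllegalStateException("Unaware of " + c.name + " - can not unsupervise");
            }
            c.stop(); // the component is already untracked even if this throws
        }
    }

    /** Hypothetical node manager that remembers the components of the active config. */
    static final class NodeManager {
        final Supervisor supervisor = new Supervisor();
        List<Component> current = new ArrayList<>();

        void reconfigure(List<Component> next, boolean tolerant) {
            for (Component c : current) {
                try {
                    supervisor.unsupervise(c);
                } catch (RuntimeException e) {
                    if (!tolerant) {
                        throw e; // fragile: one failure aborts the whole reload
                    }
                    System.err.println("Ignoring failure while stopping " + c.name + ": " + e);
                }
            }
            // In fragile mode this is skipped after any failure, leaving `current`
            // pointing at components the supervisor may no longer track.
            for (Component c : next) {
                supervisor.supervise(c);
            }
            current = new ArrayList<>(next);
        }
    }

    /** Replays the report: a reload that hits a broken source, then a corrected reload. */
    public static List<String> simulate(boolean tolerant) {
        NodeManager nm = new NodeManager();
        // Assumption: sinks are stopped before sources, so k1 is listed first.
        nm.reconfigure(Arrays.asList(new Component("k1", false), new Component("r2", true)), tolerant);

        List<String> outcomes = new ArrayList<>();
        for (int attempt = 0; attempt < 2; attempt++) {
            List<Component> next = Arrays.asList(
                    new Component("k1", false), new Component("r1", false), new Component("r2", false));
            try {
                nm.reconfigure(next, tolerant);
                outcomes.add("ok");
            } catch (RuntimeException e) {
                outcomes.add(e.getClass().getSimpleName());
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println("fragile:  " + simulate(false));
        System.out.println("tolerant: " + simulate(true));
    }
}
```

In the fragile mode the two reloads fail with a NullPointerException and then an IllegalStateException, mirroring steps 3 and 4; in the tolerant mode both reloads complete and the new configuration is adopted.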
                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>             Fix For: v1.2.0
>
>

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242096#comment-13242096 ] 

Hudson commented on FLUME-1079:
-------------------------------

Integrated in flume-trunk #150 (See [https://builds.apache.org/job/flume-trunk/150/])
    FLUME-1079. Flume agent reconfiguration enters permanent bad state.

(Hari Shreedharan via Arvind Prabhakar) (Revision 1307278)

     Result = SUCCESS
arvind : http://svn.apache.org/viewvc/?view=rev&rev=1307278
Files : 
* /incubator/flume/trunk/flume-ng-node/src/main/java/org/apache/flume/node/nodemanager/DefaultLogicalNodeManager.java

                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch, FLUME-1079-2.patch
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397
> Steps:
> 1) Start with this config in a1.properties:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2) Run the flume node:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 3) Update the a1.properties file to add a new source on the same port, which causes a port bind exception on r2 because r1 is already using port 1473:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...and saving the props file with the above config results in the following error (after waiting up to 30 secs for the reconfig to be noticed):
> 2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
> java.lang.NullPointerException
>         at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
>         at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 4) Now correct the config by changing r2's port to 1474:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1474
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...but this results in an illegal state:
> java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...which tells me that we've entered a permanent bad state that would require restarting the agent.
> 5) Start the avro-client. We expect the avro-client to connect to the agent (as it would if the previous steps had produced no errors), but the connection is refused:
> bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt
> 2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
> org.apache.flume.FlumeException: RPC connection error. Exception follows.
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
>         at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
>         at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
>         at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
>         at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
> Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
>         ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
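The trigger in step 3 is a config where two sources of the same agent bind the same address. A pre-flight check could catch that before the agent tries to apply it. The class below is a hypothetical helper, not something Flume provides. It reads the same `<agent>.sources`, `.bind` and `.port` keys used in the reporter's a1.properties. Because it compares bind:port strings literally, it would not catch, for example, `localhost` vs. `127.0.0.1`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical pre-flight check: find sources of one agent that share a bind:port. */
final class BindConflictCheck {
  private BindConflictCheck() { }

  /** Parses properties text such as the a1.properties shown in the report. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected from a StringReader
    }
    return props;
  }

  /** Returns one message per conflicting source, e.g. "r2 and r1 both bind localhost:1473". */
  static List<String> findConflicts(Properties props, String agent) {
    List<String> conflicts = new ArrayList<>();
    Map<String, String> ownerByAddress = new HashMap<>();
    String sources = props.getProperty(agent + ".sources", "").trim();
    if (sources.isEmpty()) {
      return conflicts;
    }
    for (String source : sources.split("\\s+")) {
      String prefix = agent + ".sources." + source + ".";
      String bind = props.getProperty(prefix + "bind");
      String port = props.getProperty(prefix + "port");
      if (bind == null || port == null) {
        continue; // source without a network address; nothing to compare
      }
      String address = bind.trim() + ":" + port.trim();
      // putIfAbsent returns the earlier owner of this address, or null if it was free.
      String owner = ownerByAddress.putIfAbsent(address, source);
      if (owner != null) {
        conflicts.add(source + " and " + owner + " both bind " + address);
      }
    }
    return conflicts;
  }
}
```

Applied to the step 3 config, this reports that r2 and r1 both bind localhost:1473. Applied to the corrected step 4 config, it reports nothing.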


[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241931#comment-13241931 ] 

jiraposter@reviews.apache.org commented on FLUME-1079:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4551/#review6552
-----------------------------------------------------------


Changes look good, Hari. One comment below:


flume-ng-node/src/main/java/org/apache/flume/node/nodemanager/DefaultLogicalNodeManager.java
<https://reviews.apache.org/r/4551/#comment14229>

    We should have the same logic here as well; otherwise, if one component fails to start, the remaining components will never be attempted.



flume-ng-node/src/main/java/org/apache/flume/node/nodemanager/DefaultLogicalNodeManager.java
<https://reviews.apache.org/r/4551/#comment14230>

    Same here as well.
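The reviewer's point is that a start loop which lets the first exception escape never reaches the remaining components. Below is a minimal sketch of the alternative, which catches the failure per component and carries on. `Startable`, `StubComponent` and `StartAll` are hypothetical names, not the code in DefaultLogicalNodeManager.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lifecycle component; start() may throw, e.g. on a port bind failure. */
interface Startable {
  String name();

  void start();
}

/** Hypothetical test double that records whether start() ran. */
final class StubComponent implements Startable {
  private final String name;
  private final boolean failOnStart;
  boolean started;

  StubComponent(String name, boolean failOnStart) {
    this.name = name;
    this.failOnStart = failOnStart;
  }

  public String name() { return name; }

  public void start() {
    if (failOnStart) {
      throw new IllegalStateException("simulated bind failure for " + name);
    }
    started = true;
  }
}

final class StartAll {
  private StartAll() { }

  /** Attempts every component and returns the names of those whose start() threw. */
  static List<String> startAll(List<? extends Startable> components) {
    List<String> failed = new ArrayList<>();
    for (Startable c : components) {
      try {
        c.start();
      } catch (RuntimeException e) {
        failed.add(c.name()); // remember the failure, then move on to the next one
      }
    }
    return failed;
  }
}
```

With r1, a failing r2 and k1, r1 and k1 still start and only r2 is reported as failed.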


- Arvind


On 2012-03-29 07:25:45, Hari Shreedharan wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4551/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-29 07:25:45)
bq.  
bq.  
bq.  Review request for Flume.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Fixes a bug where a bad configuration prevents any further reconfiguration.
bq.  
bq.  
bq.  This addresses bug FLUME-1079.
bq.      https://issues.apache.org/jira/browse/FLUME-1079
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-ng-node/src/main/java/org/apache/flume/node/nodemanager/DefaultLogicalNodeManager.java 2c0cff6 
bq.  
bq.  Diff: https://reviews.apache.org/r/4551/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Verified using the conf that produced the error. Works ok now.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Hari
bq.  
bq.


                

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241047#comment-13241047 ] 

jiraposter@reviews.apache.org commented on FLUME-1079:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4551/
-----------------------------------------------------------

Review request for Flume.


Summary
-------

Fixes a bug where a bad configuration prevents any further reconfiguration.


This addresses bug FLUME-1079.
    https://issues.apache.org/jira/browse/FLUME-1079
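In the reporter's step 3, the NullPointerException is thrown from AvroSource.stop() while the agent tears down the new r2 source. This is presumably because r2's start() had already failed to bind port 1473 and never created its server. The diff here changes only the node manager. A complementary defensive pattern is to make stop() a no-op for a component that never started. Below is a minimal sketch with hypothetical names: `GuardedSource`, plus a plain `AutoCloseable` standing in for the real server handle. It is not AvroSource's actual code.

```java
/**
 * Hypothetical network source whose stop() is safe even if start() failed.
 * An AutoCloseable stands in for the real server object.
 */
final class GuardedSource {
  private final boolean bindFails; // simulates "address already in use"
  private AutoCloseable server;    // null until start() succeeds

  GuardedSource(boolean bindFails) {
    this.bindFails = bindFails;
  }

  void start() {
    if (bindFails) {
      throw new IllegalStateException("simulated bind failure: address already in use");
    }
    server = () -> { }; // placeholder for a real server handle
  }

  /** No-op if start() never completed; safe to call more than once. */
  void stop() {
    if (server == null) {
      return;
    }
    try {
      server.close();
    } catch (Exception e) {
      throw new IllegalStateException("failed to close server", e);
    } finally {
      server = null;
    }
  }

  boolean isRunning() {
    return server != null;
  }
}
```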


Diffs
-----

  flume-ng-node/src/main/java/org/apache/flume/node/nodemanager/DefaultLogicalNodeManager.java 2c0cff6 

Diff: https://reviews.apache.org/r/4551/diff


Testing
-------

Verified using the conf that produced the error. Works ok now.


Thanks,

Hari


                

[jira] [Updated] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Arvind Prabhakar (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvind Prabhakar updated FLUME-1079:
------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed. Thanks Hari!
                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch, FLUME-1079-2.patch
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397
> Steps:
> 1) Start with this config in a1.properties:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2) Run the flume node:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 3) Update the a1.properties file to add a new source on the same port, which would cause a port bind exception on r2 due to r1 already using port 1473:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...and updating the props file to the above config results in the following (after waiting at most 30 secs for the reconfig to be noticed):
> 2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
> java.lang.NullPointerException
>         at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
>         at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 4) Now correct the config by changing r2's port to 1474:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1474
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...but this results in an illegal state:
> java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...which tells me that we've entered a permanent bad state that would require restarting the agent.
> 5) Start the avro-client. We expect the avro-client to connect to the agent (as it would have if the previous steps had produced no errors), but the connection is refused:
> bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt
> 2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
> org.apache.flume.FlumeException: RPC connection error. Exception follows.
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
>         at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
>         at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
>         at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
>         at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
> Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
>         ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
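In step 3, the AVRO source r2 asks for localhost:1473, which the NETCAT source r1 already holds, so r2 cannot bind. One likely reading of the NullPointerException at `AvroSource.stop(AvroSource.java:137)` is that `stop()` dereferenced a server that was never created because the bind failed. The sketch below shows that failure mode and a null-guarded `stop()`, using only JDK sockets. `GuardedSource` is a hypothetical stand-in. It is not Flume's `AvroSource` and not the committed fix.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical stand-in for a network source (NOT Flume's AvroSource).
// start() can fail before 'server' is assigned, so stop() must tolerate null.
class GuardedSource {
    private final String host;
    private final int port;
    private ServerSocket server; // remains null if start() failed

    GuardedSource(String host, int port) {
        this.host = host;
        this.port = port;
    }

    void start() throws IOException {
        ServerSocket s = new ServerSocket();
        try {
            // Throws java.net.BindException if another socket holds host:port.
            s.bind(new InetSocketAddress(host, port));
        } catch (IOException e) {
            try {
                s.close();
            } catch (IOException closeError) {
                e.addSuppressed(closeError);
            }
            throw e;
        }
        server = s; // publish only after a successful bind
    }

    boolean isRunning() {
        return server != null && !server.isClosed();
    }

    void stop() {
        if (server == null) {
            return; // never started (or already stopped): nothing to release
        }
        try {
            server.close();
        } catch (IOException ignored) {
            // best effort during shutdown
        } finally {
            server = null;
        }
    }
}
```

In terms of this report: r1 holds the port, `r2.start()` fails with a `java.net.BindException`, and a later `r2.stop()` during the reload returns quietly instead of throwing.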


        

[jira] [Updated] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hari Shreedharan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan updated FLUME-1079:
------------------------------------

    Status: Patch Available  (was: Open)
    
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch
>
>

        

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241945#comment-13241945 ] 

jiraposter@reviews.apache.org commented on FLUME-1079:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4551/
-----------------------------------------------------------

(Updated 2012-03-30 00:06:09.676444)


Review request for Flume.


Changes
-------

Adding checks for failed component starts.
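The one-line change note ("Adding checks for failed component starts") can be illustrated with a small supervisor. It tracks a component only after `start()` succeeds. It also reports, rather than throws on, components it does not know about or that fail to stop. The traces above suggest why both checks matter. The NPE during the step 3 reload escaped from `onNodeConfigurationChanged()`, and the step 4 reload then tripped the "Unaware of SinkRunner ... can not unsupervise" precondition. This is a hypothetical sketch: the `Component` interface and the `TolerantSupervisor` class are invented for illustration. It is not Flume's lifecycle API and not the committed `DefaultLogicalNodeManager.java` change, which is in the review diff.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical component contract (NOT Flume's lifecycle interface).
interface Component {
    String name();
    void start() throws Exception;
    void stop() throws Exception;
}

// Hypothetical supervisor: a failed start or stop is recorded, never fatal.
class TolerantSupervisor {
    private final Set<Component> running = new LinkedHashSet<>();
    private final List<String> errors = new ArrayList<>();

    // Start a component and track it only if start() succeeded.
    boolean supervise(Component c) {
        try {
            c.start();
            running.add(c);
            return true;
        } catch (Exception e) {
            errors.add("start failed: " + c.name() + " (" + e + ")");
            return false;
        }
    }

    // Forget the component, then try to stop it. Unknown components are
    // reported instead of failing a precondition check.
    void unsupervise(Component c) {
        if (!running.remove(c)) {
            errors.add("not supervised: " + c.name());
            return;
        }
        try {
            c.stop();
        } catch (Exception e) {
            errors.add("stop failed: " + c.name() + " (" + e + ")");
        }
    }

    // Swap in a new configuration: stop everything tracked, then start the
    // new set. One bad component never aborts the rest of the reload.
    void reconfigure(List<Component> next) {
        for (Component c : new ArrayList<>(running)) {
            unsupervise(c);
        }
        for (Component c : next) {
            supervise(c);
        }
    }

    Set<Component> running() { return running; }
    List<String> errors() { return errors; }
}
```

The design choice here is that the supervisor's view of what is running is only ever updated after an operation actually happens. This keeps it from drifting away from the node manager's view when a single component misbehaves.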


Summary
-------

Fixes a bug where a bad configuration prevents any further reconfiguration.


This addresses bug FLUME-1079.
    https://issues.apache.org/jira/browse/FLUME-1079
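Independently of the fix, the configuration that triggered this issue could be caught before it is applied, because r1 and r2 request the same bind/port pair. The hypothetical checker below reads properties in the layout used in this report (`<agent>.sources`, `<agent>.sources.<name>.bind`, `<agent>.sources.<name>.port`) and lists sources that collide. It is an illustration only and is not part of the FLUME-1079 patch. It compares bind addresses as plain strings, so overlaps such as localhost versus 127.0.0.1 or 0.0.0.0 are not detected.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical pre-flight check (not part of Flume or the FLUME-1079 patch):
// flags sources of one agent that would bind the same bind:port pair.
class PortClashCheck {
    static List<String> findClashes(Properties props, String agent) {
        Map<String, String> owners = new HashMap<>(); // "bind:port" -> first source
        List<String> clashes = new ArrayList<>();
        String names = props.getProperty(agent + ".sources", "").trim();
        if (names.isEmpty()) {
            return clashes;
        }
        for (String src : names.split("\\s+")) {
            String prefix = agent + ".sources." + src + ".";
            String bind = props.getProperty(prefix + "bind");
            String port = props.getProperty(prefix + "port");
            if (bind == null || port == null) {
                continue; // not a listening source, or incomplete config
            }
            String key = bind.trim() + ":" + port.trim(); // literal string match only
            String previous = owners.putIfAbsent(key, src);
            if (previous != null) {
                clashes.add(src + " clashes with " + previous + " on " + key);
            }
        }
        return clashes;
    }

    public static void main(String[] args) throws IOException {
        // The source section of the step 3 config from this report.
        String conf = String.join("\n",
                "a1.sources = r1 r2",
                "a1.sources.r1.type = NETCAT",
                "a1.sources.r1.bind = localhost",
                "a1.sources.r1.port = 1473",
                "a1.sources.r2.type = AVRO",
                "a1.sources.r2.bind = localhost",
                "a1.sources.r2.port = 1473");
        Properties props = new Properties();
        props.load(new StringReader(conf));
        System.out.println(findClashes(props, "a1")); // prints [r2 clashes with r1 on localhost:1473]
    }
}
```

With r2's port changed to 1474, as in step 4, the same check returns an empty list.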


Diffs (updated)
-----

  flume-ng-node/src/main/java/org/apache/flume/node/nodemanager/DefaultLogicalNodeManager.java 2c0cff6 

Diff: https://reviews.apache.org/r/4551/diff


Testing
-------

Verified using the conf that produced the error. Works ok now.


Thanks,

Hari


                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch
>
>

        

[jira] [Updated] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Will McQueen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will McQueen updated FLUME-1079:
--------------------------------

    Description: 
Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397

Steps:
1) Start with this config in a1.properties:
# a = agent
# r = source
# c = channel
# k = sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# ===SOURCES===
a1.sources.r1.type = NETCAT
a1.sources.r1.channels = c1
a1.sources.r1.bind = localhost
a1.sources.r1.port = 1473
# ===CHANNELS===
a1.channels.c1.type = MEMORY
# ===SINKS===
a1.sinks.k1.type = NULL
a1.sinks.k1.channel = c1

2) Run the flume node:
bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1

3) Update a1.properties to add a new source, r2, on the same port. This should cause a port bind exception on r2, because r1 is already using port 1473:
# a = agent
# r = source
# c = channel
# k = sink
a1.sources = r1 r2
a1.channels = c1
a1.sinks = k1
# ===SOURCES===
a1.sources.r1.type = NETCAT
a1.sources.r1.channels = c1
a1.sources.r1.bind = localhost
a1.sources.r1.port = 1473
a1.sources.r2.type = AVRO
a1.sources.r2.channels = c1
a1.sources.r2.bind = localhost
a1.sources.r2.port = 1473
# ===CHANNELS===
a1.channels.c1.type = MEMORY
# ===SINKS===
a1.sinks.k1.type = NULL
a1.sinks.k1.channel = c1

...and saving the above config to the properties file results in the following (after waiting up to 30 seconds for the change to be noticed):
2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
java.lang.NullPointerException
        at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
        at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
        at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
        at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
        at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
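The NullPointerException is thrown from AvroSource.stop() (AvroSource.java:137) while the supervisor is unsupervising the source. One plausible explanation, which the trace alone does not confirm, is that stop() dereferences a server handle that is only assigned once start() has bound its port. A source whose bind failed then cannot be stopped cleanly. The sketch below uses a hypothetical `HypotheticalNetworkSource` class built on a plain `java.net.ServerSocket`. It is not Flume's AvroSource. It reproduces the bind conflict from this step on an ephemeral loopback port and shows a null-guarded stop():

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration only; this is not Flume's AvroSource implementation.
public class GuardedSourceDemo {

    static class HypotheticalNetworkSource {
        private final int port;
        private ServerSocket server; // remains null if start() could not bind

        HypotheticalNetworkSource(int port) {
            this.port = port;
        }

        void start() throws IOException {
            // Throws BindException if another socket is already listening on this port.
            server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        }

        void stop() throws IOException {
            // Without this guard, stop() after a failed start() would throw NullPointerException.
            if (server == null) {
                return;
            }
            server.close();
            server = null;
        }

        boolean isRunning() {
            return server != null && !server.isClosed();
        }
    }

    public static void main(String[] args) throws IOException {
        // The first socket plays the role of r1 and occupies an ephemeral loopback port.
        try (ServerSocket r1 = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            HypotheticalNetworkSource r2 = new HypotheticalNetworkSource(r1.getLocalPort());
            boolean bindFailed = false;
            try {
                r2.start();
            } catch (BindException e) {
                bindFailed = true;
            }
            r2.stop(); // safe even though start() failed
            System.out.println("bindFailed=" + bindFailed + ", running=" + r2.isRunning());
        }
    }
}
```

The design choice is simply to make stop() idempotent and tolerant of a partial start(), so a teardown path never depends on start() having succeeded.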

4) Now correct the config by changing r2's port to 1474:
# a = agent
# r = source
# c = channel
# k = sink
a1.sources = r1 r2
a1.channels = c1
a1.sinks = k1
# ===SOURCES===
a1.sources.r1.type = NETCAT
a1.sources.r1.channels = c1
a1.sources.r1.bind = localhost
a1.sources.r1.port = 1473
a1.sources.r2.type = AVRO
a1.sources.r2.channels = c1
a1.sources.r2.bind = localhost
a1.sources.r2.port = 1474
# ===CHANNELS===
a1.channels.c1.type = MEMORY
# ===SINKS===
a1.sinks.k1.type = NULL
a1.sinks.k1.channel = c1

...but this reload fails with an IllegalStateException:
java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
        at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
        at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
        at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
        at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

...which tells me that the agent has entered a permanent bad state that requires a restart to recover.
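Comparing the two traces suggests the following sequence. This is an interpretation, not something the logs prove. The step 4 trace unsupervises a SinkRunner at DefaultLogicalNodeManager.java:61, and the step 3 trace fails on a source at line 66. So the step 3 reload probably removed the sink runner from the LifecycleSupervisor first and then threw while stopping a source. When that happened, it skipped replacing the node manager's record of the old configuration. The step 4 reload then tried to unsupervise the same SinkRunner again and hit the "Unaware of ... can not unsupervise" check. The sketch below models that sequence with hypothetical classes (`MiniSupervisor`, `MiniNodeManager`, `Named`), which are not Flume's API. It contrasts a teardown loop that aborts on the first failure with one that catches per-component failures so the configuration swap always completes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a reload path; these classes are illustrative, not Flume's API.
public class ReconfigDemo {

    interface Component {
        void stop();
    }

    static final class Named implements Component {
        final String name;
        final boolean failOnStop;
        Named(String name, boolean failOnStop) { this.name = name; this.failOnStop = failOnStop; }
        @Override public void stop() {
            // Stands in for the NullPointerException seen in step 3.
            if (failOnStop) throw new NullPointerException("simulated stop() failure in " + name);
        }
        @Override public String toString() { return name; }
    }

    static class MiniSupervisor {
        private final Set<Component> supervised = new LinkedHashSet<>();

        void supervise(Component c) {
            supervised.add(c);
        }

        void unsupervise(Component c) {
            // Same kind of precondition as the one that fails in step 4.
            if (!supervised.remove(c)) {
                throw new IllegalStateException("Unaware of " + c + " - can not unsupervise");
            }
            c.stop(); // runs after the component has already been forgotten
        }
    }

    static class MiniNodeManager {
        private final MiniSupervisor supervisor = new MiniSupervisor();
        private final boolean robust;
        private List<Component> current = new ArrayList<>();

        MiniNodeManager(boolean robust) {
            this.robust = robust;
        }

        void onConfigurationChanged(List<Component> next) {
            for (Component c : current) {
                try {
                    supervisor.unsupervise(c);
                } catch (RuntimeException e) {
                    if (!robust) throw e; // fragile: abort before 'current' is replaced
                    System.err.println("Ignoring teardown failure: " + e);
                }
            }
            current = new ArrayList<>(next);
            for (Component c : current) {
                supervisor.supervise(c);
            }
        }
    }

    // Returns "ok" if the second reload succeeds, otherwise the IllegalStateException message.
    static String run(boolean robust) {
        MiniNodeManager manager = new MiniNodeManager(robust);
        manager.onConfigurationChanged(Arrays.<Component>asList(
                new Named("sinkRunner", false), new Named("sourceRunner", true)));
        try {
            manager.onConfigurationChanged(Arrays.<Component>asList(
                    new Named("sinkRunner2", false), new Named("sourceRunner2", false)));
        } catch (RuntimeException e) {
            System.err.println("First reload failed: " + e);
        }
        try {
            manager.onConfigurationChanged(Arrays.<Component>asList(
                    new Named("sinkRunner3", false), new Named("sourceRunner3", false)));
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("fragile: " + run(false));
        System.out.println("robust:  " + run(true));
    }
}
```

In the fragile variant, the first failed reload leaves the manager pointing at components the supervisor has already forgotten, and every later reload fails the same way. The robust variant trades a logged error for guaranteed forward progress.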

5) Start the avro-client. We expect it to connect to r2 on port 1474 (as it would have if the previous steps had not failed), but the connection is refused:
bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt

2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
org.apache.flume.FlumeException: RPC connection error. Exception follows.
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
        at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
        at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
        at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
        at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
        at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
        at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
        at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
        at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
        at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
        ... 6 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
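When reproducing step 5, it can help to check first whether anything is listening on the port at all. The hypothetical `PortProbe` helper below uses plain `java.net.Socket` and is not part of Flume. It only checks that a TCP connection is accepted. It does not verify that the listener is an Avro source. In the scenario above it would be expected to report 1474 as refused, matching the ConnectException in the avro-client output:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Flume: reports whether a TCP listener accepts connections.
public class PortProbe {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true; // something completed the TCP handshake
        } catch (IOException e) {
            // ConnectException ("Connection refused"), timeout, or unresolved host.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults match step 5 of this report: localhost:1474.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1474;
        boolean up = isListening(host, port, 2000);
        System.out.println(host + ":" + port + (up ? " is accepting connections" : " refused or timed out"));
    }
}
```

For example, `java PortProbe localhost 1474` run against the agent after step 4 should distinguish "r2 never bound its port" from an Avro-level protocol problem.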






[jira] [Updated] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hari Shreedharan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan updated FLUME-1079:
------------------------------------

    Attachment: FLUME-1079-1.patch
    
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397
> Steps:
> 1) Start with this config in a1.properties:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2) Run the flume node:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 3) Update a1.properties to add a new source, r2, on the same port. This should cause a port bind exception on r2, because r1 is already using port 1473:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...and saving the above config to the properties file results in the following (after waiting up to 30 seconds for the change to be noticed):
> 2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
> java.lang.NullPointerException
>         at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
>         at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 4) Now correct the config by changing r2's port to 1474:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1474
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...but this reload fails with an IllegalStateException:
> java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...which tells me that we've entered a permanent bad state that would require restarting the agent.
> 5) Start the avro-client. We would expect the avro-client to connect to the agent (had there been no errors in the previous steps), but the connection is refused:
> bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt
> 2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
> org.apache.flume.FlumeException: RPC connection error. Exception follows.
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
>         at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
>         at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
>         at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
>         at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
> Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
>         ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
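Step 5 only shows that nothing accepted a connection on port 1474. To check whether a reconfiguration actually brought a listening source up, a plain TCP connect probe is often enough. The following is a minimal sketch of such a probe. `PortProbe` is a hypothetical helper, not part of Flume, and it uses only `java.net.Socket`. The defaults `localhost` and `1474` are taken from this report.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical reachability probe: is anything accepting TCP connections on host:port? */
public final class PortProbe {
    private PortProbe() {}

    /**
     * Returns true if a TCP connection to host:port succeeds within timeoutMillis.
     * "Connection refused" (as in step 5), a timeout, or an unresolvable host all yield false.
     */
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1474;
        boolean up = isListening(host, port, 2000);
        System.out.println(host + ":" + port
            + (up ? " is accepting connections" : " is NOT accepting connections"));
    }
}
```

Run it as `java PortProbe localhost 1474`. The probe only confirms that something accepts TCP connections. It does not verify that the listener speaks Avro.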


        

[jira] [Updated] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Will McQueen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will McQueen updated FLUME-1079:
--------------------------------

    Environment: 
CentOS 6.2 64-bit
JDK 1.6.0_26 64-bit
    
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>             Fix For: v1.2.0
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397
> Steps:
> 1) Start with this config in a1.properties:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2) Run the flume node:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 3) Update the a1.properties file to add a new source on the same port, which causes a port bind exception on r2 because r1 is already using port 1473:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...and updating the properties file to the above config results in the following error (after waiting up to 30 seconds for the reconfiguration to be noticed):
> 2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
> java.lang.NullPointerException
>         at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
>         at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 4) Now correct the config by changing r2's port to 1474:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1474
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...but this results in an illegal state:
> java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...which tells me that we've entered a permanent bad state that would require restarting the agent.
> 5) Start the avro-client. We would expect the avro-client to connect to the agent (had there been no errors in the previous steps), but the connection is refused:
> bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt
> 2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
> org.apache.flume.FlumeException: RPC connection error. Exception follows.
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
>         at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
>         at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
>         at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
>         at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
> Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
>         ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
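The trigger in step 3 is that two sources declare the same `bind`/`port` pair. As an illustration only, a pre-flight check over the properties file could catch this before the agent tries to apply the configuration. The sketch assumes the `<agent>.sources` and `<agent>.sources.<name>.bind|port` key layout used in this report. `PortConflictCheck` is a hypothetical name, not a Flume API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical pre-flight check (not part of Flume): reports sources of one agent
 * that declare a bind:port pair already claimed by an earlier source.
 */
public final class PortConflictCheck {
    private PortConflictCheck() {}

    public static List<String> findConflicts(Properties props, String agent) {
        List<String> conflicts = new ArrayList<>();
        Map<String, String> owners = new HashMap<>(); // "bind:port" -> first source that claimed it
        String sources = props.getProperty(agent + ".sources", "").trim();
        if (sources.isEmpty()) {
            return conflicts;
        }
        for (String source : sources.split("\\s+")) {
            String prefix = agent + ".sources." + source + ".";
            String bind = props.getProperty(prefix + "bind");
            String port = props.getProperty(prefix + "port");
            if (bind == null || port == null) {
                continue; // this source does not open a listening socket
            }
            String key = bind.trim() + ":" + port.trim();
            String owner = owners.putIfAbsent(key, source);
            if (owner != null) {
                conflicts.add(source + " reuses " + key + " already bound by " + owner);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) throws IOException {
        // The relevant keys of the step 3 configuration.
        String step3 = String.join("\n",
            "a1.sources = r1 r2",
            "a1.sources.r1.bind = localhost",
            "a1.sources.r1.port = 1473",
            "a1.sources.r2.bind = localhost",
            "a1.sources.r2.port = 1473");
        Properties props = new Properties();
        props.load(new StringReader(step3));
        findConflicts(props, "a1").forEach(System.out::println);
        // prints: r2 reuses localhost:1473 already bound by r1
    }
}
```

The check compares `bind` strings literally, so `localhost` and `127.0.0.1` (or a wildcard such as `0.0.0.0`) would not be recognized as overlapping. A real check would need to resolve the addresses first.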


        

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Arvind Prabhakar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240961#comment-13240961 ] 

Arvind Prabhakar commented on FLUME-1079:
-----------------------------------------

@Hari - I think this is a serious issue because there is no way to recover from it without restarting the process. In other words, a single mistake during reconfiguration can put the agent into this bad state, and the only fix is then a complete shutdown and restart.
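The two stack traces suggest one plausible mechanism. In step 3, the node manager unsupervised the old sink, and then `AvroSource.stop()` threw. This aborted `onNodeConfigurationChanged` before the node manager updated its record of running components. In step 4, it tried to unsupervise that same sink again, and `LifecycleSupervisor` rejected the request, as it does on every later attempt. The toy model below reproduces that loop and shows one way out of it. This is a sketch under that assumption only. `ReconfigSketch`, `Component`, `reconfigureFragile` and `reconfigure` are hypothetical names, not Flume's classes or the committed fix.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (hypothetical, not Flume's classes) of how one throwing stop()
 * during reconfiguration can wedge an agent, and one way to keep it recoverable.
 */
public final class ReconfigSketch {

    /** A lifecycle component; stop() may throw, like AvroSource.stop() in step 3. */
    public interface Component {
        String name();
        void stop();
    }

    /** Creates a component whose stop() optionally throws, simulating a source that never bound its port. */
    public static Component component(final String name, final boolean failOnStop) {
        return new Component() {
            @Override public String name() { return name; }
            @Override public void stop() {
                if (failOnStop) {
                    throw new NullPointerException(name + ": server was never started");
                }
            }
        };
    }

    // Supervisor side: unsupervising a component it does not know about is an error.
    private final Set<Component> supervised = new HashSet<>();
    // Node-manager side: what the agent believes is currently running.
    private Set<Component> current = new LinkedHashSet<>();

    private void supervise(Component c) {
        supervised.add(c);
    }

    private void unsupervise(Component c) {
        if (!supervised.remove(c)) {
            throw new IllegalStateException("Unaware of " + c.name() + " - can not unsupervise");
        }
        c.stop(); // the supervisor has already forgotten c, even if this throws
    }

    /** Fragile swap: the first exception aborts the loop and leaves `current` stale. */
    public void reconfigureFragile(Set<Component> next) {
        for (Component old : current) {
            unsupervise(old);
        }
        for (Component c : next) {
            supervise(c);
        }
        current = new LinkedHashSet<>(next);
    }

    /** Resilient swap: each unsupervise is isolated and `current` is always updated. */
    public List<RuntimeException> reconfigure(Set<Component> next) {
        List<RuntimeException> errors = new ArrayList<>();
        for (Component old : current) {
            try {
                unsupervise(old);
            } catch (RuntimeException e) {
                errors.add(e); // record (e.g. log) the failure and continue with the rest
            }
        }
        for (Component c : next) {
            supervise(c);
        }
        current = new LinkedHashSet<>(next);
        return errors;
    }
}
```

With `reconfigureFragile`, the second call throws the `NullPointerException`, and every call after that throws `IllegalStateException`, which mirrors the permanent bad state described above. With `reconfigure`, the failure is collected and the next call succeeds.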

@Will - does this interpretation match what you have observed?
                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Priority: Minor
>             Fix For: v1.2.0


        

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242007#comment-13242007 ] 

jiraposter@reviews.apache.org commented on FLUME-1079:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4551/#review6558
-----------------------------------------------------------

Ship it!


lgtm


- Prasad


On 2012-03-30 00:06:09, Hari Shreedharan wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4551/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-30 00:06:09)
bq.  
bq.  
bq.  Review request for Flume.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Fixes a bug where, once a bad configuration has been loaded, the agent never accepts another reconfiguration.
bq.  
bq.  
bq.  This addresses bug FLUME-1079.
bq.      https://issues.apache.org/jira/browse/FLUME-1079
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-ng-node/src/main/java/org/apache/flume/node/nodemanager/DefaultLogicalNodeManager.java 2c0cff6 
bq.  
bq.  Diff: https://reviews.apache.org/r/4551/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Verified using the conf that produced the error. Works ok now.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Hari
bq.  
bq.
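According to the review request, the patch under review only touches `DefaultLogicalNodeManager.java`. The `NullPointerException` in step 3 also suggests a complementary defensive pattern on the component side: `stop()` should tolerate a `start()` that never bound its port. The class below is a hypothetical stand-in that illustrates a null-safe, idempotent `stop()`. It is not `AvroSource`, and it uses a plain `java.net.ServerSocket` instead of Avro/Netty.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/**
 * Hypothetical listening source (not Flume's AvroSource) whose stop() is safe to
 * call even when start() failed to bind, e.g. because the port was already taken.
 */
public final class GuardedSource {
    private final String host;
    private final int port;
    private ServerSocket server; // stays null unless start() succeeds

    public GuardedSource(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Binds host:port; throws (typically java.net.BindException) if it is in use. */
    public void start() throws IOException {
        ServerSocket s = new ServerSocket();
        try {
            s.bind(new InetSocketAddress(host, port));
        } catch (IOException e) {
            s.close(); // do not leak the unbound socket
            throw e;
        }
        server = s;
    }

    /** Null-safe and idempotent: a source that never started has nothing to close. */
    public void stop() throws IOException {
        ServerSocket s = server;
        server = null;
        if (s != null) {
            s.close();
        }
    }

    public boolean isRunning() {
        return server != null;
    }

    /** The bound port, or -1 if not running. */
    public int localPort() {
        return server == null ? -1 : server.getLocalPort();
    }
}
```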


                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397
> Steps:
> 1) Start with this config in a1.properties:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2) Run the flume node:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 3) Update the a1.properties file to add a new source on the same port, which causes a port bind exception on r2 because r1 is already using port 1473:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...and updating the properties file to the above config results in the following error (after waiting up to 30 seconds for the reconfiguration to be noticed):
> 2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
> java.lang.NullPointerException
>         at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
>         at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 4) Now correct the config by changing r2's port to 1474:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1474
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...but this results in an illegal state:
> java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...which tells me that we've entered a permanent bad state that would require restarting the agent.
> 5) Start the avro-client. We would expect the avro-client to connect to the agent (as it would had the previous steps produced no errors), but the connection is refused:
> bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt
> 2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
> org.apache.flume.FlumeException: RPC connection error. Exception follows.
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
>         at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
>         at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
>         at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
>         at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
> Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
>         ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
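One way to read the two agent-side traces above (the NullPointerException from a source's stop() during one reload, then "Unaware of SinkRunner ... can not unsupervise" on the next) is as a bookkeeping mismatch between the node manager and the supervisor. The sketch below is a simplified, hypothetical model, not Flume's code: `Component`, `SimpleComponent`, `Supervisor` and `NodeManager` are illustrative stand-ins for the runners, `LifecycleSupervisor` and `DefaultLogicalNodeManager`. It assumes that the supervisor forgets a component before calling its stop(), and that the node manager only records the new configuration after every old component has been unsupervised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the reconfiguration failure; not Flume's actual classes. */
public class ReconfigFailureModel {

  /** Illustrative stand-in for a source or sink runner. */
  interface Component {
    String name();
    void stop();
  }

  /** A component whose stop() can be made to fail, like a source that never bound its port. */
  static class SimpleComponent implements Component {
    private final String name;
    private final boolean failOnStop;

    SimpleComponent(String name, boolean failOnStop) {
      this.name = name;
      this.failOnStop = failOnStop;
    }

    public String name() { return name; }

    public void stop() {
      if (failOnStop) {
        throw new NullPointerException(name + ".stop(): server was never started");
      }
    }
  }

  /** Simplified supervisor: remembers which components it supervises. */
  static class Supervisor {
    private final Set<Component> supervised = new HashSet<>();

    void supervise(Component c) { supervised.add(c); }

    void unsupervise(Component c) {
      if (!supervised.contains(c)) {
        throw new IllegalStateException("Unaware of " + c.name() + " - can not unsupervise");
      }
      supervised.remove(c); // assumption: forgotten before stop() runs
      c.stop();             // may throw and abort the caller's loop
    }
  }

  /** Simplified node manager that swaps the old component list for a new one. */
  static class NodeManager {
    private final Supervisor supervisor = new Supervisor();
    private List<Component> current = new ArrayList<>();

    void onConfigurationChanged(List<Component> next) {
      for (Component c : current) {
        supervisor.unsupervise(c); // no try/catch: one failure aborts the reload
      }
      current = new ArrayList<>(next); // skipped if any unsupervise() threw
      for (Component c : next) {
        supervisor.supervise(c);
      }
    }
  }

  public static void main(String[] args) {
    NodeManager manager = new NodeManager();
    Component sink = new SimpleComponent("k1", false);
    Component brokenSource = new SimpleComponent("r2", true);
    manager.onConfigurationChanged(Arrays.asList(sink, brokenSource));

    // Reload 1: k1 is unsupervised, then r2.stop() throws before 'current' is updated.
    try {
      manager.onConfigurationChanged(new ArrayList<Component>());
    } catch (NullPointerException e) {
      System.out.println("reload 1 failed: " + e.getMessage());
    }

    // Reload 2 (and every later one): the manager still lists k1, the supervisor does not.
    try {
      manager.onConfigurationChanged(new ArrayList<Component>());
    } catch (IllegalStateException e) {
      System.out.println("reload 2 failed: " + e.getMessage());
    }
  }
}
```

In this model, the first reload fails with the NullPointerException and every later reload fails on k1 with the "Unaware of ... can not unsupervise" message, because the manager keeps trying to unsupervise a component the supervisor already dropped. That matches the "permanent bad state" described above, although the real sequence inside Flume may differ in detail.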

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hari Shreedharan (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan reassigned FLUME-1079:
---------------------------------------

    Assignee: Hari Shreedharan
    
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397
> Steps:
> 1) Start with this config in a1.properties:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2) Run the flume node:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 3) Update the a1.properties file to add a new source on the same port, which causes a port bind exception on r2 because r1 is already using port 1473:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...and saving the above config results in the following error (after waiting up to 30 seconds for the reconfiguration to be noticed):
> 2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
> java.lang.NullPointerException
>         at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
>         at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 4) Now correct the config by changing r2's port to 1474:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1474
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...but this results in an illegal state:
> java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...which tells me that we've entered a permanent bad state that would require restarting the agent.
> 5) Start the avro-client. We would expect the avro-client to connect to the agent (as it would had the previous steps produced no errors), but the connection is refused:
> bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt
> 2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
> org.apache.flume.FlumeException: RPC connection error. Exception follows.
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
>         at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
>         at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
>         at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
>         at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
> Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
>         ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
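Step 3 relies on the operating system refusing a second listener on a port that is already bound. The sketch below demonstrates just that with plain `java.net.ServerSocket`, not with Flume's sources. It asks the OS for a free ephemeral port (port 0) instead of hard-coding 1473 so it can run anywhere, and it makes no claim about how Flume's Avro source reports the failure.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class PortConflictDemo {

  /** Returns true if a second listener on an already-bound port fails with BindException. */
  static boolean secondBindFails() throws IOException {
    InetAddress loopback = InetAddress.getByName("localhost");
    // Port 0 lets the OS pick any free port; it stands in for r1's port 1473.
    try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
      int port = first.getLocalPort();
      // A second listener on the same address and port plays the role of r2.
      try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
        return false; // unexpected: the second bind succeeded
      } catch (BindException e) {
        System.out.println("second bind on port " + port + " failed: " + e.getMessage());
        return true;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("conflict detected: " + secondBindFails());
  }
}
```

The exact BindException message depends on the platform (on Linux it is typically "Address already in use"), so only the exception type is checked.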


        

[jira] [Issue Comment Edited] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hari Shreedharan (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241020#comment-13241020 ] 

Hari Shreedharan edited comment on FLUME-1079 at 3/29/12 6:57 AM:
------------------------------------------------------------------

OK, I will add a try/catch around each call that stops a component. This makes sure that even if one component throws an exception, we can still proceed to the next.

Arvind: yes, this is what happens.
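The approach proposed here (a try/catch around each stop call, so one failing component does not prevent the others from being stopped) can be sketched as follows. This is an illustration under stated assumptions, not the contents of FLUME-1079-1.patch: `Component` is a hypothetical minimal interface standing in for Flume's `org.apache.flume.lifecycle.LifecycleAware`, `Recording` is a hypothetical test double, and `stopAll` is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of stopping components one by one without letting a failure abort the rest. */
public class TolerantShutdown {

  /** Hypothetical minimal component interface (stand-in for LifecycleAware). */
  interface Component {
    String name();
    void stop();
  }

  /** Test double that records successful stops and can be told to fail. */
  static class Recording implements Component {
    private final String name;
    private final boolean failOnStop;
    private final List<String> stopped;

    Recording(String name, boolean failOnStop, List<String> stopped) {
      this.name = name;
      this.failOnStop = failOnStop;
      this.stopped = stopped;
    }

    public String name() { return name; }

    public void stop() {
      if (failOnStop) {
        throw new NullPointerException(name + " was never started");
      }
      stopped.add(name);
    }
  }

  /**
   * Stops every component in order. A RuntimeException from one component is
   * logged and recorded, and the loop moves on to the next component.
   * Returns the names of the components whose stop() threw.
   */
  static List<String> stopAll(List<? extends Component> components) {
    List<String> failed = new ArrayList<>();
    for (Component c : components) {
      try {
        c.stop();
      } catch (RuntimeException e) {
        System.err.println("Error stopping " + c.name() + ": " + e);
        failed.add(c.name());
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> stopped = new ArrayList<>();
    List<Component> components = Arrays.<Component>asList(
        new Recording("k1", false, stopped),
        new Recording("r2", true, stopped),
        new Recording("r1", false, stopped));
    List<String> failed = stopAll(components);
    System.out.println("stopped=" + stopped + " failed=" + failed);
  }
}
```

Running `main` prints `stopped=[k1, r1] failed=[r2]`: r2's failure is logged, but k1 and r1 are still stopped, so the caller can go on to record and start the new configuration. Catching `RuntimeException` rather than `Throwable` is a deliberate choice in this sketch, so that JVM errors such as `OutOfMemoryError` still propagate.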
                


        

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242013#comment-13242013 ] 

jiraposter@reviews.apache.org commented on FLUME-1079:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4551/#review6560
-----------------------------------------------------------

Ship it!


+1. Please attach the patch to the JIRA.

- Arvind


On 2012-03-30 00:06:09, Hari Shreedharan wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4551/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-30 00:06:09)
bq.  
bq.  
bq.  Review request for Flume.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Fixes a bug where a bad configuration prevents any later reconfiguration from succeeding.
bq.  
bq.  
bq.  This addresses bug FLUME-1079.
bq.      https://issues.apache.org/jira/browse/FLUME-1079
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    flume-ng-node/src/main/java/org/apache/flume/node/nodemanager/DefaultLogicalNodeManager.java 2c0cff6 
bq.  
bq.  Diff: https://reviews.apache.org/r/4551/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Verified using the configuration that produced the error. It works now.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Hari
bq.  
bq.


                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397
> Steps:
> 1) Start with this config in a1.properties:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2) Run the flume node:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 3) Update the a1.properties file to add a new source on the same port, which causes a port bind exception on r2 because r1 is already using port 1473:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...and saving the above config results in the following error (after waiting up to 30 seconds for the reconfiguration to be noticed):
> 2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
> java.lang.NullPointerException
>         at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
>         at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 4) Now correct the config by changing r2's port to 1474:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1474
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...but this results in an illegal state:
> java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...which tells me that we've entered a permanent bad state that would require restarting the agent.
> 5) Start the avro-client. We expect it to connect to the agent (as it would have if the previous steps had produced no errors), but the connection is refused:
> bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt
> 2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
> org.apache.flume.FlumeException: RPC connection error. Exception follows.
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
>         at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
>         at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
>         at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
>         at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
> Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
>         ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
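Step 3 fails because r1 and r2 both declare localhost:1473. As a hedged illustration only (this check is not part of Flume, and the class and method names are hypothetical), a pre-flight validation could flag the clash before a new configuration is applied. It reads the same <agent>.sources and <agent>.sources.<name>.bind / .port keys used in the configurations above:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class PortConflictCheck {

  /**
   * Hypothetical pre-flight check: returns one message for every source of
   * the given agent that declares a bind:port already claimed by an earlier
   * source in the agent's source list.
   */
  static List<String> findPortConflicts(Properties props, String agent) {
    List<String> conflicts = new ArrayList<>();
    Map<String, String> claimed = new HashMap<>(); // "bind:port" -> first source
    String sourceList = props.getProperty(agent + ".sources", "").trim();
    if (sourceList.isEmpty()) {
      return conflicts;
    }
    for (String source : sourceList.split("\\s+")) {
      String prefix = agent + ".sources." + source + ".";
      String bind = props.getProperty(prefix + "bind");
      String port = props.getProperty(prefix + "port");
      if (bind == null || port == null) {
        continue; // not a listening source, or incompletely configured
      }
      String key = bind.trim() + ":" + port.trim();
      String owner = claimed.putIfAbsent(key, source);
      if (owner != null) {
        conflicts.add(source + " conflicts with " + owner + " on " + key);
      }
    }
    return conflicts;
  }

  public static void main(String[] args) throws IOException {
    // The listening-source entries from the step 3 configuration.
    String step3 = String.join("\n",
        "a1.sources = r1 r2",
        "a1.sources.r1.bind = localhost",
        "a1.sources.r1.port = 1473",
        "a1.sources.r2.bind = localhost",
        "a1.sources.r2.port = 1473");
    Properties props = new Properties();
    props.load(new StringReader(step3));
    System.out.println(findPortConflicts(props, "a1"));
  }
}
```

Against the step 3 entries it prints [r2 conflicts with r1 on localhost:1473]; with r2 moved to 1474 as in step 4, the list is empty. It compares the literal bind and port strings, so aliases such as localhost versus 127.0.0.1, or a wildcard bind address, would not be detected.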

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hari Shreedharan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241020#comment-13241020 ] 

Hari Shreedharan commented on FLUME-1079:
-----------------------------------------

OK, I will add a try/catch around the call that stops each component. That way, even if one component throws an exception while stopping, we can still proceed to stop the next one.
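A minimal sketch of the approach described in this comment, assuming a hypothetical Stoppable interface in place of Flume's real lifecycle types (this is not the FLUME-1079 patch): each stop() call gets its own try/catch, the failure is logged, and the loop moves on to the next component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class StopAllSketch {

  /** Hypothetical stand-in for a component with a lifecycle. */
  interface Stoppable {
    String name();
    void stop();
  }

  private static final Logger LOG = Logger.getLogger(StopAllSketch.class.getName());

  /**
   * Stops every component, continuing past failures. Returns the names of
   * the components whose stop() threw, so the caller can report them.
   */
  static List<String> stopAll(List<? extends Stoppable> components) {
    List<String> failed = new ArrayList<>();
    for (Stoppable c : components) {
      try {
        c.stop();
      } catch (RuntimeException e) {
        // Log and move on: one bad component must not block the rest.
        LOG.log(Level.WARNING, "Error stopping " + c.name(), e);
        failed.add(c.name());
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> stopped = new ArrayList<>();
    List<Stoppable> components = new ArrayList<>();
    for (String n : new String[] {"r1", "r2", "k1"}) {
      components.add(new Stoppable() {
        public String name() { return n; }
        public void stop() {
          // Simulate r2 failing the way AvroSource.stop() did in the report.
          if (n.equals("r2")) throw new NullPointerException("r2 never started");
          stopped.add(n);
        }
      });
    }
    System.out.println("failed=" + stopAll(components) + " stopped=" + stopped);
  }
}
```

Even though r2's stop() throws, r1 and k1 are still stopped, so the demo prints failed=[r2] stopped=[r1, k1] (plus a logged warning). Only RuntimeException is caught, so Errors such as OutOfMemoryError still propagate.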
                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Priority: Minor
>             Fix For: v1.2.0
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397


        

[jira] [Updated] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hari Shreedharan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan updated FLUME-1079:
------------------------------------

    Attachment: FLUME-1079-2.patch
    
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Assignee: Hari Shreedharan
>            Priority: Minor
>             Fix For: v1.2.0
>
>         Attachments: FLUME-1079-1.patch, FLUME-1079-2.patch
>
>


        

[jira] [Updated] (FLUME-1079) Flume agent reconfiguration enters permanent bad state

Posted by "Hari Shreedharan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/FLUME-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan updated FLUME-1079:
------------------------------------

    Priority: Minor  (was: Major)

It is not a major issue. The problem happens only when a configuration prevents some components from starting (because of an error such as a port bind failure) and a reconfiguration then occurs.
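The NullPointerException from AvroSource.stop() in the step 3 trace is consistent with a stop() that dereferences a server handle which start() never assigned because the bind failed; that reading is an inference from the stack trace, not something confirmed in this thread. The sketch below is hypothetical (GuardedSource is not a Flume class, and a plain java.net.ServerSocket stands in for the Avro server): stop() returns early when start() never succeeded, so tearing down a half-started component during reconfiguration is harmless.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class GuardedSource {
  private final int port;
  private ServerSocket server; // stays null unless start() succeeds

  GuardedSource(int port) {
    this.port = port;
  }

  /** Binds the loopback port; if binding fails, server remains null. */
  void start() throws IOException {
    server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
  }

  /** Only valid after a successful start(). */
  int localPort() {
    return server.getLocalPort();
  }

  boolean isRunning() {
    return server != null && !server.isClosed();
  }

  /** Safe whether or not start() succeeded, and safe to call repeatedly. */
  void stop() {
    if (server == null) {
      return; // never started (e.g. the port was taken): nothing to release
    }
    try {
      server.close();
    } catch (IOException e) {
      System.err.println("Error closing listener on port " + port + ": " + e);
    } finally {
      server = null;
    }
  }

  public static void main(String[] args) throws IOException {
    GuardedSource r1 = new GuardedSource(0); // port 0: the OS picks a free port
    r1.start();
    GuardedSource r2 = new GuardedSource(r1.localPort()); // same port, as in step 3
    try {
      r2.start();
    } catch (IOException e) {
      System.out.println("r2 failed to start: " + e);
    }
    r2.stop(); // no NullPointerException even though r2 never bound
    r1.stop();
    System.out.println("r1 running=" + r1.isRunning() + ", r2 running=" + r2.isRunning());
  }
}
```

In the demo, r2's start() fails with a BindException because r1 already holds the port, and r2.stop() then completes without throwing; both sources report running=false at the end.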
                
> Flume agent reconfiguration enters permanent bad state
> ------------------------------------------------------
>
>                 Key: FLUME-1079
>                 URL: https://issues.apache.org/jira/browse/FLUME-1079
>             Project: Flume
>          Issue Type: Bug
>          Components: Node
>    Affects Versions: v1.2.0
>         Environment: CentOS 6.2 64-bit
> JDK 1.6.0_26 64-bit
>            Reporter: Will McQueen
>            Priority: Minor
>             Fix For: v1.2.0
>
>
> Using flume trunk, commit ad24cb31bb1b5a0d1ee4b0ec18572a223ed9d397
> Steps:
> 1) Start with this config in a1.properties:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> 2) Run the flume node:
> bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1
> 3) Update the a1.properties file to add a new source at the same port, which would cause a port bind exception on r2 because r1 is already using port 1473:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1473
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...and updating the properties file to the above config results in the following (after waiting up to 30 seconds for the reconfiguration to be noticed):
> 2012-03-28 18:11:24,027 (conf-file-poller-0) [ERROR - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:205)] Failed to load configuration data. Exception follows.
> java.lang.NullPointerException
>         at org.apache.flume.source.AvroSource.stop(AvroSource.java:137)
>         at org.apache.flume.source.EventDrivenSourceRunner.stop(EventDrivenSourceRunner.java:45)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:155)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:66)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 4) Now correct the config by changing r2's port to 1474:
> # a = agent
> # r = source
> # c = channel
> # k = sink
> a1.sources = r1 r2
> a1.channels = c1
> a1.sinks = k1
> # ===SOURCES===
> a1.sources.r1.type = NETCAT
> a1.sources.r1.channels = c1
> a1.sources.r1.bind = localhost
> a1.sources.r1.port = 1473
> a1.sources.r2.type = AVRO
> a1.sources.r2.channels = c1
> a1.sources.r2.bind = localhost
> a1.sources.r2.port = 1474
> # ===CHANNELS===
> a1.channels.c1.type = MEMORY
> # ===SINKS===
> a1.sinks.k1.type = NULL
> a1.sinks.k1.channel = c1
> ...but this results in an illegal state:
> java.lang.IllegalStateException: Unaware of SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5090d8ea counterGroup:{ name:null counters:{runner.backoffs.consecutive=5, runner.backoffs=5, runner.interruptions=1} } } - can not unsupervise
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:145)
>         at org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:61)
>         at org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:217)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:124)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>         at org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:203)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ...which tells me that we've entered a permanent bad state that would require restarting the agent.
> 5) Start the avro-client. We expect it to connect to the agent (as it would have if the previous steps had produced no errors), but the connection is refused:
> bin/flume-ng avro-client --cnf --host localhost --port 1474 --filename /home/will/bigdata.txt
> 2012-03-28 18:27:35,650 (main) [ERROR - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:72)] Unable to open connection to Flume. Exception follows.
> org.apache.flume.FlumeException: RPC connection error. Exception follows.
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:114)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:96)
>         at org.apache.flume.api.NettyAvroRpcClient.access$100(NettyAvroRpcClient.java:50)
>         at org.apache.flume.api.NettyAvroRpcClient$Builder.build(NettyAvroRpcClient.java:389)
>         at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:45)
>         at org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:120)
>         at org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:64)
> Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:1474
>         at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:250)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:199)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:148)
>         at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:116)
>         at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:107)
>         ... 6 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:384)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:354)
>         at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:276)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2012-03-28 18:27:35,683 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:77)] Exiting
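The root cause at the bottom of the trace is java.net.ConnectException ("Connection refused"). On localhost this means nothing is bound to port 1474 at all, so the Avro source never started listening; a timeout or an Avro handshake error would point elsewhere. A plain TCP probe can confirm this without the avro-client. Below is a minimal sketch that uses only java.net.Socket; the PortProbe class and its Status values are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper (not part of Flume): reports whether anything accepts
 * TCP connections on host:port.
 */
public final class PortProbe {
    public enum Status { LISTENING, REFUSED, UNREACHABLE }

    private PortProbe() {}

    public static Status probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Status.LISTENING;   // TCP handshake completed
        } catch (ConnectException e) {
            return Status.REFUSED;     // "Connection refused": no listener on the port
        } catch (IOException e) {
            return Status.UNREACHABLE; // timeout, unknown host, etc.
        }
    }

    public static void main(String[] args) {
        // The two ports that appear in this report.
        for (int port : new int[] {1473, 1474}) {
            System.out.println("localhost:" + port + " -> " + probe("localhost", port, 1000));
        }
    }
}
```

Running it against both ports while the agent is in this state shows which of the agent's sources are actually accepting connections. The probe checks only the TCP layer; it does not speak the Avro protocol.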

--
This message is automatically generated by JIRA.