You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by "Nicolas Cerveaux (Created) (JIRA)" <ji...@apache.org> on 2012/03/01 01:59:58 UTC

[jira] [Created] (FLUME-1010) Collector failing but status ACTIVE

Collector failing but status ACTIVE
-----------------------------------

                 Key: FLUME-1010
                 URL: https://issues.apache.org/jira/browse/FLUME-1010
             Project: Flume
          Issue Type: Bug
          Components: Build, Node
    Affects Versions: v0.9.4
         Environment: Debian Squeeze, ubuntu maverick, cloudera package
            Reporter: Nicolas Cerveaux


After a certain time I received this error, nothing was written to the log until I restart the collector:
2012-02-24 16:57:41,553 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode hdflume01.lamp.com-19 in error state CollectorSource | Collector because null

My problem is the collector server is still marked as active and all the agents wont failover to the next collector, because the collector itself is still up.

My configurations is the following:
collectorSource	collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
collectorSource	collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
collectorSource	collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
tail("/var/log/nginx/localhost.access.log")	agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
tail("/var/log/nginx/localhost.access.log")	agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")

I think was able to reproduce this error with two virtual machine when I was changing the same configuration of the collector to:
collectorSource	console

2012-02-25 13:46:17,711 INFO com.cloudera.flume.collector.CollectorSource: closed
2012-02-25 13:46:18,712 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35853...
2012-02-25 13:46:18,713 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 1 elements ...
2012-02-25 13:46:19,942 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 1 heartbeats
2012-02-25 13:46:24,944 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 2 heartbeats
2012-02-25 13:46:29,946 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 3 heartbeats
2012-02-25 13:46:34,949 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 4 heartbeats
2012-02-25 13:46:39,941 ERROR com.cloudera.flume.agent.LogicalNode: Forcing driver to exit uncleanly
2012-02-25 13:46:39,942 ERROR com.cloudera.flume.handlers.thrift.ThriftEventSource: Unexpected interrupt of close sleep interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
        at com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
2012-02-25 13:46:39,943 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during close calls
2012-02-25 13:46:39,943 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode gfiles02.lamp.com-18 exited with error: sleep interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
        at com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
2012-02-25 13:46:39,944 INFO com.cloudera.flume.collector.CollectorSource: closed
2012-02-25 13:46:39,944 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35853...
2012-02-25 13:46:39,944 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 2 elements ...
2012-02-25 13:46:39,944 INFO com.cloudera.flume.agent.LogicalNode: Node config successfully set to FlumeConfigData: {srcVer:'Sat Feb 25 13:46:08 PST 2012' snkVer:'Sat Feb 25 13:46:08 PST 2012'  ts='Sat Feb 25 13:46:08 PST 2012' flowId:'default-flow' source:'collectorSource' sink:'console' }
2012-02-25 13:46:39,957 INFO com.cloudera.flume.collector.CollectorSource: opened
2012-02-25 13:46:39,988 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35853...
2012-02-25 13:46:39,997 INFO com.cloudera.flume.handlers.debug.ConsoleEventSink: ConsoleEventSink( debug ) opened
2012-02-25 13:46:49,978 WARN com.cloudera.flume.handlers.thrift.ThriftEventSource: Close timed out due to no progress.  Closing despite having 2 values still enqueued
2012-02-25 13:46:49,978 INFO com.cloudera.flume.handlers.rolling.RollSink: closing RollSink 'escapedCustomDfs("hdfs://hh01.hadoop.com:54310/logs/%Y/%m/%d/%H/","-%{rolltag}" )'
2012-02-25 13:46:49,979 WARN com.cloudera.flume.handlers.endtoend.AckChecksumChecker: partial acks abandoned: {20120225-134554606-0800.212691888462127.00000021=1330206354608}
2012-02-25 13:46:49,979 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode gfiles02.lamp.com-18 in error state CollectorSource | Collector because sleep interrupted


In both case the error state that the node is in error state but they really are in an active state.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira