You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by "Ashish Paliwal (JIRA)" <ji...@apache.org> on 2014/11/05 12:29:34 UTC

[jira] [Resolved] (FLUME-1010) Collector failing but status ACTIVE

     [ https://issues.apache.org/jira/browse/FLUME-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Paliwal resolved FLUME-1010.
-----------------------------------
       Resolution: Won't Fix
    Fix Version/s: v0.9.5

Won't fix. 0.X branch not maintained anymore

> Collector failing but status ACTIVE
> -----------------------------------
>
>                 Key: FLUME-1010
>                 URL: https://issues.apache.org/jira/browse/FLUME-1010
>             Project: Flume
>          Issue Type: Bug
>          Components: Build, Node
>    Affects Versions: v0.9.4
>         Environment: Debian Squeeze, ubuntu maverick, cloudera package
>            Reporter: Nicolas Cerveaux
>              Labels: collector, driver
>             Fix For: v0.9.5
>
>
> After a certain time I received this error, nothing was written to the log until I restart the collector:
> 2012-02-24 16:57:41,553 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode hdflume01.lamp.com-19 in error state CollectorSource | Collector because null
> My problem is the collector server is still marked as active and all the agents wont failover to the next collector, because the collector itself is still up.
> My configurations is the following:
> collectorSource	collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
> collectorSource	collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
> collectorSource	collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
> tail("/var/log/nginx/localhost.access.log")	agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
> tail("/var/log/nginx/localhost.access.log")	agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
> I think was able to reproduce this error with two virtual machine when I was changing the same configuration of the collector to:
> collectorSource	console
> 2012-02-25 13:46:17,711 INFO com.cloudera.flume.collector.CollectorSource: closed
> 2012-02-25 13:46:18,712 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35853...
> 2012-02-25 13:46:18,713 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 1 elements ...
> 2012-02-25 13:46:19,942 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 1 heartbeats
> 2012-02-25 13:46:24,944 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 2 heartbeats
> 2012-02-25 13:46:29,946 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 3 heartbeats
> 2012-02-25 13:46:34,949 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 4 heartbeats
> 2012-02-25 13:46:39,941 ERROR com.cloudera.flume.agent.LogicalNode: Forcing driver to exit uncleanly
> 2012-02-25 13:46:39,942 ERROR com.cloudera.flume.handlers.thrift.ThriftEventSource: Unexpected interrupt of close sleep interrupted
> java.lang.InterruptedException: sleep interrupted
>         at java.lang.Thread.sleep(Native Method)
>         at com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
>         at com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
>         at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
> 2012-02-25 13:46:39,943 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during close calls
> 2012-02-25 13:46:39,943 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode gfiles02.lamp.com-18 exited with error: sleep interrupted
> java.lang.InterruptedException: sleep interrupted
>         at java.lang.Thread.sleep(Native Method)
>         at com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
>         at com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
>         at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
> 2012-02-25 13:46:39,944 INFO com.cloudera.flume.collector.CollectorSource: closed
> 2012-02-25 13:46:39,944 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35853...
> 2012-02-25 13:46:39,944 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 2 elements ...
> 2012-02-25 13:46:39,944 INFO com.cloudera.flume.agent.LogicalNode: Node config successfully set to FlumeConfigData: {srcVer:'Sat Feb 25 13:46:08 PST 2012' snkVer:'Sat Feb 25 13:46:08 PST 2012'  ts='Sat Feb 25 13:46:08 PST 2012' flowId:'default-flow' source:'collectorSource' sink:'console' }
> 2012-02-25 13:46:39,957 INFO com.cloudera.flume.collector.CollectorSource: opened
> 2012-02-25 13:46:39,988 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35853...
> 2012-02-25 13:46:39,997 INFO com.cloudera.flume.handlers.debug.ConsoleEventSink: ConsoleEventSink( debug ) opened
> 2012-02-25 13:46:49,978 WARN com.cloudera.flume.handlers.thrift.ThriftEventSource: Close timed out due to no progress.  Closing despite having 2 values still enqueued
> 2012-02-25 13:46:49,978 INFO com.cloudera.flume.handlers.rolling.RollSink: closing RollSink 'escapedCustomDfs("hdfs://hh01.hadoop.com:54310/logs/%Y/%m/%d/%H/","-%{rolltag}" )'
> 2012-02-25 13:46:49,979 WARN com.cloudera.flume.handlers.endtoend.AckChecksumChecker: partial acks abandoned: {20120225-134554606-0800.212691888462127.00000021=1330206354608}
> 2012-02-25 13:46:49,979 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode gfiles02.lamp.com-18 in error state CollectorSource | Collector because sleep interrupted
> In both case the error state that the node is in error state but they really are in an active state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)