Posted to dev@flume.apache.org by "David Stendardi (JIRA)" <ji...@apache.org> on 2014/04/30 20:34:22 UTC
[jira] [Updated] (FLUME-2375) HDFS sinks fail to recover from datanode unavailability
[ https://issues.apache.org/jira/browse/FLUME-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Stendardi updated FLUME-2375:
-----------------------------------
Description:
Hello !
We are running flume-ng, version cdh-4.5-1.4. When a datanode used by flume-ng goes down, we get the following exception:
{code}
30 Apr 2014 01:10:38,130 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:96) - Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
{code}
This exception is logged but not rethrown, and AbstractHDFSWriter.isUnderReplicated still returns false, so the writer keeps trying to write to the unavailable node.
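To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical class names, not the actual Flume source) of the pattern described above: the reflection-based replication check catches the exception, logs it, and falls through to "not under-replicated", so the caller never sees a problem:

```java
import java.lang.reflect.Method;

class ReplicationChecker {
    private final Object stream;  // stands in for the underlying HDFS output stream

    ReplicationChecker(Object stream) {
        this.stream = stream;
    }

    // Mirrors the shape of AbstractHDFSWriter.isUnderReplicated: any error while
    // invoking getNumCurrentReplicas via reflection is logged and swallowed, and
    // the method returns false -- i.e. "everything is healthy".
    boolean isUnderReplicated(int desiredReplication) {
        try {
            Method m = stream.getClass().getMethod("getNumCurrentReplicas");
            int current = (Integer) m.invoke(stream);
            return current < desiredReplication;
        } catch (Exception e) {  // includes InvocationTargetException
            System.err.println("Unexpected error while checking replication factor: " + e);
            return false;        // <-- the bug: a failed check reads as "healthy"
        }
    }
}

// A stream whose replica check always throws, standing in for a dead datanode.
class DeadStream {
    public int getNumCurrentReplicas() {
        throw new IllegalStateException("datanode unavailable");
    }
}
```

With a DeadStream, isUnderReplicated(3) returns false even though the replica count could not be determined at all, which is why the sink never rotates away from the dead node.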
Here is how we configured our sink:
{code}
# Describe persistence sink
collector.sinks.hdfs.channel = hdfs
collector.sinks.hdfs.type = hdfs
collector.sinks.hdfs.hdfs.path = /flume-ng/%{env}/%{avro.fqn}/from_year=%Y/from_date=%Y-%m-%d
collector.sinks.hdfs.hdfs.filePrefix = <%= @hostname %>-%H-%{avro.fp}
collector.sinks.hdfs.hdfs.fileSuffix = .avro
collector.sinks.hdfs.hdfs.rollInterval = 3605
collector.sinks.hdfs.hdfs.rollSize = 0
collector.sinks.hdfs.hdfs.rollCount = 0
collector.sinks.hdfs.hdfs.batchSize = 1000
collector.sinks.hdfs.hdfs.txnEventMax = 1000
collector.sinks.hdfs.hdfs.callTimeout = 20000
collector.sinks.hdfs.hdfs.fileType = DataStream
collector.sinks.hdfs.serializer = com.viadeo.event.flume.serializer.AvroEventSerializer$Builder
{code}
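As a rough illustration of the hdfs.path setting above (a hypothetical helper, not Flume code): the time escapes such as %Y, %m, %d and %H resolve from the event's timestamp header, roughly like this. UTC is hardcoded here for determinism; the real sink's time zone is configurable (hdfs.timeZone), and header escapes like %{env} resolve from event headers instead:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class PathEscapes {
    // Expand the time-based escapes in a bucket path from an event timestamp.
    static String expand(String path, long eventTimestampMillis) {
        Date d = new Date(eventTimestampMillis);
        return path
            .replace("%Y", fmt("yyyy", d))
            .replace("%m", fmt("MM", d))
            .replace("%d", fmt("dd", d))
            .replace("%H", fmt("HH", d));
    }

    private static String fmt(String pattern, Date d) {
        SimpleDateFormat f = new SimpleDateFormat(pattern);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));  // fixed TZ for the illustration
        return f.format(d);
    }
}
```

So an event timestamped 2014-04-30 12:00 UTC buckets into .../from_year=2014/from_date=2014-04-30 under the path pattern shown above.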
was:
Hello !
We are running flume-ng, version cdh-4.5-1.4. When a datanode used by flume-ng goes down, we get the following exception:
{code}
30 Apr 2014 01:10:38,130 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:96) - Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
{code}
This exception is logged but not rethrown, and AbstractHDFSWriter.isUnderReplicated still returns false, so the writer keeps trying to write to the unavailable node.
> HDFS sinks fail to recover from datanode unavailability
> -------------------------------------------------------
>
> Key: FLUME-2375
> URL: https://issues.apache.org/jira/browse/FLUME-2375
> Project: Flume
> Issue Type: Bug
> Affects Versions: v1.4.0
> Reporter: David Stendardi
> Labels: hdfs, hdfssink
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)