Posted to dev@flume.apache.org by "David Stendardi (JIRA)" <ji...@apache.org> on 2014/04/30 20:34:22 UTC

[jira] [Updated] (FLUME-2375) HDFS sink's fail to recover from datanode unavailability

     [ https://issues.apache.org/jira/browse/FLUME-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Stendardi updated FLUME-2375:
-----------------------------------

    Description: 
Hello !

We are running flume-ng with version cdh-4.5-1.4. When a datanode used by flume-ng goes down, we get the following exception:

{code}
30 Apr 2014 01:10:38,130 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:96)  - Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
        at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
        at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
{code}

This exception is logged but not rethrown, and AbstractHDFSWriter.isUnderReplicated() still returns false, so the writer keeps trying to write to the node.
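To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern involved (this is a simplification, not the actual Flume source; the class and method names `ReplicationCheckSketch` and `BrokenStream` are invented for illustration). The replica count is obtained reflectively; when the underlying stream call throws because the datanode is gone, the InvocationTargetException is caught and logged, an error sentinel is returned, and the under-replication check then answers false:

{code}
import java.lang.reflect.Method;

public class ReplicationCheckSketch {

    // Hypothetical stand-in for a wrapped output stream whose datanode has died.
    public static class BrokenStream {
        public int getNumCurrentReplicas() throws java.io.IOException {
            throw new java.io.IOException("datanode unavailable");
        }
    }

    static int getNumCurrentReplicas(Object stream) {
        try {
            // The replica-count method is resolved and invoked via reflection.
            Method m = stream.getClass().getMethod("getNumCurrentReplicas");
            return (Integer) m.invoke(stream);
        } catch (ReflectiveOperationException e) {
            // An InvocationTargetException wrapping the IOException lands here:
            // it is logged ("Unexpected error while checking replication factor")
            // but not rethrown, and a sentinel is returned instead.
            return -1;
        }
    }

    static boolean isUnderReplicated(Object stream, int configuredReplication) {
        int numReplicas = getNumCurrentReplicas(stream);
        // The -1 error sentinel fails this test, so the datanode failure
        // is silently treated as "replication is fine".
        return numReplicas > -1 && numReplicas < configuredReplication;
    }

    public static void main(String[] args) {
        // Prints false even though the underlying stream is broken.
        System.out.println(isUnderReplicated(new BrokenStream(), 3));
    }
}
{code}

Because the swallowed exception never influences the boolean result, the sink has no signal to close the file or fail over, which matches the behavior we observe.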

Here is how we configured our sink : 

{code}
# Describe persistence sink
collector.sinks.hdfs.channel = hdfs
collector.sinks.hdfs.type = hdfs
collector.sinks.hdfs.hdfs.path = /flume-ng/%{env}/%{avro.fqn}/from_year=%Y/from_date=%Y-%m-%d
collector.sinks.hdfs.hdfs.filePrefix = <%= @hostname %>-%H-%{avro.fp}
collector.sinks.hdfs.hdfs.fileSuffix = .avro
collector.sinks.hdfs.hdfs.rollInterval = 3605
collector.sinks.hdfs.hdfs.rollSize = 0
collector.sinks.hdfs.hdfs.rollCount = 0
collector.sinks.hdfs.hdfs.batchSize = 1000
collector.sinks.hdfs.hdfs.txnEventMax = 1000
collector.sinks.hdfs.hdfs.callTimeout = 20000
collector.sinks.hdfs.hdfs.fileType = DataStream
collector.sinks.hdfs.serializer = com.viadeo.event.flume.serializer.AvroEventSerializer$Builder
{code}


  was:
Hello !

We are running flume-ng with version cdh-4.5-1.4. When a datanode used by flume-ng goes down, we get the following exception:

{code}
30 Apr 2014 01:10:38,130 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:96)  - Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
        at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
        at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
{code}

This exception is logged but not rethrown, and AbstractHDFSWriter.isUnderReplicated() still returns false, so the writer keeps trying to write to the node.



> HDFS sink's fail to recover from datanode unavailability
> --------------------------------------------------------
>
>                 Key: FLUME-2375
>                 URL: https://issues.apache.org/jira/browse/FLUME-2375
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.4.0
>            Reporter: David Stendardi
>              Labels: hdfs, hdfssink
>
> (Issue description as above.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)