Posted to user@flume.apache.org by Andrew O'Neill <ao...@paytronix.com> on 2014/08/26 18:33:13 UTC
Flume 1.4 HDFS Sink Cannot Reconnect
Hello all,
My setup:
- Flume 1.4
- CDH 4.2.2 (2.0.0-cdh4.2.2)
I am testing a simple Flume setup with a Sequence Generator Source, a File Channel, and an HDFS Sink (see my flume.conf below). This configuration works as expected until I reboot the cluster’s NameNode or until I restart the HDFS service on the cluster. At this point, it appears that the Flume Agent cannot reconnect to HDFS and must be manually restarted. Since this is not an uncommon occurrence in our production cluster, it is important that Flume is able to reconnect gracefully without any manual intervention.
So, how do we fix this HDFS reconnection issue?
Here is our flume.conf:
appserver.sources = rawtext
appserver.channels = testchannel
appserver.sinks = test_sink
appserver.sources.rawtext.type = seq
appserver.sources.rawtext.channels = testchannel
appserver.channels.testchannel.type = file
appserver.channels.testchannel.capacity = 10000000
appserver.channels.testchannel.minimumRequiredSpace = 214748364800
appserver.channels.testchannel.checkpointDir = /Users/aoneill/Desktop/testchannel/checkpoint
appserver.channels.testchannel.dataDirs = /Users/aoneill/Desktop/testchannel/data
appserver.channels.testchannel.maxFileSize = 20000000
appserver.sinks.test_sink.type = hdfs
appserver.sinks.test_sink.channel = testchannel
appserver.sinks.test_sink.hdfs.path = hdfs://cluster01:8020/user/aoneill/flumetest
appserver.sinks.test_sink.hdfs.closeTries = 3
appserver.sinks.test_sink.hdfs.filePrefix = events-
appserver.sinks.test_sink.hdfs.fileSuffix = .avro
appserver.sinks.test_sink.hdfs.fileType = DataStream
appserver.sinks.test_sink.hdfs.writeFormat = Text
appserver.sinks.test_sink.hdfs.inUsePrefix = inuse-
appserver.sinks.test_sink.hdfs.inUseSuffix = .avro
appserver.sinks.test_sink.hdfs.rollCount = 100000
appserver.sinks.test_sink.hdfs.rollInterval = 30
appserver.sinks.test_sink.hdfs.rollSize = 10485760
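As an aside, the source/channel/sink wiring in a flume.conf like the one above can be sanity-checked mechanically before starting the agent. A minimal illustrative sketch (not a Flume tool; the parser and check are assumptions based only on the key = value format shown above):

```python
# Sanity check for a flume.conf-style properties file: verify that every
# sink and source refers to a channel that the agent actually declares.
def parse_props(text):
    """Parse 'key = value' lines into a dict, ignoring blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_wiring(props, agent):
    """Return a list of wiring problems for the named agent."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    for source in props.get(f"{agent}.sources", "").split():
        for ch in props.get(f"{agent}.sources.{source}.channels", "").split():
            if ch not in channels:
                problems.append(f"source {source} writes to undeclared channel {ch}")
    return problems

conf = """
appserver.sources = rawtext
appserver.channels = testchannel
appserver.sinks = test_sink
appserver.sources.rawtext.channels = testchannel
appserver.sinks.test_sink.channel = testchannel
"""
print(check_wiring(parse_props(conf), "appserver"))  # → []
```

A misconfigured channel name shows up immediately in the returned list, rather than at agent startup.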
These are the two error messages that the Flume Agent outputs repeatedly after the restart:
2014-08-26 10:47:24,572 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:96)] Unexpected error while checking replication factor
java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
    at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
    at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
and
2014-08-26 10:47:29,592 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)] HDFS IO error
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
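Until the underlying reconnection bug is fixed, one stopgap is an external watchdog that scans the agent log for this repeating pattern and triggers a restart. A minimal sketch, in which the success marker ("Creating ", as BucketWriter may log when it opens a new file) and the threshold are illustrative assumptions rather than Flume features:

```python
# Watchdog sketch for the failure above: count consecutive
# "Connection refused" errors in the agent log, resetting whenever a new
# HDFS bucket file is successfully opened.
def needs_restart(log_lines, threshold=3):
    """True once `threshold` connection errors occur with no successful
    bucket-file creation logged in between."""
    streak = 0
    for line in log_lines:
        if "java.net.ConnectException: Connection refused" in line:
            streak += 1
            if streak >= threshold:
                return True
        elif "Creating " in line:
            streak = 0  # a new HDFS file was opened, so the link recovered
    return False
```

In practice this would tail flume.log and invoke a service restart when it returns True; both pieces are deployment-specific and left out here.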
I can provide additional information if needed. Thank you very much for any insight you are able to provide into this problem.
Best,
Andrew
Re: Flume 1.4 HDFS Sink Cannot Reconnect
Posted by Andrew O'Neill <ao...@paytronix.com>.
Hello,
Did anyone have a chance to look at this issue?
Thanks,
Andrew O'Neill | Paytronix
Re: Flume 1.4 HDFS Sink Cannot Reconnect
Posted by Andrew O'Neill <ao...@paytronix.com>.
Per Roshan’s request, I have filed a bug for this issue. For those interested, here is the link to the issue:
https://issues.apache.org/jira/browse/FLUME-2451
Hopefully this will create some visibility on this problem.
Thanks,
Andrew
Re: Flume 1.4 HDFS Sink Cannot Reconnect
Posted by Roshan Naik <ro...@hortonworks.com>.
Please file a bug for this with the details provided in your email.
On Tue, Aug 26, 2014 at 9:44 AM, Gary Malouf <ma...@gmail.com> wrote:
> +1 I've seen this same issue.
--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.
Re: Flume 1.4 HDFS Sink Cannot Reconnect
Posted by Gary Malouf <ma...@gmail.com>.
+1 I've seen this same issue.
On Tue, Aug 26, 2014 at 12:33 PM, Andrew O'Neill <ao...@paytronix.com> wrote:
> [original message quoted in full; see above]