Posted to user@flume.apache.org by Rahul Ravindran <ra...@yahoo.com> on 2013/05/07 17:42:50 UTC

IOException with HDFS-Sink:flushOrSync

Hi,
   We have noticed this a few times now: we get an IOException from HDFS, and the channel stops draining until the Flume process is restarted. Below are the logs. namenode-v01-00b is the active namenode (namenode-v01-00a is standby). We are using Quorum Journal Manager for our NameNode HA, but no NameNode failover was initiated. If this is an expected error, should Flume handle it and retry gracefully (thereby not requiring a restart)?
Thanks,
~Rahul.

7 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] (org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00a.a.com":8020; ). Closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp) and rethrowing exception.
07 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] (org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException while closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp). Exception follows.
java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00a.a.com":8020;
  at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
  at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
  at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
  at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
  at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
  at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
  at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
07 May 2013 06:35:02,495 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:456)  - HDFS IO error
java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00a.a.com":8020;
  at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
  at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
  at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
  at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
  at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
  at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
  at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
  at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
07 May 2013 06:35:05,350 WARN  [hdfs-hdfs-sink1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00b.a.com":8020; ). Closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-1//event.1367891734999.tmp) and rethrowing exception.
07 May 2013 06:35:05,351 WARN  [hdfs-hdfs-sink1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException while closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-1//event.1367891734999.tmp). Exception follows.
java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00b.a.com":8020;
  at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
  at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
  at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
  at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
  at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
  at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
  at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:743)
  at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:741)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
07 May 2013 06:35:05,352 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:456)  - HDFS IO error
java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00b.a.com":8020;
  at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
  at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
  at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
  at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
  at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
  at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
  at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
  at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
  at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:743)
  at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:741)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
07 May 2013 06:35:07,497 WARN  [hdfs-hdfs-sink4-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00a.a.com":8020; ). Closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp) and rethrowing exception.
07 May 2013 06:35:07,497 WARN  [hdfs-hdfs-sink4-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException while closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp). Exception follows.
java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00a.a.com":8020;
  at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
  at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
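
The graceful retry being asked about would amount to catching the IOException at the flush boundary, discarding the dead stream, and reopening the file before trying again. Below is a minimal sketch of that idea in Java, using only public FileSystem/FSDataOutputStream APIs; the class and its names are hypothetical (not Flume's actual BucketWriter logic), and it assumes append support on the cluster:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: survive a failed flush by reopening the file.
public class RetryingHdfsWriter {
  private final FileSystem fs;
  private final Path path;
  private FSDataOutputStream out;

  RetryingHdfsWriter(Configuration conf, Path path) throws IOException {
    this.fs = FileSystem.get(conf);
    this.path = path;
    this.out = fs.create(path);
  }

  void append(byte[] record) throws IOException {
    out.write(record);
  }

  // Retry the same hflush/sync path that appears in the traces above,
  // discarding and reopening the stream on each IOException.
  void flushWithRetry(int maxAttempts) throws IOException {
    IOException last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        out.hflush();            // FSDataOutputStream.sync() on older APIs
        return;
      } catch (IOException e) {
        last = e;
        Thread.interrupted();    // clear a pending interrupt (relevant to HADOOP-6762, discussed below)
        try { out.close(); } catch (IOException ignored) { }
        out = fs.append(path);   // reopen for append; durability of unflushed records is not guaranteed
      }
    }
    throw last;
  }
}

Whether such a retry can ever succeed depends on why the flush failed; as the rest of the thread works out, a ClosedByInterruptException means the client's RPC socket itself was torn down.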

Re: IOException with HDFS-Sink:flushOrSync

Posted by Bernardo de Seabra <be...@brightroll.com>.
According to the CDH4.1.3 release notes, the HADOOP-6762 fix was merged into that release.

http://lists.cloudera.com/pipermail/cdh-announce/2013-February/000013.html


Bernardo


Re: IOException with HDFS-Sink:flushOrSync

Posted by Rahul Ravindran <ra...@yahoo.com>.
Thanks Hari for your help in this. Appreciate it.

We will work towards upgrading to CDH 4.2.1 soon, and hopefully, this issue is resolved.

~Rahul.


Re: IOException with HDFS-Sink:flushOrSync

Posted by Hari Shreedharan <hs...@cloudera.com>.
The patch also made it to Hadoop 2.0.3.


Re: IOException with HDFS-Sink:flushOrSync

Posted by Hari Shreedharan <hs...@cloudera.com>.
Looks like CDH4.2.1 does have that patch: http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.2.1.CHANGES.txt (but it was not in CDH4.1.2)


Hari 

-- 
Hari Shreedharan


Re: IOException with HDFS-Sink:flushOrSync

Posted by Rahul Ravindran <ra...@yahoo.com>.
We are using CDH 4.1.2 (Hadoop version 2.0.0). Looks like CDH 4.2.1 also uses the same Hadoop version. Any suggestions on mitigations?

Sent from my phone. Excuse the terseness.
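
A quick way to confirm exactly which Hadoop build ends up on the Flume classpath (rather than inferring it from package names) is org.apache.hadoop.util.VersionInfo, a public Hadoop utility. A minimal sketch:

import org.apache.hadoop.util.VersionInfo;

// Prints the Hadoop build that the current classpath resolves to; useful
// for checking whether a build containing the HADOOP-6762 fix is in use.
public class PrintHadoopVersion {
  public static void main(String[] args) {
    System.out.println("version:  " + VersionInfo.getVersion());
    System.out.println("revision: " + VersionInfo.getRevision());
    System.out.println("compiled: " + VersionInfo.getDate() + " by " + VersionInfo.getUser());
  }
}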


Re: IOException with HDFS-Sink:flushOrSync

Posted by Hari Shreedharan <hs...@cloudera.com>.
What version of Hadoop are you using? Looks like you are getting hit by https://issues.apache.org/jira/browse/HADOOP-6762. 
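
For context, the ClosedByInterruptException in the traces above is standard JDK behavior rather than anything Hadoop-specific: channels implementing java.nio.channels.InterruptibleChannel (SocketChannel, FileChannel, and friends) are closed by the JVM when a thread blocked in I/O on them is interrupted, which is how a stray interrupt can kill the shared IPC socket that HADOOP-6762 describes. A self-contained demonstration of that contract (the demo file path is arbitrary):

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;

// FileChannel, like the SocketChannel inside Hadoop's IPC client, implements
// InterruptibleChannel: interrupting a thread blocked in channel I/O closes
// the channel and the I/O call throws ClosedByInterruptException.
public class InterruptClosesChannel {
  public static void main(String[] args) throws Exception {
    final RandomAccessFile raf = new RandomAccessFile("/tmp/interrupt-demo.bin", "rw");
    final FileChannel ch = raf.getChannel();
    Thread writer = new Thread(new Runnable() {
      public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4 * 1024 * 1024);
        try {
          while (true) {            // stay busy inside channel I/O
            buf.clear();
            ch.write(buf, 0L);      // overwrite the same region each pass
          }
        } catch (ClosedByInterruptException e) {
          System.out.println("writer: channel closed by interrupt");
        } catch (Exception e) {
          System.out.println("writer: unexpected " + e);
        }
      }
    });
    writer.start();
    Thread.sleep(100);              // let the writer block in write()
    writer.interrupt();             // JVM closes ch; writer gets the exception
    writer.join();
    System.out.println("channel still open? " + ch.isOpen());   // prints false
    raf.close();
  }
}

Once a channel is closed this way, every later caller fails too, which is consistent with the repeated flush failures in the logs.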


Hari 

-- 
Hari Shreedharan


On Monday, May 13, 2013 at 6:50 PM, Matt Wise wrote:

> So we've just had this happen twice to two different flume machines... we're using the HDFS sink as well, but ours is writing to an S3N:// URL. Both times our sink stopped working and the filechannel clogged up immediately causing serious problems. A restart of Flume worked -- but the filechannel was so backed up at that point that it took a good long while to get Flume started up again properly.
> 
> Anyone else seeing this behavior?
> 
> (oh, and we're running flume 1.3.0)
> On May 7, 2013, at 8:42 AM, Rahul Ravindran <rahulrv@yahoo.com (mailto:rahulrv@yahoo.com)> wrote:
> > Hi,
> >    We have noticed this a few times now where we appear to have an IOException from HDFS and this stops draining the channel until the flume process is restarted. Below are the logs: namenode-v01-00b is the active namenode (namenode-v01-00a is standby). We are using Quorum Journal Manager for our Namenode HA, but there was no Namenode failover which was initiated. If this is an expected error, should flume handle it and gracefully retry (thereby not requiring a restart)?
> > Thanks,
> > ~Rahul.
> > 
> > 7 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] (org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170 (http://flumefs-v01-10a.a.com/10.40.85.170)"; destination host is: "namenode-v01-00a.a.com (http://namenode-v01-00a.a.com)":8020; ). Closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp) and rethrowing exception.
> > 07 May 2013 06:35:02,494 WARN  [hdfs-hdfs-sink4-call-runner-2] (org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException while closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp). Exception follows.
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170 (http://flumefs-v01-10a.a.com/10.40.85.170)"; destination host is: "namenode-v01-00a.a.com (http://namenode-v01-00a.a.com)":8020;
> >   at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
> >   at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >   at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
> >   at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
> >   at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
> >   at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
> >   at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
> >   at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
> >   at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
> >   at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
> >   at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> >   at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >   at java.lang.Thread.run(Thread.java:662)
> > 07 May 2013 06:35:02,495 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:456)  - HDFS IO error
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170 (http://flumefs-v01-10a.a.com/10.40.85.170)"; destination host is: "namenode-v01-00a.a.com (http://namenode-v01-00a.a.com)":8020;
> >   at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
> >   at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >   at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
> >   at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
> >   at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
> >   at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
> >   at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
> >   at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
> >   at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
> >   at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:396)
> >   at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> >   at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >   at java.lang.Thread.run(Thread.java:662)
> > 07 May 2013 06:35:05,350 WARN  [hdfs-hdfs-sink1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170 (http://flumefs-v01-10a.a.com/10.40.85.170)"; destination host is: "namenode-v01-00b.a.com (http://namenode-v01-00b.a.com)":8020; ). Closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-1//event.1367891734999.tmp) and rethrowing exception.
> > 07 May 2013 06:35:05,351 WARN  [hdfs-hdfs-sink1-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException while closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-1//event.1367891734999.tmp). Exception follows.
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170 (http://flumefs-v01-10a.a.com/10.40.85.170)"; destination host is: "namenode-v01-00b.a.com (http://namenode-v01-00b.a.com)":8020;
> >   at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
> >   at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >   at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
> >   at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
> >   at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
> >   at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
> >   at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
> >   at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
> >   at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
> >   at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:743)
> >   at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:741)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >   at java.lang.Thread.run(Thread.java:662)
> > 07 May 2013 06:35:05,352 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:456)  - HDFS IO error
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170 (http://flumefs-v01-10a.a.com/10.40.85.170)"; destination host is: "namenode-v01-00b.a.com (http://namenode-v01-00b.a.com)":8020;
> >   at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1484)
> >   at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >   at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
> >   at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:345)
> >   at org.apache.flume.sink.hdfs.BucketWriter.access$500(BucketWriter.java:53)
> >   at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:310)
> >   at org.apache.flume.sink.hdfs.BucketWriter$4.run(BucketWriter.java:308)
> >   at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
> >   at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:308)
> >   at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:743)
> >   at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:741)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >   at java.lang.Thread.run(Thread.java:662)
> > 07 May 2013 06:35:07,497 WARN  [hdfs-hdfs-sink4-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter.append:378)  - Caught IOException writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170 (http://flumefs-v01-10a.a.com/10.40.85.170)"; destination host is: "namenode-v01-00a.a.com (http://namenode-v01-00a.a.com)":8020; ). Closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp) and rethrowing exception.
> > 07 May 2013 06:35:07,497 WARN  [hdfs-hdfs-sink4-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter.append:384)  - Caught IOException while closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp). Exception follows.
> > java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00a.a.com":8020;
> >   at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1617)
> >   at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1499)


Re: IOException with HDFS-Sink:flushOrSync

Posted by Matt Wise <ma...@nextdoor.com>.
So we've just had this happen twice, on two different Flume machines. We're using the HDFS sink as well, but ours writes to an s3n:// URL. Both times the sink stopped working and the file channel immediately backed up, causing serious problems. A restart of Flume worked -- but the file channel was so backed up by that point that it took a good long while to get Flume started up again properly.

Anyone else seeing this behavior?

(Oh, and we're running Flume 1.3.0.)
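
(For context, a sink of the shape described above is configured roughly as follows. This is a sketch, not the configuration from this setup: the agent, channel, and sink names and the bucket path are all illustrative; the only point of interest is that hdfs.path can carry an s3n:// URL.)

    # Hypothetical agent layout -- names and paths are placeholders.
    agent1.channels.ch1.type = file
    agent1.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
    agent1.channels.ch1.dataDirs = /var/lib/flume/data

    # HDFS sink pointed at S3 via the s3n:// filesystem.
    agent1.sinks.s3sink.type = hdfs
    agent1.sinks.s3sink.channel = ch1
    agent1.sinks.s3sink.hdfs.path = s3n://some-bucket/flume/events
    agent1.sinks.s3sink.hdfs.fileType = DataStream
    agent1.sinks.s3sink.hdfs.rollInterval = 300
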
On May 7, 2013, at 8:42 AM, Rahul Ravindran <ra...@yahoo.com> wrote:

> Hi,
>    We have noticed this a few times now where we appear to have an IOException from HDFS and this stops draining the channel until the flume process is restarted. Below are the logs: namenode-v01-00b is the active namenode (namenode-v01-00a is standby). We are using Quorum Journal Manager for our Namenode HA, but there was no Namenode failover which was initiated. If this is an expected error, should flume handle it and gracefully retry (thereby not requiring a restart)?
> Thanks,
> ~Rahul.
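
The java.nio.channels.ClosedByInterruptException in these traces is the JDK's signature for a thread being interrupted while blocked in I/O on an interruptible NIO channel, and the FutureTask frames suggest the flush was submitted to the sink's call-runner pool and then cancelled with interrupt=true after a timeout. Below is a minimal, self-contained sketch of that mechanism; it uses a plain SocketChannel as a stand-in for the HDFS connection and only demonstrates the JDK behavior, not Flume's exact code path:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class ClosedByInterruptDemo {
        public static void main(String[] args) throws Exception {
            // Stand-in for the HDFS IPC connection: an interruptible NIO channel.
            ServerSocketChannel server = ServerSocketChannel.open();
            server.socket().bind(new InetSocketAddress("localhost", 0));
            final SocketChannel channel =
                    SocketChannel.open(server.socket().getLocalSocketAddress());

            ExecutorService callRunner = Executors.newSingleThreadExecutor();
            // Stand-in for a flush submitted to the call-runner pool:
            // a read that blocks because no data ever arrives.
            Future<Integer> flush = callRunner.submit(new Callable<Integer>() {
                public Integer call() throws Exception {
                    return channel.read(ByteBuffer.allocate(1));
                }
            });

            try {
                flush.get(500, TimeUnit.MILLISECONDS);  // like a call timeout
            } catch (TimeoutException te) {
                flush.cancel(true);  // interrupts the blocked call-runner thread
            }
            Thread.sleep(200);  // give the interrupt time to land

            // The blocked read fails with ClosedByInterruptException, and the
            // channel is closed as a side effect of the interrupt.
            System.out.println("channel still open? " + channel.isOpen());  // false

            callRunner.shutdownNow();
            server.close();
        }
    }

Because the channel is closed as a side effect of the interrupt, later operations on the same connection cannot succeed, which would explain why every subsequent flush on the same open file keeps failing until the writer is reopened.
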


Re: IOException with HDFS-Sink:flushOrSync

Posted by Rahul Ravindran <ra...@yahoo.com>.
Pinging again, since this has been happening much more frequently lately.


________________________________
 From: Rahul Ravindran <ra...@yahoo.com>
To: User-flume <us...@flume.apache.org> 
Sent: Tuesday, May 7, 2013 8:42 AM
Subject: IOException with HDFS-Sink:flushOrSync
 


Hi,
   We have noticed this a few times now where we appear to have an IOException from HDFS and this stops draining the channel until the flume process is restarted. Below are the logs: namenode-v01-00b is the active namenode (namenode-v01-00a is standby). We are using Quorum Journal Manager for our Namenode HA, but there was no Namenode failover which was initiated. If this is an expected error, should flume handle it and gracefully retry (thereby not requiring a restart)?
Thanks,
~Rahul.
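
One configuration knob directly tied to these interrupts is the HDFS sink's call timeout: operations such as flush and close that exceed hdfs.callTimeout (default 10000 ms) are cancelled with an interrupt, which is one way to end up with ClosedByInterruptException when a namenode round-trip is slow. Raising it is a blunt but common mitigation; the agent and sink names below are illustrative:

    # Give slow flush/close calls more headroom before they are
    # interrupted; 60s is an illustrative value, not a recommendation.
    agent1.sinks.hdfs-sink4.hdfs.callTimeout = 60000
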
