Posted to user@flume.apache.org by Abraham Fine <ab...@brightroll.com> on 2014/03/25 19:43:44 UTC

HDFS IO Error

Hello-

We have Flume agents running 1.4.0 that sink to HDFS (version 
2.0.0-cdh4.2.1).

Exceptions start occurring at the same time across our Flume agents when 
a datanode in HDFS goes down. We did not have this issue while running 
Flume 1.3.

We noticed a similar issue posted on the mailing list here: 
http://mail-archives.apache.org/mod_mbox/flume-user/201307.mbox/%3CCAPZq-vkmDGptbOWEAF+rE-1neibUtQ36+EHqukn5B7FUM4QAyA@mail.gmail.com%3E 
and on JIRA: https://issues.apache.org/jira/browse/FLUME-2261 
but could not find a solution.

We have noticed the following in the Flume logs:

WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.sink.hdfs.HDFSEventSink.process:418)  - HDFS IO
error
java.io.IOException: Callable timed out after 20000 ms on file <FILEPATH> :
         at 
org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:550)
         at 
org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:353)
         at 
org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:319)
         at 
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:405)
         at 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
         at 
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
         at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
         at 
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
         at java.util.concurrent.FutureTask.get(FutureTask.java:91)
         at 
org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:543)
         ... 6 more

This is usually followed by:

WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.sink.hdfs.HDFSEventSink.process:418)  - HDFS IO
error
java.io.IOException: This bucket writer was closed due to idling and
this handle is thus no longer valid
         at 
org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:380)
         at 
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
         at 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
         at 
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
         at java.lang.Thread.run(Thread.java:662)

When these exceptions occur, the HDFS sink does not close files. We often 
end up with multi-gigabyte files in HDFS.

Our sink configuration:

agentX.sinks.hdfs-sinkX-1.channel = chX
agentX.sinks.hdfs-sinkX-1.type = hdfs
agentX.sinks.hdfs-sinkX-1.hdfs.path = <FILEPATH>
agentX.sinks.hdfs-sinkX-1.hdfs.filePrefix = event
agentX.sinks.hdfs-sinkX-1.hdfs.writeFormat = Text
agentX.sinks.hdfs-sinkX-1.hdfs.rollInterval = 120
agentX.sinks.hdfs-sinkX-1.hdfs.idleTimeout = 180
agentX.sinks.hdfs-sinkX-1.hdfs.rollCount = 0
agentX.sinks.hdfs-sinkX-1.hdfs.rollSize = 0
agentX.sinks.hdfs-sinkX-1.hdfs.fileType = DataStream
agentX.sinks.hdfs-sinkX-1.hdfs.batchSize = 24000
agentX.sinks.hdfs-sinkX-1.hdfs.txnEventSize = 24000
agentX.sinks.hdfs-sinkX-1.hdfs.callTimeout = 20000
agentX.sinks.hdfs-sinkX-1.hdfs.threadsPoolSize = 1


The file paths are unique to each sink.
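For reference, a minimal sketch of the same sink with a size-based roll guard re-enabled (the rollSize value below is hypothetical; with rollCount and rollSize both set to 0 in the configuration above, only the 120 s rollInterval and 180 s idleTimeout can close a file, and whether a size guard helps in this particular failure mode is untested):

```properties
# Sketch: same HDFS sink, with hdfs.rollSize re-enabled as a backstop
# against unbounded file growth. Other properties as in the config above.
agentX.sinks.hdfs-sinkX-1.channel = chX
agentX.sinks.hdfs-sinkX-1.type = hdfs
agentX.sinks.hdfs-sinkX-1.hdfs.path = <FILEPATH>
agentX.sinks.hdfs-sinkX-1.hdfs.rollInterval = 120
agentX.sinks.hdfs-sinkX-1.hdfs.idleTimeout = 180
agentX.sinks.hdfs-sinkX-1.hdfs.rollCount = 0
# Hypothetical value: roll once a file reaches ~128 MB (bytes)
agentX.sinks.hdfs-sinkX-1.hdfs.rollSize = 134217728
```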

Thank you for your help.

--
Abraham Fine | Software Engineer
BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com

Re: HDFS IO Error

Posted by Abraham Fine <ab...@brightroll.com>.
Hello-

We have been able to resolve the "multi-gigabyte file" issue with the 
patch here: https://issues.apache.org/jira/browse/FLUME-1654. However, 
this seems more like a temporary solution, and we still see the 
exceptions in the logs (when flush is called on the BucketWriter).

Thanks Again,

--
Abraham Fine | Software Engineer
BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com


Re: HDFS IO Error

Posted by Bean Edwards <ed...@gmail.com>.
It's probably because high throughput hit the datanode I/O ceiling, so 
I modified hdfs.callTimeout to 60000 and set the HDFS client settings via the API:
config.set("dfs.socket.timeout", "3600000");
config.set("dfs.datanode.socket.write.timeout", "3600000");
Then the problem was solved!
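For reference, a sketch of equivalent hdfs-site.xml entries, assuming the Flume agent picks up the Hadoop client configuration from its classpath (property names and millisecond values taken from the config.set(...) calls above):

```xml
<!-- Sketch: raise HDFS client socket timeouts for the Flume agent's
     Hadoop configuration. Values are in milliseconds, matching the
     config.set(...) calls above. -->
<configuration>
  <property>
    <name>dfs.socket.timeout</name>
    <value>3600000</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>3600000</value>
  </property>
</configuration>
```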



Re: HDFS IO Error

Posted by Hari Shreedharan <hs...@cloudera.com>.
I filed https://issues.apache.org/jira/browse/FLUME-2348 to investigate.



RE: HDFS IO Error

Posted by Himanshu Patidar <hi...@hotmail.com>.
I am getting the same error too. 

Thanks,
Himanshu


Re: HDFS IO Error

Posted by Bean Edwards <ed...@gmail.com>.
We got the same problem!

