Posted to user@flume.apache.org by Snehal Nagmote <na...@gmail.com> on 2013/11/26 20:07:25 UTC

Flume HDFS Sink Issue (IO Exception - hdfs.DFSClient$DFSOutputStream.sync)

Hello All,

We are using the HDFS sink with Flume, and it runs into HDFS IOExceptions very often.

We are running Apache Flume (HDP 1.4.0). We have a two-tier topology, and the collector is not co-located with a datanode. The collector fails often and throws java.io.IOException: DFSOutputStream is closed:

java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:4097)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:4084)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:117)
    at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:356)
    at org.apache.flume.sink.hdfs.BucketWriter$5.call(BucketWriter.java:353)
    at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
    at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
    at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
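
For context, the two tiers look roughly like the sketch below. The component names, host, and port in the sketch are placeholders rather than our exact configuration, and I have left out the tier-1 source:

# Tier 1 (application hosts): buffer locally and forward to the collector over Avro
tier1.channels = tier1-channel
tier1.sinks = avro-forward
tier1.channels.tier1-channel.type = memory
tier1.sinks.avro-forward.type = avro
tier1.sinks.avro-forward.hostname = <collector-host>
tier1.sinks.avro-forward.port = 4545
tier1.sinks.avro-forward.channel = tier1-channel

# Tier 2 (the collector, not on a datanode): receive over Avro and write to HDFS
agent.sources = avro-in
agent.channels = collector-channel
agent.sinks = hdfs-sink
agent.channels.collector-channel.type = memory
agent.sources.avro-in.type = avro
agent.sources.avro-in.bind = 0.0.0.0
agent.sources.avro-in.port = 4545
agent.sources.avro-in.channels = collector-channel
agent.sinks.hdfs-sink.channel = collector-channel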

This is how the HDFS sink configuration looks:


agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.filePrefix = %Y%m%d%H-events-1
agent.sinks.hdfs-sink.hdfs.path = hdfs://bi-hdnn01.sjc.kixeye.com:8020/flume/logs/%Y%m%d/%H/
agent.sinks.hdfs-sink.hdfs.fileSuffix = .done
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 0
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.batchSize = 10000
agent.sinks.hdfs-sink.hdfs.threadsPoolSize = 10000
agent.sinks.hdfs-sink.hdfs.rollTimerPoolSize = 10
agent.sinks.hdfs-sink.hdfs.callTimeout = 500000


Earlier I was using rollInterval = 30. I changed it to 0 because of the above exception, and then I started seeing a new exception:

Failed to renew lease for [DFSClient_NONMAPREDUCE_1307546979_31] for 30 seconds. Will retry shortly ...
java.io.IOException: Call to bi-hdnn01.sjc.kixeye.com/10.54.208.14:8020 failed on local exception: java.io.IOException:

Caused by: java.io.IOException: Connection reset by peer
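
To be clear about what changed between the two runs, it was just the roll interval line:

# earlier setting: roll the HDFS file every 30 seconds
agent.sinks.hdfs-sink.hdfs.rollInterval = 30

# current setting: time-based rolling disabled
agent.sinks.hdfs-sink.hdfs.rollInterval = 0

As far as I understand, with rollInterval, rollSize, and rollCount all set to 0, the sink never rolls files, so each bucket file stays open indefinitely.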


Because of these exceptions, our production downstream process gets a lot slower and needs frequent restarts, and the upstream process fills up the channels. Does anyone know what the cause could be and how we can avoid it?

Any thoughts would be really helpful; it has been extremely difficult to debug this.


Thanks,
Snehal

Re: Flume HDFS Sink Issue (IO Exception - hdfs.DFSClient$DFSOutputStream.sync)

Posted by Snehal Nagmote <na...@gmail.com>.
Sorry, I forgot to mention: we are using Hadoop 1.2.0.

