Posted to user@flume.apache.org by Tim Driscoll <ti...@gmail.com> on 2013/07/08 18:47:25 UTC

HDFS Event Sink Timeouts / Hadoop 2

This question has two parts. We have Flume agents (1.3.1, recently
updated to 1.4) with HDFS event sinks that write to our Hadoop cluster.

We occasionally get timeouts writing to Hadoop (stack trace below), and
eventually the queues start backing up. Under normal load, the queues sit
at 1-5% full. There shouldn't be any reason the sinks can't keep up;
however, the queues eventually fill up and we have to restart the agent.

Has anyone had issues similar to this?  This happens often enough that we
have to restart the agent every day or two.

The other possible issue is that we're running a Hadoop 2.0.5-alpha cluster
with 25 data nodes. How much (if any) testing has been done against Hadoop 2?
I saw the build scripts had a hadoop-2 profile, but I had to modify it to
get the HDFS Event Sink to build, so I wasn't sure of the state of
compatibility or support for it.

Any help anyone can provide would be appreciated.

-Tim

=============
Stack Trace
=============
06 Jul 2013 20:18:27,360 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:418)  - HDFS IO error
java.io.IOException: Callable timed out after 30000 ms on file: /user/svc-neb/rest_xaction_logs/date=2013-07-06/p3nlnebss004.prod.phx3.secureserver.net.20.1373166185264.avro.tmp
        at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:550)
        at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:353)
        at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:319)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:405)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:543)
        ... 6 more
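Reading the trace, the flush appears to run on a separate thread, with the
sink waiting on a Future with a deadline; when FutureTask.get() throws
TimeoutException, it is wrapped in an IOException ("Callable timed out
after 30000 ms on file: ..."). Below is a minimal, hypothetical Java sketch
of that pattern, not Flume's actual BucketWriter code. The class name
TimedCall and the helper's signature are made up for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedCall {
    // Hypothetical helper (not Flume's API): run an operation on a worker
    // thread and give up after timeoutMs, wrapping the TimeoutException in
    // an IOException the way the stack trace above shows.
    public static <T> T callWithTimeout(ExecutorService pool, Callable<T> op,
                                        long timeoutMs, String path)
            throws IOException, InterruptedException {
        Future<T> future = pool.submit(op);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Try to interrupt the stuck call; this only helps if the
            // operation actually responds to interruption.
            future.cancel(true);
            throw new IOException("Callable timed out after " + timeoutMs
                    + " ms on file: " + path, e);
        } catch (ExecutionException e) {
            // Unwrap failures thrown by the operation itself.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A fast "flush" completes normally and prints "flushed".
            System.out.println(callWithTimeout(pool, () -> "flushed", 1000, "/tmp/a.avro.tmp"));
            // A "flush" that hangs past the deadline surfaces as an IOException.
            try {
                callWithTimeout(pool, () -> { Thread.sleep(5000); return "late"; }, 100, "/tmp/b.avro.tmp");
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch, if the wrapped call ignores interruption, cancel(true)
cannot stop it, so the single worker thread stays occupied and every later
call queues behind it and times out too.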