Posted to dev@flume.apache.org by "chenshangan (JIRA)" <ji...@apache.org> on 2014/07/22 16:30:39 UTC

[jira] [Commented] (FLUME-2429) Callable timed out in HDFS sink

    [ https://issues.apache.org/jira/browse/FLUME-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070311#comment-14070311 ] 

chenshangan commented on FLUME-2429:
------------------------------------

testAgent.sinks.testSink.hdfs.callTimeout = 15000
The callTimeout here (15000 ms) is too short for an HDFS operation; I use 180000 in my production environment. Keep in mind that HDFS operations sometimes take a long time and can fail, so you should handle these exceptions. Sometimes blocks of a file are lost, and then the file can never be closed. In Flume 1.5, there is a parameter that controls how many times the sink tries to close a file.
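The last point above, retrying the close of a file a bounded number of times, can be sketched in Java. This is a minimal illustration, not Flume's code: `CloseAttempt` and `closeWithRetries` are hypothetical names, and each "close" is simulated. The Flume user guide for 1.5 and later appears to document this setting as `hdfs.closeTries`, with `hdfs.retryInterval` controlling the wait between attempts; check the guide for your release before relying on those names.

```java
import java.io.IOException;

public class CloseRetrySketch {

    /** Hypothetical stand-in for one attempt to close (and rename) an HDFS .tmp file. */
    @FunctionalInterface
    public interface CloseAttempt {
        void run() throws IOException;
    }

    /**
     * Makes at most maxTries close attempts, sleeping retryIntervalMs between them.
     * Returns the number of attempts used; rethrows the last IOException if all fail.
     */
    public static int closeWithRetries(CloseAttempt attempt, int maxTries, long retryIntervalMs)
            throws IOException, InterruptedException {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        IOException last = null;
        for (int i = 1; i <= maxTries; i++) {
            try {
                attempt.run();
                return i; // closed successfully on attempt i
            } catch (IOException e) {
                last = e; // e.g. a timed-out flush or a file with missing blocks
                if (i < maxTries) {
                    Thread.sleep(retryIntervalMs); // back off before the next attempt
                }
            }
        }
        throw last; // every attempt failed; the caller must decide what to do with the file
    }

    public static void main(String[] args) throws Exception {
        // Simulated file whose close fails twice, then succeeds.
        int[] failuresLeft = {2};
        CloseAttempt flaky = () -> {
            if (failuresLeft[0]-- > 0) {
                throw new IOException("simulated close failure");
            }
        };
        System.out.println("closed after " + closeWithRetries(flaky, 5, 10) + " attempts");
    }
}
```

A bounded count keeps the sink from retrying forever on a file whose blocks are gone, while still surviving transient HDFS slowness; when every attempt fails, the last exception is rethrown so it can be logged and handled.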

> Callable timed out in HDFS sink
> -------------------------------
>
>                 Key: FLUME-2429
>                 URL: https://issues.apache.org/jira/browse/FLUME-2429
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.4.0
>            Reporter: Jay
>
> Hi.
> I get a warning message when using the HDFS sink.
> Pipeline: Avro source > memory (or file) channel > HDFS sink
> Switching the channel type didn't solve the problem.
> The error occurs once every day or every few days.
> Any solution?
> Here is my configuration.
> --------------------------------------------------------------------
> testAgent.sources = testSrc
> testAgent.channels = testChannel
> testAgent.sinks = testSink
> testAgent.sources.testSrc.type = avro
> testAgent.sources.testSrc.channels = testChannel
> testAgent.channels.testChannel.type = memory
> testAgent.sources.testSrc.bind = 0.0.0.0
> testAgent.sources.testSrc.port = 4141
> testAgent.sinks.testSink.type = hdfs
> testAgent.sinks.testSink.channel = testChannel
> testAgent.sources.testSrc.interceptors = testInterceptor
> testAgent.sources.testSrc.interceptors.testInterceptor.type = static
> testAgent.sources.testSrc.interceptors.testInterceptor.preserveExisting = true
> testAgent.sources.testSrc.interceptors.testInterceptor.key = testKey
> testAgent.sources.testSrc.interceptors.testInterceptor.value = .testfile
> testAgent.sinks.testSink.hdfs.path = hdfs://hadoop-cluster:8020/flume/%Y%m%d
> testAgent.sinks.testSink.hdfs.filePrefix = %Y%m%d%H%M
> testAgent.sinks.testSink.hdfs.fileSuffix = .testfile
> testAgent.sinks.testSink.hdfs.fileType = DataStream
> testAgent.sinks.testSink.hdfs.rollInterval = 1
> testAgent.sinks.testSink.hdfs.rollCount = 0
> testAgent.sinks.testSink.hdfs.rollSize = 0
> testAgent.sinks.testSink.hdfs.batchSize = 150000
> testAgent.sinks.testSink.hdfs.callTimeout = 15000
> testAgent.sinks.testSink.hdfs.useLocalTimeStamp = true
> testAgent.sinks.testSink.serializer = text
> testAgent.sinks.testSink.serializer.appendNewline = false
> testAgent.channels.testChannel.keep-alive = 1
> testAgent.channels.testChannel.write-timeout = 1
> testAgent.channels.testChannel.transactionCapacity = 150000
> testAgent.channels.testChannel.capacity = 18000000
> #testAgent.channels.testChannel.checkpointDir = /data/flumedata/checkpoint
> #testAgent.channels.testChannel.useDualCheckpoints = true
> #testAgent.channels.testChannel.backupCheckpointDir = /data/flumedata_backup/checkpoint
> #testAgent.channels.testChannel.dataDirs = /data/flumedata/data
> testAgent.channels.testChannel.byteCapacityBufferPercentage = 20
> testAgent.channels.testChannel.byteCapacity = 1000000000
> --------------------------------------------------------------------
> I sometimes get the following warning in the Flume log.
> --------------------------------------------------------------------
> 2014-07-22 16:28:20,186 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:477)] Caught IOException writing to HDFSWriter (Callable timed out after 15000 ms on file: hdfs://hadoop-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp). Closing file (hdfs://hadoop-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp) and rethrowing exception.
> 2014-07-22 16:28:35,187 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:483)] Caught IOException while closing file (hdfs://hadoop-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp). Exception follows.
> java.io.IOException: Callable timed out after 15000 ms on file: hdfs://search-hdanal-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp
>         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:603)
>         at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:381)
>         at org.apache.flume.sink.hdfs.BucketWriter.flush(BucketWriter.java:343)
>         at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:292)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:481)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:91)
>         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:596)
>         ... 8 more
> 2014-07-22 16:28:35,187 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:438)] HDFS IO error
> java.io.IOException: Callable timed out after 15000 ms on file: hdfs://hadoop-cluster:8020/flume/20140722/201407221628.1406014084417.testfile.tmp
>         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:603)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:469)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:91)
>         at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:596)
>         ... 5 more
> --------------------------------------------------------------------
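The stack traces above show where the warning comes from: `BucketWriter.callWithTimeout` waits on a `FutureTask` via `FutureTask.get`, which throws `java.util.concurrent.TimeoutException` when the HDFS call exceeds `hdfs.callTimeout`, and the sink wraps that in an `IOException`. The first timeout is raised from `append`; the close that follows then calls `flush`/`doFlush`, which times out again, which is why the two warnings are 15 seconds apart. The sketch below reproduces that pattern with a plain `ExecutorService`. It is a simplified illustration, not Flume's implementation; `CallTimeoutSketch` and its `callWithTimeout` are hypothetical names, and only the message format is copied from the log.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CallTimeoutSketch {

    // Runs one I/O action on a worker thread and waits at most timeoutMs for it.
    // Simplified illustration of the pattern in the stack trace, NOT Flume's code.
    public static <T> T callWithTimeout(ExecutorService pool, Callable<T> action,
                                        long timeoutMs, String file) throws IOException {
        Future<T> future = pool.submit(action);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck call so the worker is freed
            throw new IOException(
                "Callable timed out after " + timeoutMs + " ms on file: " + file, e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause()); // the action itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new InterruptedIOException("interrupted while waiting on " + file);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Simulated slow flush: 300 ms of work against a 100 ms budget.
            try {
                callWithTimeout(pool, () -> { Thread.sleep(300); return null; }, 100, "example.tmp");
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
            // The same kind of call with a larger budget completes normally.
            String result = callWithTimeout(pool, () -> "flushed", 1000, "example.tmp");
            System.out.println(result);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Raising `hdfs.callTimeout`, as the comment suggests, only gives the `Future` more time before `get` gives up; it does not make the underlying HDFS call any faster, so the resulting exceptions still need to be handled.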


