Posted to issues@flume.apache.org by "Anes Mukhametov (Jira)" <ji...@apache.org> on 2020/05/11 14:36:00 UTC

[jira] [Created] (FLUME-3366) hdfs sink can't rename file after hdfs.DFSClient timeout

Anes Mukhametov created FLUME-3366:
--------------------------------------

             Summary: hdfs sink can't rename file after hdfs.DFSClient timeout
                 Key: FLUME-3366
                 URL: https://issues.apache.org/jira/browse/FLUME-3366
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: 1.6.0
         Environment: cdh 5.16.2

Sink configuration

2hdfs.sinks.scribe2hdfsprd.type         = hdfs
2hdfs.sinks.scribe2hdfsprd.channel      = kafka
2hdfs.sinks.scribe2hdfsprd.hdfs.path = hdfs://clname/data/logs/
2hdfs.sinks.scribe2hdfsprd.hdfs.filePrefix = logone-%Y-%m-%d-%H
2hdfs.sinks.scribe2hdfsprd.hdfs.rollSize = 268435456
2hdfs.sinks.scribe2hdfsprd.hdfs.rollInterval = 3600
2hdfs.sinks.scribe2hdfsprd.hdfs.rollCount = 0
2hdfs.sinks.scribe2hdfsprd.hdfs.idleTimeout = 300
2hdfs.sinks.scribe2hdfsprd.hdfs.batchSize = 1000
2hdfs.sinks.scribe2hdfsprd.hdfs.minBlockReplicas = 3
2hdfs.sinks.scribe2hdfsprd.hdfs.threadsPoolSize = 600
2hdfs.sinks.scribe2hdfsprd.hdfs.rollTimerPoolSize = 10
2hdfs.sinks.scribe2hdfsprd.hdfs.callTimeout = 300000
2hdfs.sinks.scribe2hdfsprd.hdfs.fileType = DataStream
2hdfs.sinks.scribe2hdfsprd.hdfs.writeFormat = Text
2hdfs.sinks.scribe2hdfsprd.hdfs.proxyUser = stat
2hdfs.sinks.scribe2hdfsprd.sink.serializer = text
2hdfs.sinks.scribe2hdfsprd.sink.serializer.appendNewline = false
            Reporter: Anes Mukhametov


There seems to be a problem related to hdfs timeouts.
{{
2020-05-03 20:50:39,819 INFO org.apache.flume.sink.hdfs.BucketWriter: Creating hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.BucketWriter: Closing idle bucketWriter hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp at 1588528912208
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.HDFSEventSink: Writer callback called.
2020-05-03 21:01:52,208 INFO org.apache.flume.sink.hdfs.BucketWriter: Closing hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp
2020-05-03 21:03:07,282 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-1441209260-10.30.0.19-1387400921010:blk_5841169679_1121097069614
java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.30.3.187:40062 remote=/10.30.3.187:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2354)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:1093)
2020-05-03 21:03:07,283 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-1441209260-10.30.0.19-1387400921010:blk_5841169679_1121097069614 in pipeline DatanodeInfoWithStorage[10.30.3.187:50010,DS-306b4cb3-ba0d-4970-80cd-894083759467,DISK], DatanodeInfoWithStorage[10.30.3.196:50010,DS-bdc91c20-e8d0-4d43-a1b0-41cabb8306b3,DISK], DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK]: bad datanode DatanodeInfoWithStorage[10.30.3.187:50010,DS-306b4cb3-ba0d-4970-80cd-894083759467,DISK]
2020-05-03 21:05:22,463 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
2020-05-03 21:05:22,474 INFO org.apache.flume.sink.hdfs.BucketWriter: Renaming hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp to hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807
2020-05-03 21:08:22,466 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)
2020-05-03 21:11:22,467 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.30.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)

...

2020-05-10 13:14:08,742 WARN org.apache.flume.sink.hdfs.BucketWriter: Closing file: hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807.tmp failed. Will retry again in 180 seconds.
java.io.IOException: All datanodes DatanodeInfoWithStorage[10.121.3.10:50010,DS-5592761f-80c3-473c-b34b-536d52f3908e,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1512)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1254)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:739)

}}

Flume keeps trying to close the file indefinitely.
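The 180-second retry cadence in the log matches the HDFS sink's default close-retry behaviour. As a hedged sketch (hdfs.closeTries and hdfs.retryInterval are documented for the Apache Flume 1.6.0 HDFS sink; whether the CDH 5.16.2 build honours them should be verified, and the values below are only illustrative), capping the retries would at least stop the endless close attempts:
{{
# Stop retrying the close/rename after 5 attempts instead of retrying forever (default 0 = unlimited)
2hdfs.sinks.scribe2hdfsprd.hdfs.closeTries = 5
# Seconds between consecutive close attempts (180 is the default, matching the log)
2hdfs.sinks.scribe2hdfsprd.hdfs.retryInterval = 180
}}
Note that this only limits the retry loop; it does not release the lease on the already-renamed file.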

First of all, the file has already been renamed from .tmp to its final name, yet the retry messages still reference the .tmp suffix.
{{
-rw-r--r--   3 flume flume    5054814 2020-05-03 20:50 hdfs://clname/data/logs/logone-2020-05-03-20.1588528239807
}}

The file is also still reported as open:
{{
$ hdfs dfsadmin -listOpenFiles|grep 5-03-20.1588528239807
10.121.3.187            DFSClient_NONMAPREDUCE_833895359_50     /data/logs/logone-2020-05-03-20.1588528239807
}}

Running hdfs fsck -files -blocks -locations on this file completes without any error, but it lists no blocks.
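For reference, a possible way to inspect the stuck file and ask the NameNode to recover the lease left by the dead DFSClient (assuming this Hadoop build ships the hdfs debug subcommand, which appeared upstream in Hadoop 2.7; the path is the one from the log above):
{{
# Confirm the file is still open for write and see which blocks it reports
hdfs fsck /data/logs/logone-2020-05-03-20.1588528239807 -files -blocks -locations -openforwrite

# Recover the lease so the file can finally be closed
hdfs debug recoverLease -path /data/logs/logone-2020-05-03-20.1588528239807 -retries 5
}}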

Could this be a problem?


