Posted to issues@flume.apache.org by "Xinyuan Liu (Jira)" <ji...@apache.org> on 2021/06/08 08:29:00 UTC

[jira] [Updated] (FLUME-3392) Kafka channel Flume HDFS file close failure causes retry and data loss

     [ https://issues.apache.org/jira/browse/FLUME-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinyuan Liu updated FLUME-3392:
-------------------------------
    Description: 
 

When the HDFS BucketWriter fails to close a file, it retries indefinitely. The repeated close attempts put heavy pressure on the NameNode, and because subsequent events are still consumed from the Kafka channel and their offsets committed, data may be lost.
{code:java}
2021-05-30 02:33:45,045 (hdfs-s1-roll-timer-0) [WARN - org.apache.flume.sink.hdfs.BucketWriter$CloseHandler.close(BucketWriter.java:348)] Closing file: hdfs://10.1.53.19:9020/warehouse/ods_flume_json_yunc_transcation/ods_flume_json_thirdparty_db__t_note_phone/datelabel=20210530/__t_note_phonet_note_phone.1622312610822.json.tmp failed. Will retry again in 180 seconds.
2021-05-30 02:33:45,045 (hdfs-s1-roll-timer-0) [WARN - org.apache.flume.sink.hdfs.BucketWriter$CloseHandler.close(BucketWriter.java:348)] Closing file: hdfs://x.x.x.x:9020/warehouse/ods_flume_json_yunc_transcation/ods_flume_json_thirdparty_db__t_note_phone/datelabel=20210530/__t_note_phonet_note_phone.1622312610822.json.tmp failed. Will retry again in 180 seconds.
java.io.IOException: Unable to close file because the last block BP-1006090754-10.1.53.214-1490513887497:blk_6365012859_5293069088 does not have enough number of replicas.
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2865)
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2810)
	at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2794)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2737)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:135)
	at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:319)
	at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:316)
	at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:727)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:724)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2021-05-30 02:33:45,062 (hdfs-s1-call-runner-12) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:681)] Renaming hdfs://10.1.53.19:9020/warehouse/ods_flume_json_yunc_transcation/ods_flume_json_thirdparty_db__t_note_phone/datelabel=20210530/__t_note_phonet_note_phone.1622312610822.json.tmp to hdfs://10.1.53.19:9020/warehouse/ods_flume_json_yunc_transcation/ods_flume_json_thirdparty_db__t_note_phone/datelabel=20210530/t_note_phone.1622312610822.json
2021-05-30 02:34:04,157 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2021-05-30 02:34:04,181 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:246)] Creating hdfs://10.1.53.19:9020/warehouse/ods_flume_json_yunc_transcation/ods_flume_json_thirdparty_db__t_note_phone/datelabel=20210530/__t_note_phonet_note_phone.1622313244158.json.tmp
2021-05-30 02:36:45,047 (hdfs-s1-call-runner-2) [ERROR - org.apache.flume.sink.hdfs.AbstractHDFSWriter.hflushOrSync(AbstractHDFSWriter.java:269)] Error while trying to hflushOrSync!
2021-05-30 02:36:45,048 (hdfs-s1-roll-timer-0) [WARN - org.apache.flume.sink.hdfs.BucketWriter$CloseHandler.close(BucketWriter.java:348)] Closing file: hdfs://10.1.53.19:9020/warehouse/ods_flume_json_yunc_transcation/ods_flume_json_thirdparty_db__t_note_phone/datelabel=20210530/__t_note_phonet_note_phone.1622312610822.json.tmp failed. Will retry again in 180 seconds.
java.nio.channels.ClosedChannelException
	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:2039)
	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2461)
	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2395)
	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:138)
	at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flume.sink.hdfs.AbstractHDFSWriter.hflushOrSync(AbstractHDFSWriter.java:266)
	at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:134)
	at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:319)
	at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:316)
	at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:727)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:724)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
{code}
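For reference, one possible mitigation (not part of the original report; verify the property names and defaults against the deployed Flume version) is to bound the close retries with the HDFS sink's hdfs.closeTries and hdfs.retryInterval settings, so a writer that cannot be closed is eventually abandoned rather than retried forever. Agent and sink names below are hypothetical:
{code}
# Hypothetical agent "a1" with HDFS sink "s1"; adjust to your topology.
a1.sinks.s1.type = hdfs
a1.sinks.s1.hdfs.path = hdfs://namenode:9020/warehouse/%Y%m%d
# Number of close attempts before giving up; 0 means retry indefinitely,
# which is the behavior shown in the log above.
a1.sinks.s1.hdfs.closeTries = 3
# Seconds between consecutive close attempts (180 matches the log's interval).
a1.sinks.s1.hdfs.retryInterval = 180
{code}
Note the trade-off: bounding closeTries can leave .tmp files behind on HDFS that need separate cleanup, but it avoids the unbounded retry load on the NameNode described above.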



> Kafka channel Flume HDFS file close failure causes retry and data loss
> ----------------------------------------------------------------------
>
>                 Key: FLUME-3392
>                 URL: https://issues.apache.org/jira/browse/FLUME-3392
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>         Environment: flume 1.9.0
> kafka channel
> hdfs sink
>            Reporter: Xinyuan Liu
>            Priority: Blocker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@flume.apache.org
For additional commands, e-mail: issues-help@flume.apache.org