Posted to issues@nifi.apache.org by "Hadiiiiiiiii (Jira)" <ji...@apache.org> on 2022/03/14 10:10:00 UTC
[jira] [Commented] (NIFI-9792) hadoop.ListHDFS.java Data may be lost(not listed)
[ https://issues.apache.org/jira/browse/NIFI-9792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506124#comment-17506124 ]
Hadiiiiiiiii commented on NIFI-9792:
------------------------------------
Also, when ListHDFS lists directories too slowly, it is very likely to lose many files.
My HDFS has a very large number of files, and the average task duration for a listing is over 5 minutes.
!image-2022-03-14-18-09-07-395.png!
> hadoop.ListHDFS.java Data may be lost(not listed)
> -------------------------------------------------
>
> Key: NIFI-9792
> URL: https://issues.apache.org/jira/browse/NIFI-9792
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.15.3
> Reporter: Hadiiiiiiiii
> Priority: Major
> Attachments: image-2022-03-14-15-27-14-585.png, image-2022-03-14-15-27-21-977.png, image-2022-03-14-18-09-07-395.png
>
>
> In this code, fileAge is computed by calling System.currentTimeMillis() separately for each file, so the reference time is not the same for every file. This can produce the following wrong result:
> !image-2022-03-14-15-27-21-977.png!
>
> see this:
> !image-2022-03-14-15-27-14-585.png!
> Normal case:
> hdfsFileTime  whenListTheFileTime          Difference (theFileExistenceTime)   isTakeTheFile
> 14:46:00      15:00:00                     4min                                false
> 14:47:00      15:00:00                     3min                                false
> 14:48:00      15:00:00                     2min                                false
> All files are handled correctly.
>
> Abnormal case:
> hdfsFileTime  whenListTheFileTime          Difference (theFileExistenceTime)   isTakeTheFile
> 14:46:00      15:00:00                     4min                                false
> 14:47:00      15:02:00 (delayed for 2min)  5min                                true
> 14:48:00      15:02:00                     4min                                false
> The first and last files are lost.
>
>
> So, do not call System.currentTimeMillis() separately for each file; use a single timestamp for the whole listing.
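A minimal, self-contained sketch of the difference follows. It assumes a "minimum file age" style filter (`minAgeMillis`) and a clock that keeps advancing while a slow listing runs. `FileEntry`, `filterPerFileClock`, `filterWithSnapshot` and `steppingClock` are hypothetical names, not NiFi's ListHDFS API, and the times are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongSupplier;

public class MinAgeListingSketch {
    static final long MINUTE = 60_000L;

    /** Hypothetical stand-in for an HDFS file status: path and modification time in ms. */
    static final class FileEntry {
        final String path;
        final long modificationTime;

        FileEntry(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /** Pattern described in the report: the clock is read again for every file. */
    static List<String> filterPerFileClock(List<FileEntry> files, long minAgeMillis, LongSupplier clock) {
        List<String> accepted = new ArrayList<>();
        for (FileEntry f : files) {
            // The reference time drifts forward while a slow listing is running.
            long fileAge = clock.getAsLong() - f.modificationTime;
            if (fileAge >= minAgeMillis) {
                accepted.add(f.path);
            }
        }
        return accepted;
    }

    /** Suggested approach: read the clock once and judge every file against that snapshot. */
    static List<String> filterWithSnapshot(List<FileEntry> files, long minAgeMillis, LongSupplier clock) {
        final long listingTime = clock.getAsLong(); // single reference time for the whole listing
        List<String> accepted = new ArrayList<>();
        for (FileEntry f : files) {
            long fileAge = listingTime - f.modificationTime;
            if (fileAge >= minAgeMillis) {
                accepted.add(f.path);
            }
        }
        return accepted;
    }

    /** Simulated slow listing: each clock read returns a time `step` ms later than the previous one. */
    static LongSupplier steppingClock(long start, long step) {
        long[] now = {start - step};
        return () -> now[0] += step;
    }

    public static void main(String[] args) {
        List<FileEntry> files = Arrays.asList(
                new FileEntry("/data/a", 56 * MINUTE),
                new FileEntry("/data/b", 57 * MINUTE),
                new FileEntry("/data/c", 58 * MINUTE));
        long minAge = 3 * MINUTE;
        // The listing starts at minute 60, and the clock advances 2 minutes per file.
        System.out.println(filterPerFileClock(files, minAge, steppingClock(60 * MINUTE, 2 * MINUTE)));
        System.out.println(filterWithSnapshot(files, minAge, steppingClock(60 * MINUTE, 2 * MINUTE)));
    }
}
```

In this run, the per-file version prints `[/data/a, /data/b, /data/c]`. It accepts `/data/c` even though that file was only 2 minutes old when the listing started. The snapshot version prints `[/data/a, /data/b]`. With a single snapshot, whether a file passes the age check no longer depends on how long the listing takes.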
--
This message was sent by Atlassian Jira
(v8.20.1#820001)