Posted to issues@nifi.apache.org by "Hadiiiiiiiii (Jira)" <ji...@apache.org> on 2022/03/14 10:10:00 UTC

[jira] [Commented] (NIFI-9792) hadoop.ListHDFS.java Data may be lost(not listed)

    [ https://issues.apache.org/jira/browse/NIFI-9792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506124#comment-17506124 ] 

Hadiiiiiiiii commented on NIFI-9792:
------------------------------------

Also, when listing the HDFS directories is slow, many files are very likely to be lost (never listed).

My HDFS holds a very large number of files, and a single listing task takes more than 5 minutes on average.

!image-2022-03-14-18-09-07-395.png!

> hadoop.ListHDFS.java Data may be lost(not listed)
> -------------------------------------------------
>
>                 Key: NIFI-9792
>                 URL: https://issues.apache.org/jira/browse/NIFI-9792
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.15.3
>            Reporter: Hadiiiiiiiii
>            Priority: Major
>         Attachments: image-2022-03-14-15-27-14-585.png, image-2022-03-14-15-27-21-977.png, image-2022-03-14-18-09-07-395.png
>
>
> In this code, the fileAge is computed with System.currentTimeMillis(), which does not return the same value for every file. This can produce wrong results:
> !image-2022-03-14-15-27-21-977.png!
>  
> see this:
> !image-2022-03-14-15-27-14-585.png!
> Normal case:
> hdfsFileTime     whenListTheFileTime          Difference(theFileExistenceTime)     isTakeTheFile
> 14:46:00         15:00:00                     4min                               false
> 14:47:00         15:00:00                     3min                               false
> 14:48:00         15:00:00                     2min                               false
> Everything is OK.
> Abnormal case:
> hdfsFileTime     whenListTheFileTime          Difference(theFileExistenceTime)     isTakeTheFile
> 14:46:00         15:00:00                     4min                               false
> 14:47:00         15:02:00 (delayed by 2min)   5min                               true
> 14:48:00         15:02:00                     4min                               false
> The first file and the last file are lost.
>  
>  
> So System.currentTimeMillis() should not be used for this calculation.

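The timing effect described in the issue can be sketched in a few lines of Java. This is only an illustration, not the actual ListHDFS code: the class, method names, the simulated clock, and the 5-minute minimum file age are all assumptions, and the timestamps are illustrative values chosen so that the ages match the Difference column of the tables above. It compares reading the clock once per file with reading it once per listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these names come from ListHDFS.java.
class FileAgeSketch {

    // Minimal stand-in for a listed file: a name and a modification time.
    static final class ListedFile {
        final String name;
        final long modifiedMillis;

        ListedFile(String name, long modifiedMillis) {
            this.name = name;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Assumed minimum file age of 5 minutes (not stated in the issue).
    static final long MIN_AGE_MILLIS = 5 * 60_000L;

    // Converts a time of day to milliseconds, for readability only.
    static long at(int hour, int minute) {
        return (hour * 60L + minute) * 60_000L;
    }

    // Pattern described in the issue: the clock is read again for every file.
    static List<String> selectWithPerFileClock(List<ListedFile> files, LongSupplier clock) {
        List<String> selected = new ArrayList<>();
        for (ListedFile f : files) {
            long age = clock.getAsLong() - f.modifiedMillis;
            if (age >= MIN_AGE_MILLIS) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    // Alternative: read the clock once and reuse that value for the whole listing.
    static List<String> selectWithSnapshot(List<ListedFile> files, LongSupplier clock) {
        final long now = clock.getAsLong();
        List<String> selected = new ArrayList<>();
        for (ListedFile f : files) {
            if (now - f.modifiedMillis >= MIN_AGE_MILLIS) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    // Simulated clock: the first read returns `first`, every later read returns `later`.
    static LongSupplier delayedClock(long first, long later) {
        final int[] reads = {0};
        return () -> reads[0]++ == 0 ? first : later;
    }

    // Three files modified one minute apart (illustrative values).
    static List<ListedFile> sampleFiles() {
        return Arrays.asList(
                new ListedFile("a", at(14, 56)),
                new ListedFile("b", at(14, 57)),
                new ListedFile("c", at(14, 58)));
    }

    public static void main(String[] args) {
        // The listing starts at 15:00 and stalls for 2 minutes after the first file.
        System.out.println("per-file clock: "
                + selectWithPerFileClock(sampleFiles(), delayedClock(at(15, 0), at(15, 2))));
        System.out.println("snapshot clock: "
                + selectWithSnapshot(sampleFiles(), delayedClock(at(15, 0), at(15, 2))));
    }
}
```

With the per-file clock only the middle file passes the age check, while the snapshot version treats all three files consistently. Whether a single snapshot is enough to avoid missed files also depends on how the processor tracks the latest listed timestamp, which this sketch does not model.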


--
This message was sent by Atlassian Jira
(v8.20.1#820001)