Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/02/04 16:07:01 UTC

[jira] [Commented] (NIFI-8081) List[S]FTP can miss files when multiple subdirectories are written while listing

    [ https://issues.apache.org/jira/browse/NIFI-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17278925#comment-17278925 ] 

ASF subversion and git services commented on NIFI-8081:
-------------------------------------------------------

Commit b55998afc18e6765204bac5493f29c47c9f66f9a in nifi's branch refs/heads/main from Tamas Palfy
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=b55998a ]

NIFI-8081 Added new Listing Strategy to ListFTP and ListSFTP: Time Window

NIFI-8081 Added new Listing Strategy to ListFTP and ListSFTP: Adjusted Time Window. The user can specify the time zone or time difference (compared to where NiFi runs) of the system hosting the files, and based on that the processor calculates the current time there. It lists files modified before this adjusted current time (and after the last listing).
NIFI-8081 'Time Adjustment' is validated not to be set if the listing strategy is not 'Adjusted Time Window'. Extracted validator to a separate class. Added more tests. Minor refactor. Typo fix.
NIFI-8081 Improved validation.
NIFI-8081 'Time Adjustment' is not necessary - in fact it can cause problems. SFTP (and usually FTP, which has a more general bug at the moment) returns a timestamp that doesn't need adjustment. (SFTP in particular returns an 'epoch' time.) Everything else remains the same: the new listing strategy relies on a sliding time window, but without the unnecessary option to adjust for the modification time.
NIFI-8081 Resolved conflicts after rebasing to main.
NIFI-8081 Renamed 'AbstractListProcessor.listByAdjustedSlidingTimeWindow' to 'listByTimeWindow'. Post main rebase correction.
NIFI-8081 Updated user doc for the BY_TIME_WINDOW strategy to warn the user of its reliance on accurate time.

This closes #4721.

Signed-off-by: Peter Turcsanyi <tu...@apache.org>
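
A small Java sketch can show why dropping 'Time Adjustment' is safe when the server reports epoch-based modification times, which the commit message says SFTP does. An epoch instant names the same moment in every time zone, so it can be compared with NiFi's own clock without any offset. This is a sketch under that assumption, not NiFi code. `EpochTimestampDemo`, `epochMillisIn`, and the sample timestamp are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class EpochTimestampDemo {

    /** Renders an instant in the given zone, then converts it back to epoch milliseconds. */
    static long epochMillisIn(Instant instant, String zone) {
        ZonedDateTime local = instant.atZone(ZoneId.of(zone));
        return local.toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // Hypothetical SFTP modification time, in seconds since the epoch.
        Instant mtime = Instant.ofEpochSecond(1_612_454_821L);

        long inBudapest = epochMillisIn(mtime, "Europe/Budapest");
        long inNewYork = epochMillisIn(mtime, "America/New_York");
        System.out.println(inBudapest == inNewYork); // true: the zone changes only the rendering

        // Comparing against NiFi's clock needs no zone arithmetic either.
        boolean modifiedBeforeNow = mtime.toEpochMilli() <= System.currentTimeMillis();
        System.out.println(modifiedBeforeNow);
    }
}
```

A zone offset matters only when a server reports local wall-clock text without zone information. The commit message hints that some FTP servers do this ("usually FTP"), which is a separate problem.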


> List[S]FTP can miss files when multiple subdirectories are written while listing
> --------------------------------------------------------------------------------
>
>                 Key: NIFI-8081
>                 URL: https://issues.apache.org/jira/browse/NIFI-8081
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Tamas Palfy
>            Assignee: Tamas Palfy
>            Priority: Major
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> ListFTP and ListSFTP scan subdirectories one after the other. Because of this, they can miss files in the following way when 'Listing Strategy' is set to 'Tracking Timestamps':
> # Processor starts and finishes listing directory1
> # Processor starts listing directory2
> # file1 arrives in directory1 with ts(timestamp)=1
> # file2 arrives in directory2 (or any other, not yet listed directory) with ts=2
> # Processor finishes listing directory2
> # Processor returns result which will contain file2(ts=2) but not file1(ts=1)
> # Processor stores ts=2 as the latest seen timestamp
> # file1 will be filtered out next time (and in every subsequent listing) because its timestamp is less than the latest seen timestamp
> Fix: Leave 'Tracking Timestamps' behaviour as it is (just update the documentation) and create a new strategy. This strategy checks the current time in each cycle and lists all files that arrived before the current time (but after the previous cycle). Because it compares file timestamps to the current time, the current time needs to be adjusted by the time zone difference between NiFi and the file hosting system.
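
The race described in the issue, and the proposed time-window fix, can be simulated with a small Java sketch (Java 16+ for the record). It models only the filtering step:

* `TrackingTimestamps` emits files newer than the latest timestamp seen so far.
* `TimeWindow` emits files modified after the previous cycle's upper bound and at or before the current one.

All names are hypothetical, and both classes are simplified. This is not NiFi's `AbstractListProcessor` code. One assumption matters: the window's upper bound ("now") is read before the directories are scanned. If it were read afterwards, file1 could fall below the next window's lower bound and be missed in the same way.

```java
import java.util.ArrayList;
import java.util.List;

public class ListingRaceDemo {

    /** A remote file as seen by one listing pass: path plus epoch-based modification time. */
    record RemoteFile(String path, long timestamp) {}

    /** Simplified 'Tracking Timestamps': emit files newer than the latest timestamp seen so far. */
    static final class TrackingTimestamps {
        private long latestSeen = Long.MIN_VALUE;

        List<String> list(List<RemoteFile> snapshot) {
            List<String> emitted = new ArrayList<>();
            long newLatest = latestSeen;
            for (RemoteFile f : snapshot) {
                if (f.timestamp() > latestSeen) {
                    emitted.add(f.path());
                    newLatest = Math.max(newLatest, f.timestamp());
                }
            }
            latestSeen = newLatest; // stored state: older files are filtered out from now on
            return emitted;
        }
    }

    /** Simplified time window: emit files modified in (previous upper bound, now]. */
    static final class TimeWindow {
        private long lowerBound = Long.MIN_VALUE; // exclusive lower bound

        // 'now' is assumed to be read BEFORE the directories are scanned.
        List<String> list(List<RemoteFile> snapshot, long now) {
            List<String> emitted = new ArrayList<>();
            for (RemoteFile f : snapshot) {
                if (f.timestamp() > lowerBound && f.timestamp() <= now) {
                    emitted.add(f.path());
                }
            }
            lowerBound = now; // the next cycle starts where this one ended
            return emitted;
        }
    }

    public static void main(String[] args) {
        RemoteFile file1 = new RemoteFile("directory1/file1", 1);
        RemoteFile file2 = new RemoteFile("directory2/file2", 2);
        // Cycle 1 starts at t=0; directory1 was scanned before file1 arrived, so only file2 is seen.
        List<RemoteFile> cycle1 = List.of(file2);
        // Cycle 2 starts at t=3 and sees both files.
        List<RemoteFile> cycle2 = List.of(file1, file2);

        TrackingTimestamps tracking = new TrackingTimestamps();
        System.out.println(tracking.list(cycle1) + " " + tracking.list(cycle2));

        TimeWindow window = new TimeWindow();
        System.out.println(window.list(cycle1, 0) + " " + window.list(cycle2, 3));
    }
}
```

Running `main` prints the following:

* Tracking Timestamps: `[directory2/file2] []`. file1 is never emitted.
* Time window: `[] [directory1/file1, directory2/file2]`. Both files are emitted one cycle later.

The trade-off, as the commit notes for the BY_TIME_WINDOW strategy, is that correctness now depends on accurate clocks.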


