Posted to issues@nifi.apache.org by "Alessandro D'Armiento (JIRA)" <ji...@apache.org> on 2019/07/22 12:53:00 UTC
[jira] [Created] (NIFI-6465) ListHDFS: skip last should be optional
Alessandro D'Armiento created NIFI-6465:
-------------------------------------------
Summary: ListHDFS: skip last should be optional
Key: NIFI-6465
URL: https://issues.apache.org/jira/browse/NIFI-6465
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Affects Versions: 1.9.2
Reporter: Alessandro D'Armiento
h2. Current Situation
From the [official documentation|https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.9.2/org.apache.nifi.processors.hadoop.ListHDFS/index.html]:
* Each time a listing is performed, the files with the latest timestamp will be excluded and picked up during the next execution of the processor. This is done to ensure that we do not miss any files, or produce duplicates, in the cases where files with the same timestamp are written immediately before and after a single execution of the processor.
h2. Improvement Proposal
* If ListHDFS is triggered only after the operation that populates an HDFS directory has finished, no further files with the same timestamp can arrive, so skipping the files with the latest timestamp is pointless, and working around this behavior is tricky.
* A mandatory property "skip last" should be added so that users can explicitly decide, based on their use case, whether this behavior is necessary.
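
To make the proposal concrete, here is a minimal sketch of the selection logic with and without the skip. It is not the actual ListHDFS code: the class SkipLastSketch, the Entry type, and the skipLast flag are hypothetical names used only for illustration, and the state the processor keeps between runs is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipLastSketch {

    /** Hypothetical stand-in for an HDFS file status: a path plus its modification time. */
    public static final class Entry {
        public final String path;
        public final long modifiedMillis;

        public Entry(String path, long modifiedMillis) {
            this.path = path;
            this.modifiedMillis = modifiedMillis;
        }
    }

    /**
     * Returns the entries to emit for one listing.
     * With skipLast = true, every entry carrying the latest timestamp is held back
     * (the documented behavior); with skipLast = false, the whole listing is emitted.
     */
    public static List<Entry> select(List<Entry> listing, boolean skipLast) {
        if (!skipLast || listing.isEmpty()) {
            return new ArrayList<>(listing);
        }
        // Find the latest modification time in this listing.
        long latest = Long.MIN_VALUE;
        for (Entry e : listing) {
            latest = Math.max(latest, e.modifiedMillis);
        }
        // Keep only entries strictly older than the latest timestamp.
        List<Entry> selected = new ArrayList<>();
        for (Entry e : listing) {
            if (e.modifiedMillis < latest) {
                selected.add(e);
            }
        }
        return selected;
    }
}
```

With skipLast set to false, a directory that is fully written before the listing runs is emitted in a single execution, which is the use case described above.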
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)