Posted to issues@nifi.apache.org by "Frederick Pletz (Jira)" <ji...@apache.org> on 2020/04/10 16:16:00 UTC

[jira] [Created] (NIFI-7352) Improve PutFile State Handling

Frederick Pletz created NIFI-7352:
-------------------------------------

             Summary: Improve PutFile State Handling
                 Key: NIFI-7352
                 URL: https://issues.apache.org/jira/browse/NIFI-7352
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Frederick Pletz


Currently PutFile has three conflict resolution states: REPLACE, IGNORE, and FAIL.  REPLACE writes the new file to disk over the old file and transfers the flow file to SUCCESS.  FAIL does not replace the file on disk and transfers the flow file to FAIL.  IGNORE does not replace the file on disk and transfers the flow file to SUCCESS.  This breakdown is less than useful: it actively invites misunderstanding and misuse.  It is very easy to assume IGNORE instead has the following behavior: write to disk, but keep both the original and the new file by appending a distinguishing suffix to the filename, similar to how filename conflicts are handled in other programs.  I have personal experience with this misinterpretation causing a project to drop data for an extended period of time without realizing it.  Additionally, the FAIL state is less useful than it could be, because it is indistinguishable from other failure states, such as the folder not existing or a lack of write permissions.
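
To make the data-loss risk concrete, here is a minimal Python sketch of the three states as described above. It is a simplified model, not the actual PutFile implementation: the function name put_file and its parameters are assumptions made for illustration, and the returned strings stand in for the relationships.

```python
import os
import tempfile


def put_file(directory, filename, content, conflict_resolution):
    """Model of the behavior described in this issue (hypothetical helper).

    Returns the name of the relationship the flow file would be routed to.
    """
    target = os.path.join(directory, filename)
    try:
        if os.path.exists(target):
            if conflict_resolution == "IGNORE":
                # Nothing is written, yet the flow file is reported as a success.
                return "SUCCESS"
            if conflict_resolution == "FAIL":
                # Same relationship as any other failure below.
                return "FAIL"
            # REPLACE falls through and overwrites the existing file.
        with open(target, "w") as f:
            f.write(content)
        return "SUCCESS"
    except OSError:
        # Missing folder, missing write permission, and so on.
        return "FAIL"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(put_file(d, "a.txt", "old", "REPLACE"))
        print(put_file(d, "a.txt", "new", "IGNORE"))
        print(put_file(d, "a.txt", "new", "FAIL"))
        print(put_file(os.path.join(d, "missing"), "a.txt", "new", "FAIL"))
```

In this model the second call returns the same value as the first even though the new content never reaches the disk, and the last two calls return the same value for two different causes.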

 

Desired result: there should be a way to key off a greater degree of detail from a PutFile processor.  The easiest option from a user perspective would be to change the output relationships to include a "FAIL_DUPLICATE" output, as opposed to a single generic "FAIL" output.  This would remove the need for "IGNORE", since that function could be performed by using "FAIL_DUPLICATE" in the desired way, most likely by auto-terminating that relationship.  Barring that, an attribute added to the flow file on output could give a better indication of what happened: was it ignored?  Written to disk?  If it failed, what was the failure: duplicate filename, write permission, or a folder that didn't exist?
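
As a rough sketch of the desired result, the following Python model combines both ideas: a separate "FAIL_DUPLICATE" outcome and an attribute describing what happened. Everything here is hypothetical and only illustrates the proposal: the function put_file_detailed, the attribute name putfile.result, and its values are assumptions, not existing NiFi names.

```python
import os
import tempfile

# Hypothetical attribute name; not an existing NiFi attribute.
RESULT_ATTR = "putfile.result"


def put_file_detailed(directory, filename, content, replace=False):
    """Return (relationship, attributes) for one write attempt (illustrative only)."""
    target = os.path.join(directory, filename)
    if os.path.exists(target) and not replace:
        # Duplicates get their own outcome instead of sharing the generic FAIL.
        return "FAIL_DUPLICATE", {RESULT_ATTR: "duplicate"}
    try:
        with open(target, "w") as f:
            f.write(content)
    except FileNotFoundError:
        return "FAIL", {RESULT_ATTR: "directory_missing"}
    except PermissionError:
        return "FAIL", {RESULT_ATTR: "permission_denied"}
    except OSError:
        return "FAIL", {RESULT_ATTR: "other_error"}
    return "SUCCESS", {RESULT_ATTR: "written"}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(put_file_detailed(d, "a.txt", "old"))
        print(put_file_detailed(d, "a.txt", "new"))
        print(put_file_detailed(os.path.join(d, "missing"), "a.txt", "new"))
```

With a distinct duplicate outcome, the old "IGNORE" behavior becomes an explicit choice by whoever wires up the flow (for example, auto-terminating "FAIL_DUPLICATE") rather than a silent success.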

 

A note on backwards compatibility: I think the NiFi team is more likely to choose the attribute route, since it avoids breaking backwards compatibility.  However, I would caution that this also means teams using "IGNORE" with an incorrect understanding of what that option does will remain unaware that they are dropping data.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)