
Posted to issues@nifi.apache.org by "Alessandro D'Armiento (JIRA)" <ji...@apache.org> on 2019/07/20 13:18:00 UTC

[jira] [Updated] (NIFI-6462) ListHDFS should be triggerable

     [ https://issues.apache.org/jira/browse/NIFI-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro D'Armiento updated NIFI-6462:
----------------------------------------
    Priority: Minor  (was: Major)

> ListHDFS should be triggerable
> ------------------------------
>
>                 Key: NIFI-6462
>                 URL: https://issues.apache.org/jira/browse/NIFI-6462
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Alessandro D'Armiento
>            Priority: Minor
>
> h2. Current Situation
> ListHDFS is designed to be (only) the entry point of a data integration pipeline, and therefore can only be triggered on a timer-driven or cron-driven schedule.
> h2. Improvement Proposal
> ListHDFS should also be usable in the middle of a pipeline, not only as its entry point. To achieve this:
> * It has to be triggerable by an incoming flowfile
> * The triggering flowfile should be able to carry the directory to list as an attribute
> * Some logic, such as the "skip the last file in the listing directory" should be made optional
> * Since the processor will work with 1:N semantics (1 input trigger flowfile, N output flowfiles), it would be useful to support fragmentation attributes (for example, for subsequent merge operations)
>   * It would also be useful to support different fragmentation strategies to cover multiple use cases. For example, it should be possible to select:
>     * A "one for all" fragmentation strategy, which creates a single fragmentation group. All files will therefore share the same fragment.identifier and the same fragment.count (equal to the total number N of listed files), with fragment.index ∈ [0, N).
>     * A "per subdir" fragmentation strategy, which creates one fragmentation group for each scanned subdirectory of the given path. Flowfiles from the i-th subdirectory will therefore share a group-specific fragment.identifier, their fragment.count will equal the number Ni of files in that directory, and fragment.index ∈ [0, Ni).
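
The two proposed fragmentation strategies can be sketched as plain Java, independent of the NiFi API. This is only a sketch under several assumptions:

* The class `FragmentAttributes`, the enum `FragmentationStrategy` and the methods `assign` and `parentOf` are hypothetical names, not part of NiFi.
* The listing is taken to be a list of unique `/`-separated paths.
* "Per subdir" is interpreted as grouping by each file's immediate parent directory.

Only the attribute names `fragment.identifier`, `fragment.count` and `fragment.index` come from the proposal itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class FragmentAttributes {

    // Hypothetical names for the two strategies proposed in the issue.
    enum FragmentationStrategy { ONE_FOR_ALL, PER_SUBDIR }

    // Returns the parent directory of a listed path ("" if there is no '/').
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash);
    }

    // Computes fragment.* attributes for each listed file, keyed by file path.
    // Groups keep the listing order, and indices follow that order within a group.
    static Map<String, Map<String, String>> assign(List<String> listedFiles,
                                                   FragmentationStrategy strategy) {
        // ONE_FOR_ALL: a single group; PER_SUBDIR (assumed): one group per parent directory.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String file : listedFiles) {
            String key = strategy == FragmentationStrategy.ONE_FOR_ALL ? "" : parentOf(file);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(file);
        }

        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (List<String> group : groups.values()) {
            String identifier = UUID.randomUUID().toString(); // one identifier per group
            for (int i = 0; i < group.size(); i++) {
                Map<String, String> attrs = new LinkedHashMap<>();
                attrs.put("fragment.identifier", identifier);
                attrs.put("fragment.count", String.valueOf(group.size())); // N or Ni
                attrs.put("fragment.index", String.valueOf(i));            // in [0, N) or [0, Ni)
                result.put(group.get(i), attrs);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = List.of("/data/a/1.csv", "/data/a/2.csv", "/data/b/3.csv");
        System.out.println(assign(files, FragmentationStrategy.ONE_FOR_ALL));
        System.out.println(assign(files, FragmentationStrategy.PER_SUBDIR));
    }
}
```

Inside an actual processor, each per-file map could then be applied to the corresponding output flowfile, for example with `ProcessSession#putAllAttributes`. Keeping the grouping logic separate from the session handling makes it easy to add further strategies later.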



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)