Posted to issues@nifi.apache.org by "Sivaprasanna Sethuraman (JIRA)" <ji...@apache.org> on 2018/03/21 16:45:00 UTC

[jira] [Resolved] (NIFI-2705) ListHDFS Cannot Be Re-run

     [ https://issues.apache.org/jira/browse/NIFI-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sivaprasanna Sethuraman resolved NIFI-2705.
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0

Fixed in the 1.1.0 release. See the related issue NIFI-2831.

> ListHDFS Cannot Be Re-run
> -------------------------
>
>                 Key: NIFI-2705
>                 URL: https://issues.apache.org/jira/browse/NIFI-2705
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework, Documentation & Website
>    Affects Versions: 1.0.0
>            Reporter: Alan Jackoway
>            Priority: Major
>             Fix For: 1.1.0
>
>
> I have a use case where every day I want to go through a directory in HDFS and do something to the files more than a month old.
> I was trying to do this with a flow like ListHDFS -> RouteOnAttribute (hdfs.lastModified) -> FetchHDFS -> Processing.
> However, after I ran it once, old files were not pulled any more. I turned on debug logging and got this:
> {noformat}
> 2016-08-30 06:15:17,473 DEBUG [Timer-Driven Process Thread-9] o.apache.nifi.processors.hadoop.ListHDFS ListHDFS[id=d80a1ceb-0156-1000-595d-978dcf53ecb6] Found a total of 3 files in HDFS
> 2016-08-30 06:15:17,473 DEBUG [Timer-Driven Process Thread-9] o.apache.nifi.processors.hadoop.ListHDFS ListHDFS[id=d80a1ceb-0156-1000-595d-978dcf53ecb6] Of the 3 files found in HDFS, 0 are listable
> 2016-08-30 06:15:17,473 DEBUG [Timer-Driven Process Thread-9] o.apache.nifi.processors.hadoop.ListHDFS ListHDFS[id=d80a1ceb-0156-1000-595d-978dcf53ecb6] There is no data to list. Yielding.
> {noformat}
> It turns out that ListHDFS maintains a state value called {{latestTimestampListed}} that prevents it from re-listing files unless you change the directory being listed. At a minimum, that should be mentioned in the ListHDFS docs. Better still would be to make this behavior configurable, as it is in GetHDFS.
> In my case I think I can change to using GetHDFS without causing trouble, but the behavior of ListHDFS was surprising to me, and as far as I can tell is not documented anywhere.
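
The listing behavior described in the report can be modeled with a short sketch. This is not NiFi's actual code: FileEntry, select_listable, and the single stored timestamp are hypothetical simplifications (written in Python for brevity), assuming the processor only lists files modified after the newest timestamp it has already recorded.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one file found in the listed directory."""
    path: str
    last_modified: int  # epoch milliseconds


def select_listable(
    files: Iterable[FileEntry],
    latest_timestamp_listed: Optional[int],
) -> Tuple[List[FileEntry], Optional[int]]:
    """Return the files considered listable and the updated stored timestamp.

    Assumption: a file is listable only if it was modified after the stored
    timestamp. A stored value of None models a processor with no state yet.
    """
    if latest_timestamp_listed is None:
        fresh = list(files)
    else:
        fresh = [f for f in files if f.last_modified > latest_timestamp_listed]
    # Keep the old timestamp when nothing new was found.
    new_timestamp = max(
        (f.last_modified for f in fresh), default=latest_timestamp_listed
    )
    return fresh, new_timestamp


if __name__ == "__main__":
    found = [
        FileEntry("/data/a", 1000),
        FileEntry("/data/b", 2000),
        FileEntry("/data/c", 3000),
    ]
    first_run, state = select_listable(found, None)
    second_run, state = select_listable(found, state)
    print(len(found), "found;", len(first_run), "listable on first run;",
          len(second_run), "listable on second run")
```

Under this model, a second run over the same three unchanged files finds all three but lists none of them, which is consistent with the debug log quoted in the report. Passing None as the stored timestamp models starting again with no recorded state.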



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)