Posted to issues@nifi.apache.org by "Pierre Villard (JIRA)" <ji...@apache.org> on 2017/01/03 15:23:58 UTC

[jira] [Commented] (NIFI-2859) List + Fetch HDFS processors are reading part files from HDFS

    [ https://issues.apache.org/jira/browse/NIFI-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795291#comment-15795291 ] 

Pierre Villard commented on NIFI-2859:
--------------------------------------

The problem is that PutHDFS first writes each file to HDFS under a temporary name prefixed with a dot, and then renames it once the write completes. ListHDFS can therefore list the temporary file, which no longer exists under that name by the time FetchHDFS tries to read it. The easiest fix is to exclude files whose names start with a "." from the listing (I'll issue a PR implementing this change). One could argue that this is a breaking change, although I believe that listing such files is a bug. If this change is not acceptable, another option is to add a true/false property ("ignore temp files") that lets the user choose the behavior, but that seems like overkill, and users would certainly ask why the property defaults to false.
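
To illustrate the proposed filtering, here is a minimal sketch in plain Java. It is not the actual ListHDFS code or the PR: the class `DotFileFilter` and its methods are hypothetical names, and paths are modeled as simple strings rather than Hadoop `Path` objects so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of NiFi.
public class DotFileFilter {

    // True when the last path segment starts with ".", which is the
    // naming convention the comment describes for in-progress files.
    public static boolean isTemporary(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.startsWith(".");
    }

    // Returns the listed paths with dot-prefixed files removed,
    // preserving the original order.
    public static List<String> withoutTemporaryFiles(List<String> listed) {
        List<String> kept = new ArrayList<>();
        for (String path : listed) {
            if (!isTemporary(path)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example paths are made up for this sketch.
        List<String> listed = Arrays.asList(
                "/tmp/in/.part_1",   // still being written, should be skipped
                "/tmp/in/done_1");   // already renamed, should be kept
        System.out.println(withoutTemporaryFiles(listed));
    }
}
```

Filtering on the file name only, rather than on the full path, means that a dot-prefixed parent directory would not hide its contents; whether that is the desired behavior is a separate design question.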

> List + Fetch HDFS processors are reading part files from HDFS
> -------------------------------------------------------------
>
>                 Key: NIFI-2859
>                 URL: https://issues.apache.org/jira/browse/NIFI-2859
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0
>            Reporter: Mahesh Nayak
>
> 1. Create the following ProcessGroups:
>    GetFile --> PutHDFS --> PutFile
>    ListHDFS --> FetchHDFS --> PutFile
> 2. Start both ProcessGroups.
> 3. Write lots of files into HDFS so that ListHDFS keeps listing and FetchHDFS keeps fetching.
> 4. An exception is thrown because the processor reads a part file from the PutHDFS folder:
> {code:none}
> java.io.FileNotFoundException: File does not exist: /tmp/HDFSProcessorsTest_visjJMcHORUwigw/.ycnVSpBOzEaoTWk_7f37d5af-d4a4-4521-b60d-c3c11ae19669
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1860)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
> {code}
> Note that the file is eventually copied to the output successfully, but in the meantime some files end up in the failure/comms failure relationships.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)