Posted to issues@nifi.apache.org by "Pierre Villard (JIRA)" <ji...@apache.org> on 2017/01/03 15:23:58 UTC
[jira] [Commented] (NIFI-2859) List + Fetch HDFS processors are reading part files from HDFS
[ https://issues.apache.org/jira/browse/NIFI-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795291#comment-15795291 ]
Pierre Villard commented on NIFI-2859:
--------------------------------------
The problem is that PutHDFS writes each file into HDFS under a temporary name prefixed with a dot and then renames it once the write is done. ListHDFS can list the file while it still has its temporary name, and by the time FetchHDFS tries to read it, the file has been renamed and no longer exists under that name. The easiest fix is to exclude files whose names start with a "." from the listed files (I'll issue a PR implementing this change). However, one could argue that this is a breaking change, even though I believe that listing such files is a bug. If this change is not acceptable, another option is to add a true/false property ("ignore temp files") to let the user choose the behavior, but that seems like overkill, and users would certainly ask why the property defaults to false.
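To illustrate the proposed filtering, here is a minimal sketch in plain Java. It is not the actual patch: the class and method names are hypothetical, and it assumes the listing is available as a simple list of file names, whereas the real processor would apply the same check to HDFS paths.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DotFileFilterSketch {

    // Hypothetical helper: treats any name starting with "." as an
    // in-progress (temporary) file, mirroring the dot prefix described above.
    static boolean isTempName(String fileName) {
        return fileName.startsWith(".");
    }

    // Returns only the names that do not start with a dot, preserving order.
    static List<String> visibleFiles(List<String> listedNames) {
        return listedNames.stream()
                .filter(name -> !isTempName(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up file names for illustration only.
        List<String> listed = Arrays.asList(".data-1.csv", "data-0.csv", "notes.txt");
        System.out.println(visibleFiles(listed)); // prints [data-0.csv, notes.txt]
    }
}
```

With a filter like this, a file only shows up in the listing after it has been renamed to its final name, so a later fetch would not be issued against a name that is about to disappear.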
> List + Fetch HDFS processors are reading part files from HDFS
> -------------------------------------------------------------
>
> Key: NIFI-2859
> URL: https://issues.apache.org/jira/browse/NIFI-2859
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 1.0.0
> Reporter: Mahesh Nayak
>
> 1. Create the following ProcessGroups:
>    GetFile --> PutHDFS --> PutFile
>    ListHDFS --> FetchHDFS --> PutFile
> 2. Start both ProcessGroups.
> 3. Write lots of files into HDFS so that ListHDFS keeps listing and FetchHDFS keeps fetching.
> 4. An exception is thrown because the processor tries to read the part file from the PutHDFS folder:
> {code:none}
> java.io.FileNotFoundException: File does not exist: /tmp/HDFSProcessorsTest_visjJMcHORUwigw/.ycnVSpBOzEaoTWk_7f37d5af-d4a4-4521-b60d-c3c11ae19669
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1860)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
> {code}
> Note that the file is eventually copied to the output successfully, but in the meantime some files end up in the failure/comms failure relationships.