Posted to mapreduce-issues@hadoop.apache.org by "Jason Dere (JIRA)" <ji...@apache.org> on 2014/02/12 22:58:21 UTC

[jira] [Commented] (MAPREDUCE-5756) FileInputFormat.listStatus() including directories in its results

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899641#comment-13899641 ] 

Jason Dere commented on MAPREDUCE-5756:
---------------------------------------

In the 2.x code, isn't that what the recursive flag (mapreduce.input.fileinputformat.input.dir.recursive) is there for, to recurse into directories if needed?
If the generated input splits include a directory, the job appears to fail because it expects a file rather than a directory. Is the onus then on the caller of listStatus() to go through the returned list and remove any directories that were included?
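
For illustration only, here is a minimal sketch of what such caller-side filtering might look like. It does not use the Hadoop API: Entry is a hypothetical stand-in for FileStatus (only a path and an isDirectory() flag), filesOnly() is a made-up helper name, and the paths are invented sample data.

```java
import java.util.ArrayList;
import java.util.List;

class DirectoryFilterSketch {
    /** Hypothetical stand-in for FileStatus: a path plus a directory flag. */
    static final class Entry {
        final String path;
        final boolean directory;

        Entry(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }

        boolean isDirectory() {
            return directory;
        }
    }

    /** Returns only the non-directory entries, preserving their order. */
    static List<Entry> filesOnly(List<Entry> listed) {
        List<Entry> files = new ArrayList<>();
        for (Entry e : listed) {
            if (!e.isDirectory()) {
                files.add(e);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // Invented sample listing: two files and one subdirectory.
        List<Entry> listed = new ArrayList<>();
        listed.add(new Entry("/warehouse/t/part-00000", false));
        listed.add(new Entry("/warehouse/t/subdir", true));
        listed.add(new Entry("/warehouse/t/part-00001", false));

        for (Entry e : filesOnly(listed)) {
            System.out.println(e.path);
        }
    }
}
```

If every caller has to do this, that would suggest the filtering belongs inside listStatus() itself rather than at each call site.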

Looks like the recursive stuff (with lots of discussion) was added in MAPREDUCE-3193.

> FileInputFormat.listStatus() including directories in its results
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5756
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jason Dere
>
> Trying to track down HIVE-6401, where we see some "is not a file" errors because getSplits() is giving us directories.  I believe the culprit is FileInputFormat.listStatus():
> {code}
>                 if (recursive && stat.isDirectory()) {
>                   addInputPathRecursively(result, fs, stat.getPath(),
>                       inputFilter);
>                 } else {
>                   result.add(stat);
>                 }
> {code}
> This seems to allow directories to be added to the results when recursive is false.  Is this meant to return directories? If not, I think it should look like this:
> {code}
>                 if (stat.isDirectory()) {
>                   if (recursive) {
>                     addInputPathRecursively(result, fs, stat.getPath(),
>                         inputFilter);
>                   }
>                 } else {
>                   result.add(stat);
>                 }
> {code}
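
The difference between the two branch structures quoted above can be seen in a small standalone sketch. This is not the Hadoop code: Node is a hypothetical stand-in for a directory listing entry, addRecursively() is an assumed simplification of addInputPathRecursively() (no filesystem, no path filter), and the sample paths are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListStatusBranchSketch {
    /** Hypothetical stand-in for a listing entry; directories carry children. */
    static final class Node {
        final String path;
        final boolean directory;
        final List<Node> children;

        Node(String path, boolean directory, Node... children) {
            this.path = path;
            this.directory = directory;
            this.children = Arrays.asList(children);
        }
    }

    /** Assumed simplification of addInputPathRecursively(): collects files below dir. */
    static void addRecursively(List<String> result, Node dir) {
        for (Node child : dir.children) {
            if (child.directory) {
                addRecursively(result, child);
            } else {
                result.add(child.path);
            }
        }
    }

    /** Mirrors the first quoted snippet: directories fall into the else branch. */
    static List<String> listCurrent(List<Node> stats, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : stats) {
            if (recursive && stat.directory) {
                addRecursively(result, stat);
            } else {
                result.add(stat.path); // also reached for directories when recursive is false
            }
        }
        return result;
    }

    /** Mirrors the proposed snippet: directories are never added directly. */
    static List<String> listProposed(List<Node> stats, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : stats) {
            if (stat.directory) {
                if (recursive) {
                    addRecursively(result, stat);
                }
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample: one file and one subdirectory holding one file.
        List<Node> stats = Arrays.asList(
            new Node("in/a.txt", false),
            new Node("in/sub", true, new Node("in/sub/b.txt", false)));

        System.out.println("current,  recursive=false: " + listCurrent(stats, false));
        System.out.println("proposed, recursive=false: " + listProposed(stats, false));
        System.out.println("current,  recursive=true:  " + listCurrent(stats, true));
        System.out.println("proposed, recursive=true:  " + listProposed(stats, true));
    }
}
```

In this sketch the two versions agree when recursive is true and differ only when it is false, where the first version returns the subdirectory itself and the proposed version silently skips it.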



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)