Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2012/08/02 11:35:02 UTC
[jira] [Updated] (PIG-2856) AvroStorage doesn't load files in the
directories when a glob pattern matches both files and directories.
[ https://issues.apache.org/jira/browse/PIG-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-2856:
-------------------------------
Attachment: PIG-2856.patch
Attached is a patch that fixes the bug in getAllSubDirs() and updates the unit test testGlob1.
Regarding the test, "expected_test_dir_1.avro" includes the files in test_dir1 but not the ones in its sub-directory test_subdir. On the other hand, "expected_testDir.avro" includes the files in both test_dir1 and its sub-directory test_subdir.
Since all files in test_dir1 and its sub-directory are supposed to be loaded, "expected_testDir.avro" is now used as the expected output.
> AvroStorage doesn't load files in the directories when a glob pattern matches both files and directories.
> ---------------------------------------------------------------------------------------------------------
>
> Key: PIG-2856
> URL: https://issues.apache.org/jira/browse/PIG-2856
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.10.0
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Attachments: PIG-2856.patch
>
>
> This is a regression from PIG-2492.
> When a glob pattern such as '*' matches not only files but also directories, AvroStorage does not load files in the directories. This is a bug in getAllSubDirs() that can be fixed as follows:
> {code}
> static boolean getAllSubDirs(Path path, Job job, Set<Path> paths)
> ...
> FileStatus[] matchedFiles = fs.globStatus(path, PATH_FILTER);
> ...
> for (FileStatus file : matchedFiles) {
>     if (file.isDir()) {
> -       for (FileStatus sub : fs.listStatus(path)) {
> +       for (FileStatus sub : fs.listStatus(file.getPath())) {
>             getAllSubDirs(sub.getPath(), job, paths);
>         }
>     }
> }
> {code}
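For illustration, here is a minimal, self-contained sketch of the corrected traversal. It is not the AvroStorage code: it uses java.nio.file as a stand-in for the Hadoop FileSystem API, and the class and method names (GlobRecursionSketch, collectMatching, collect) are hypothetical. The point it shows is the same as in the diff above: when a glob match is a directory, the listing must be done on the matched directory itself rather than on the original pattern.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the Hadoop-based logic; not the piggybank code.
class GlobRecursionSketch {

    // Expands a glob against the entries of 'parent' (loosely analogous to
    // fs.globStatus) and collects every regular file reachable from the matches.
    static Set<Path> collectMatching(Path parent, String pattern) throws IOException {
        Set<Path> paths = new TreeSet<>();
        try (DirectoryStream<Path> matched = Files.newDirectoryStream(parent, pattern)) {
            for (Path file : matched) {
                collect(file, paths);
            }
        }
        return paths;
    }

    // Adds 'file' if it is a regular entry; otherwise lists the matched
    // directory itself (not the original pattern) and recurses into it.
    static void collect(Path file, Set<Path> paths) throws IOException {
        if (Files.isDirectory(file)) {
            try (DirectoryStream<Path> subs = Files.newDirectoryStream(file)) {
                for (Path sub : subs) {
                    collect(sub, paths);
                }
            }
        } else {
            paths.add(file);
        }
    }
}
```

With a layout like the one used by testGlob1 (files directly under the root, in test_dir1, and in test_dir1/test_subdir), calling collectMatching(root, "*") in this sketch returns the files at all three levels.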