Posted to dev@pig.apache.org by "Kevin Weil (JIRA)" <ji...@apache.org> on 2008/12/17 21:18:44 UTC
[jira] Created: (PIG-569) Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
------------------------------------------------------------------------------------
Key: PIG-569
URL: https://issues.apache.org/jira/browse/PIG-569
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Environment: FC Linux x86/64
Reporter: Kevin Weil
Fix For: types_branch
Pig cannot handle LOAD statements with Hadoop globs whose brace alternatives contain subdirectories. For example,
A = LOAD 'dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}' USING ...
The equivalent Hadoop shell command, hadoop dfs -ls dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}, works correctly.
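For reference, the brace group in that LOAD path is meant to expand to one path per alternative, slashes included. The Java sketch below shows that intended expansion. BraceExpansion is a hypothetical helper, not a Pig or Hadoop API, and it only handles non-nested groups without escapes. On an affected build, the expanded paths could presumably be loaded one at a time and combined with UNION as a stopgap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pig or Hadoop): expands {a,b,c} groups
// into the concrete paths they stand for. Nested braces and escaped
// characters are NOT handled; this only illustrates the intended meaning.
class BraceExpansion {
    static List<String> expand(String pattern) {
        List<String> result = new ArrayList<>();
        int open = pattern.indexOf('{');
        int close = pattern.indexOf('}', open + 1);
        if (open < 0 || close < 0) {
            result.add(pattern); // no brace group: the pattern is a single path
            return result;
        }
        String prefix = pattern.substring(0, open);
        // Expand the remainder first so later groups (a/{b,c}/{d,e}) also expand.
        List<String> tails = expand(pattern.substring(close + 1));
        for (String alt : pattern.substring(open + 1, close).split(",", -1)) {
            for (String tail : tails) {
                result.add(prefix + alt + tail);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String p : expand("dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}")) {
            System.out.println(p); // dir/dir1/subdir1, dir/dir2/subdir2, dir/dir3/subdir3
        }
    }
}
```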
Running the LOAD statement above in Pig, built from svn revision 724576, produces the following output:
2008-12-17 12:02:28,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2008-12-17 12:02:28,480 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
2008-12-17 12:02:28,480 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: Unable to get collect for pattern dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}} [Failed to obtain glob for dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}]
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:231)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:40)
at org.apache.pig.impl.io.FileLocalizer.globMatchesFiles(FileLocalizer.java:486)
at org.apache.pig.impl.io.FileLocalizer.fileExists(FileLocalizer.java:455)
at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:108)
at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.pig.backend.datastorage.DataStorageException: Failed to obtain glob for dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}
... 13 more
Caused by: java.io.IOException: Illegal file pattern: Expecting set closure character or end of range, or } for glob {dir1 at 5
at org.apache.hadoop.fs.FileSystem$GlobFilter.error(FileSystem.java:1084)
at org.apache.hadoop.fs.FileSystem$GlobFilter.setRegex(FileSystem.java:1069)
at org.apache.hadoop.fs.FileSystem$GlobFilter.<init>(FileSystem.java:987)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:953)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:962)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:902)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:862)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:215)
... 12 more
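The innermost cause names the glob "{dir1" rather than the whole brace group, and the frames pass through globPathsLevel before reaching GlobFilter. That is consistent with the pattern being split on '/' first and each component being compiled as a separate glob, which leaves the opening brace unmatched. The Java sketch below only imitates that per-component view to show where the split lands. It is not Hadoop's code, and PerLevelGlobCheck is a hypothetical name.

```java
// Illustrative only: imitates splitting a glob into '/'-separated components
// and checking each component's braces in isolation. This is NOT Hadoop's
// implementation, just a way to see which component ends up unbalanced.
class PerLevelGlobCheck {
    /** Returns the first component whose braces are unbalanced, or null if none. */
    static String firstUnbalancedComponent(String pattern) {
        for (String component : pattern.split("/")) {
            int depth = 0;
            for (char c : component.toCharArray()) {
                if (c == '{') depth++;
                else if (c == '}') depth--;
            }
            if (depth != 0) {
                return component;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String bad = firstUnbalancedComponent("dir/{dir1/subdir1,dir2/subdir2,dir3/subdir3}");
        // Prints "{dir1 (length 5)"; the length appears to match "at 5" in the error.
        System.out.println(bad + " (length " + bad.length() + ")");
    }
}
```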
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-569) Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
Posted by "Richard Spring (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Spring updated PIG-569:
-------------------------------
Comment: was deleted
[jira] Commented: (PIG-569) Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
Posted by "Richard Spring (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665537#action_12665537 ]
Richard Spring commented on PIG-569:
------------------------------------
I've come across a similar issue when trying to load subdirectories using globs:
The command works fine via the shell:
hadoop dfs -ls /data/2008/{11/13,11/15}/14/video_impressions
However, when run via Pig, the following exception occurs if one of the glob alternatives does not match any files:
impressions = LOAD '/data/2008/{11/13,11/15}/14/video_impressions'....;
Exception in thread "Thread-6" java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:873)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:846)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:215)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asCollection(HDataStorage.java:40)
at org.apache.pig.impl.io.FileLocalizer.globMatchesFiles(FileLocalizer.java:486)
at org.apache.pig.impl.io.FileLocalizer.fileExists(FileLocalizer.java:455)
at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:114)
at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:782)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Unknown Source)
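Richard's failure mode suggests that a brace alternative matching no files should simply contribute nothing to the result instead of aborting the whole load. The Java sketch below shows that semantics, assuming the alternatives have already been expanded into concrete paths. keepExisting is a hypothetical helper. On a real cluster the predicate would presumably wrap Hadoop's FileSystem.exists(Path), but here an in-memory set stands in for the file system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical workaround sketch: keep only the alternatives that exist and
// report the rest, so one empty alternative does not fail the whole LOAD.
class SkipMissingAlternatives {
    static List<String> keepExisting(List<String> candidates, Predicate<String> exists) {
        List<String> kept = new ArrayList<>();
        for (String path : candidates) {
            if (exists.test(path)) {
                kept.add(path);
            } else {
                System.err.println("skipping alternative with no matches: " + path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
                "/data/2008/11/13/14/video_impressions",
                "/data/2008/11/15/14/video_impressions");
        // Stand-in for the file system: pretend only the 11/13 directory exists.
        Set<String> present = Set.of("/data/2008/11/13/14/video_impressions");
        System.out.println(keepExisting(candidates, present::contains));
    }
}
```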
[jira] Commented: (PIG-569) Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659462#action_12659462 ]
Tom White commented on PIG-569:
-------------------------------
Hadoop has supported this only since 0.19 (see HADOOP-3498 and PIG-252). Are you using Hadoop 0.19? You may be getting this error because Pig is using the libraries from Hadoop 0.17 or 0.18.
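The point here is that the Hadoop version that matters is the one whose jar Pig loads, which can differ from the one behind the hadoop command. In a JVM with Hadoop on the classpath, org.apache.hadoop.util.VersionInfo.getVersion() reports that jar's version. The Java sketch below only encodes the 0.19 threshold from this comment. GlobSupportCheck is a hypothetical helper, and it assumes that releases after 0.19, including 1.x and later, keep the feature.

```java
// Hypothetical helper: given a Hadoop version string such as "0.18.3",
// decide whether that release should accept '/' inside {a,b} glob
// alternatives. The 0.19 threshold comes from the HADOOP-3498 discussion;
// treating every later release as supporting it is an assumption.
class GlobSupportCheck {
    static boolean supportsSlashInBraces(String version) {
        String[] parts = version.split("[.-]"); // "0.20.2-cdh3u0" -> 0, 20, 2, cdh3u0
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > 0 || minor >= 19;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"0.17.2", "0.18.3", "0.19.0", "0.20.2"}) {
            System.out.println(v + " -> " + supportsSlashInBraces(v));
        }
    }
}
```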
[jira] Updated: (PIG-569) Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
Posted by "Kevin Weil (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Weil updated PIG-569:
---------------------------
Environment: FC Linux x86/64, Pig revision 724576 (was: FC Linux x86/64)
[jira] Resolved: (PIG-569) Inconsistency with Hadoop in Pig load statements involving globs with subdirectories
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai resolved PIG-569.
----------------------------
Resolution: Duplicate
It is a duplicate of [PIG-252|https://issues.apache.org/jira/browse/PIG-252]