Posted to common-dev@hadoop.apache.org by "Arun Jacob (JIRA)" <ji...@apache.org> on 2009/05/11 19:35:45 UTC
[jira] Created: (HADOOP-5805) problem using top level s3 buckets as
input/output directories
problem using top level s3 buckets as input/output directories
--------------------------------------------------------------
Key: HADOOP-5805
URL: https://issues.apache.org/jira/browse/HADOOP-5805
Project: Hadoop Core
Issue Type: Bug
Components: fs/s3
Affects Versions: 0.18.3
Environment: ec2, cloudera AMI, 20 nodes
Reporter: Arun Jacob
When I specify top level s3 buckets as input or output directories, I get the following exception.
hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output
java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
The workaround is to specify input/output buckets with sub-directories:
hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir s3n://infocloud-output/output-subdir
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-5805) problem using top level s3 buckets as
input/output directories
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated HADOOP-5805:
------------------------------
Status: Patch Available (was: Open)
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Assignee: Ian Nowland
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch, HADOOP-5805-2.patch
>
>
[jira] Commented: (HADOOP-5805) problem using top level s3 buckets
as input/output directories
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711910#action_12711910 ]
Hadoop QA commented on HADOOP-5805:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12408752/HADOOP-5805-1.patch
against trunk revision 777330.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/375/console
This message is automatically generated.
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Assignee: Ian Nowland
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch
>
>
[jira] Commented: (HADOOP-5805) problem using top level s3 buckets
as input/output directories
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718620#action_12718620 ]
Hudson commented on HADOOP-5805:
--------------------------------
Integrated in Hadoop-trunk #863 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/])
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Assignee: Ian Nowland
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch, HADOOP-5805-2.patch
>
>
[jira] Commented: (HADOOP-5805) problem using top level s3 buckets
as input/output directories
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713840#action_12713840 ]
Hadoop QA commented on HADOOP-5805:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12409019/HADOOP-5805-2.patch
against trunk revision 779338.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/415/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/415/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/415/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/415/console
This message is automatically generated.
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Assignee: Ian Nowland
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch, HADOOP-5805-2.patch
>
>
[jira] Updated: (HADOOP-5805) problem using top level s3 buckets as
input/output directories
Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Nowland updated HADOOP-5805:
--------------------------------
Attachment: HADOOP-5805-1.patch
New patch against trunk. Moved test and added assert.
Also created https://issues.apache.org/jira/browse/HADOOP-5889
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Assignee: Ian Nowland
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch
>
>
[jira] Updated: (HADOOP-5805) problem using top level s3 buckets as
input/output directories
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated HADOOP-5805:
------------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
I've just committed this. Thanks Ian!
(The contrib test failure was unrelated.)
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Assignee: Ian Nowland
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch, HADOOP-5805-2.patch
>
>
[jira] Updated: (HADOOP-5805) problem using top level s3 buckets as
input/output directories
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated HADOOP-5805:
------------------------------
Attachment: HADOOP-5805-2.patch
For some reason the patch didn't apply. Here's a regenerated version.
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Assignee: Ian Nowland
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch, HADOOP-5805-2.patch
>
>
[jira] Updated: (HADOOP-5805) problem using top level s3 buckets as
input/output directories
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White updated HADOOP-5805:
------------------------------
Status: Open (was: Patch Available)
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Assignee: Ian Nowland
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch, HADOOP-5805-1.patch, HADOOP-5805-2.patch
>
>
[jira] Updated: (HADOOP-5805) problem using top level s3 buckets as
input/output directories
Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Nowland updated HADOOP-5805:
--------------------------------
Attachment: HADOOP-5805-0.patch
There are two problems here.
The first is that S3N currently requires a terminating slash on the URI to indicate the root of a bucket. That is, it accepts s3n://infocloud-input/ but not s3n://infocloud-input. This is fixed by the attached patch, which allows either form to be used.
This fixes the input bucket case but not the output one.
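The normalization described above can be sketched as follows. This is a minimal illustration using only java.net.URI, not the code from the attached patch: the class name BucketRootSketch and this version of pathToKey are hypothetical, and the real NativeS3FileSystem works on Hadoop Path objects rather than raw URIs. The idea is simply that an empty path is treated as the bucket root "/".

```java
import java.net.URI;

public class BucketRootSketch {
    /**
     * Hypothetical helper: maps a URI to an object key, treating a bare
     * bucket URI (empty path) the same as the bucket root "/".
     */
    static String pathToKey(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            // Opaque URIs (no hierarchical path) are rejected outright.
            throw new IllegalArgumentException("Path must be absolute: " + uri);
        }
        if (path.isEmpty()) {
            // s3n://bucket and s3n://bucket/ both mean the bucket root.
            path = "/";
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + uri);
        }
        // Drop the leading slash; the empty key denotes the bucket root.
        return path.substring(1);
    }

    public static void main(String[] args) {
        System.out.println("[" + pathToKey(URI.create("s3n://infocloud-input")) + "]");
        System.out.println("[" + pathToKey(URI.create("s3n://infocloud-input/")) + "]");
        System.out.println("[" + pathToKey(URI.create("s3n://infocloud-input/input-subdir")) + "]");
    }
}
```

With this rule both spellings of the bucket root map to the same empty key, so neither form triggers the "Path must be absolute" exception.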
The second problem is that S3N requires a bucket to exist before it can use it. But if you attempt to use its "root" as the output, you get the standard Hadoop behavior: FileOutputFormat throws a FileAlreadyExistsException, even if the bucket is empty, because the root directory "/" of the bucket does exist. To me the ideal fix for this second problem is to change FileOutputFormat to not throw if the output directory exists but is empty. However, that seems a fairly large change to the established behavior, so I did not include it with the more trivial patch.
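The relaxed rule proposed here (accept an output directory that is missing, or that exists but is empty) might look like the sketch below. It uses java.io.File on a local filesystem purely as a stand-in for a Hadoop FileSystem; the class and method names are hypothetical and nothing here is taken from FileOutputFormat itself.

```java
import java.io.File;

public class EmptyOutputCheckSketch {
    /**
     * Returns true if a job could write to dir under the relaxed rule:
     * the directory is missing, or it exists and has no entries.
     */
    static boolean isUsableOutputDir(File dir) {
        if (!dir.exists()) {
            return true;
        }
        // list() returns null when dir is not a directory or cannot be read.
        String[] entries = dir.list();
        return entries != null && entries.length == 0;
    }
}
```

Under the established behavior the first branch is the only one that succeeds; the second branch is the part that would change.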
As an aside, since each AWS account only gets 100 buckets, you generally don't want to write the output of each job to a new bucket anyway.
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Attachments: HADOOP-5805-0.patch
>
>
[jira] Commented: (HADOOP-5805) problem using top level s3 buckets
as input/output directories
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711643#action_12711643 ]
Tom White commented on HADOOP-5805:
-----------------------------------
This looks like a good fix. The test should do an assert to check that it gets back an appropriate FileStatus object.
The patch needs to be regenerated since the tests have moved from src/test to src/test/core.
For the second problem, you could subclass your output format to override checkOutputSpecs() so it doesn't throw FileAlreadyExistsException. But I agree it would be nicer to deal with this generally. Perhaps open a separate Jira as it would affect more than NativeS3FileSystem.
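The subclassing approach suggested above can be illustrated with stand-in classes. This is only a sketch of the override pattern: StrictFormat and LenientFormat are hypothetical names, java.io.File stands in for a Hadoop path, and the real checkOutputSpecs() in org.apache.hadoop.mapred.FileOutputFormat has a different signature and throws FileAlreadyExistsException rather than the IllegalStateException used here.

```java
import java.io.File;

public class LenientOutputFormatSketch {
    /** Stand-in for a base output format that rejects any existing output directory. */
    static class StrictFormat {
        void checkOutputSpecs(File outputDir) {
            if (outputDir.exists()) {
                throw new IllegalStateException(
                        "Output directory " + outputDir + " already exists");
            }
        }
    }

    /** Subclass that overrides the check so an existing directory is accepted. */
    static class LenientFormat extends StrictFormat {
        @Override
        void checkOutputSpecs(File outputDir) {
            // Still reject a plain file; only an existing directory is tolerated.
            if (outputDir.exists() && !outputDir.isDirectory()) {
                throw new IllegalStateException(outputDir + " is not a directory");
            }
        }
    }
}
```

A job configured with the lenient subclass would get past the existence check for a bucket root, at the cost of no longer being protected against overwriting earlier output.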
> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
> Key: HADOOP-5805
> URL: https://issues.apache.org/jira/browse/HADOOP-5805
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Environment: ec2, cloudera AMI, 20 nodes
> Reporter: Arun Jacob
> Fix For: 0.21.0
>
> Attachments: HADOOP-5805-0.patch
[jira] Assigned: (HADOOP-5805) problem using top level s3 buckets
as input/output directories
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White reassigned HADOOP-5805:
---------------------------------
Assignee: Ian Nowland
[jira] Updated: (HADOOP-5805) problem using top level s3 buckets as
input/output directories
Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Nowland updated HADOOP-5805:
--------------------------------
Fix Version/s: 0.21.0
Status: Patch Available (was: Open)