Posted to common-dev@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2009/05/21 17:06:45 UTC

[jira] Commented: (HADOOP-5805) problem using top level s3 buckets as input/output directories

    [ https://issues.apache.org/jira/browse/HADOOP-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711643#action_12711643 ] 

Tom White commented on HADOOP-5805:
-----------------------------------

This looks like a good fix. The test should assert that it gets back an appropriate FileStatus object.
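
For example, something along these lines (just a sketch; the bucket name is a placeholder and fs is assumed to be the NativeS3FileSystem under test):

    Path bucketRoot = new Path("s3n://test-bucket/");
    FileStatus status = fs.getFileStatus(bucketRoot);
    // The bucket root should come back as an existing directory.
    assertNotNull(status);
    assertTrue(status.isDir());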

The patch needs to be regenerated since the tests have moved from src/test to src/test/core.

For the second problem, you could subclass your output format to override checkOutputSpecs() so that it doesn't throw FileAlreadyExistsException. But I agree it would be nicer to deal with this generally. Perhaps open a separate Jira, as it would affect more than NativeS3FileSystem.
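
Roughly something like this (an untested sketch against the old mapred API; the class name is only illustrative):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.mapred.InvalidJobConfException;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;

    public class NoCheckTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
      @Override
      public void checkOutputSpecs(FileSystem ignored, JobConf job)
          throws IOException {
        // Skip the existence check that throws FileAlreadyExistsException
        // (it is also the code path that calls exists() on the bucket root);
        // only verify that an output path has been set at all.
        if (getOutputPath(job) == null && job.getNumReduceTasks() != 0) {
          throw new InvalidJobConfException("Output directory not set in JobConf.");
        }
      }
    }

and then set it on the job with conf.setOutputFormat(NoCheckTextOutputFormat.class) in the driver.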



> problem using top level s3 buckets as input/output directories
> --------------------------------------------------------------
>
>                 Key: HADOOP-5805
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5805
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 0.18.3
>         Environment: ec2, cloudera AMI, 20 nodes
>            Reporter: Arun Jacob
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-5805-0.patch
>
>
> When I specify top level s3 buckets as input or output directories, I get the following exception.
> hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output
> java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
>         at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
>         at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
> The workaround is to specify input/output buckets with sub-directories:
>  
> hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir  s3n://infocloud-output/output-subdir

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.