Posted to mapreduce-issues@hadoop.apache.org by "Benjamin Kim (JIRA)" <ji...@apache.org> on 2012/10/10 13:31:03 UTC
[jira] [Created] (MAPREDUCE-4718) MapReduce fails if I pass a parameter as an S3 folder
Benjamin Kim created MAPREDUCE-4718:
---------------------------------------
Summary: MapReduce fails if I pass a parameter as an S3 folder
Key: MAPREDUCE-4718
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4718
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: job submission
Affects Versions: 1.0.3, 1.0.0
Environment: Hadoop with default configurations
Reporter: Benjamin Kim
I'm running a wordcount MR job as follows:
hadoop jar WordCount.jar wordcount.WordCountDriver s3n://bucket/wordcount/input s3n://bucket/wordcount/output
s3n://bucket/wordcount/input is an S3 folder (key prefix) that contains the input files.
However, I get the following NPE:
12/10/02 18:56:23 INFO mapred.JobClient: map 0% reduce 0%
12/10/02 18:56:54 INFO mapred.JobClient: map 50% reduce 0%
12/10/02 18:56:56 INFO mapred.JobClient: Task Id : attempt_201210021853_0001_m_000001_0, Status : FAILED
java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:144)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:497)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
The MR job runs fine if I specify a more specific input path, such as s3n://bucket/wordcount/input/file.txt
It fails if I pass an S3 folder as the parameter.
In summary:
This works:
hadoop jar ./hadoop-examples-1.0.3.jar wordcount /user/hadoop/wordcount/input/ s3n://bucket/wordcount/output/
This doesn't work:
hadoop jar ./hadoop-examples-1.0.3.jar wordcount s3n://bucket/wordcount/input/ s3n://bucket/wordcount/output/
(both input paths are directories)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4718) MapReduce fails if I pass a parameter as an S3 folder
Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473964#comment-13473964 ]
Steve Loughran commented on MAPREDUCE-4718:
-------------------------------------------
Looks like this is triggered on the {{in.close()}} line when the inner input stream is null.
This shouldn't happen at construction time, because the opening code would have failed, but it does appear possible
in the {{seek()}} operation, which first closes the existing stream and then calls {{store.retrieve(key, pos)}}, an operation that can return null if S3 doesn't have that key.
At the very least, {{close()}} should be made robust against a null inner input stream; maybe the seek operation should convert a null retrieve result into an {{IOException}}.
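A minimal sketch of the hardening described above, using a simplified, hypothetical stand-in for {{NativeS3FsInputStream}} (the class name, the {{retrieve}} stub, and the field layout here are illustrative, not the actual Hadoop source): {{close()}} tolerates a null inner stream, and {{seek()}} converts a null retrieve result into an {{IOException}} instead of leaving the stream null for a later NPE.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, simplified stand-in for NativeS3FsInputStream showing the two
// proposed hardening steps: a null-safe close() and a seek() that surfaces a
// missing key as an IOException rather than storing a null inner stream.
class GuardedS3InputStream extends InputStream {
    private InputStream in;        // may legitimately be null after close()
    private final String key;

    GuardedS3InputStream(InputStream in, String key) {
        this.in = in;
        this.key = key;
    }

    // Stand-in for store.retrieve(key, pos); here it simulates S3 not
    // having the key by always returning null.
    protected InputStream retrieve(String key, long pos) {
        return null;
    }

    public void seek(long pos) throws IOException {
        close();                               // old stream is discarded either way
        InputStream next = retrieve(key, pos);
        if (next == null) {                    // convert null retrieve into a clear error
            throw new IOException("Key '" + key + "' not retrievable at offset " + pos);
        }
        in = next;
    }

    @Override
    public int read() throws IOException {
        if (in == null) throw new IOException("Stream closed: " + key);
        return in.read();
    }

    @Override
    public void close() throws IOException {
        if (in != null) {                      // robust against a null inner stream
            in.close();
            in = null;
        }
    }
}
```

With this shape, calling {{close()}} twice (or after a failed {{seek()}}) is a no-op rather than an NPE, so {{LineRecordReader.close()}} in the stack trace above would no longer blow up the task.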