Posted to common-issues@hadoop.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2015/01/17 14:08:34 UTC
[jira] [Created] (HADOOP-11487) NativeS3FileSystem.getStatus must retry on FileNotFoundException
Paulo Motta created HADOOP-11487:
------------------------------------
Summary: NativeS3FileSystem.getStatus must retry on FileNotFoundException
Key: HADOOP-11487
URL: https://issues.apache.org/jira/browse/HADOOP-11487
Project: Hadoop Common
Issue Type: Bug
Components: fs, fs/s3
Reporter: Paulo Motta
I'm trying to copy a large amount of files from HDFS to S3 via distcp and I'm getting the following exception:
{code:java}
2015-01-16 20:53:18,187 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://10.165.35.216/hdfsFolder/file.gz to s3n://s3-bucket/file.gz
java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
2015-01-16 20:53:18,276 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
{code}
However, when I run hadoop fs -ls s3n://s3-bucket/file.gz afterwards, the file is there. So the job failure is probably caused by Amazon S3's eventual consistency: getFileStatus is invoked right after the copy, before the newly written key is visible.
In my opinion, to fix this problem NativeS3FileSystem.getFileStatus should honor the fs.s3.maxRetries property and retry on FileNotFoundException instead of failing the task immediately.
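To illustrate the idea, here is a minimal sketch of the kind of retry wrapper getFileStatus could use. This is not Hadoop code: the helper name, the backoff, and the parameters are hypothetical; a real patch would wire the attempt count to fs.s3.maxRetries.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;

// Illustrative sketch only: a generic "retry on FileNotFoundException"
// helper of the sort NativeS3FileSystem.getFileStatus could apply.
// retryOnNotFound, maxRetries and sleepMillis are hypothetical names,
// not part of the Hadoop API.
public class RetryOnNotFound {

    static <T> T retryOnNotFound(Callable<T> op, int maxRetries, long sleepMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (FileNotFoundException e) {
                // With S3's eventual consistency a just-written key may
                // briefly look missing; retry a bounded number of times
                // before propagating the failure.
                if (attempt >= maxRetries) {
                    throw e;
                }
                Thread.sleep(sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a key that only becomes visible on the third lookup.
        final int[] calls = {0};
        String status = retryOnNotFound(() -> {
            if (++calls[0] < 3) {
                throw new FileNotFoundException("No such file or directory");
            }
            return "FOUND";
        }, 5, 10L);
        System.out.println(status + " after " + calls[0] + " attempts");
    }
}
```

The point is only that the lookup tolerates a transient "not found" instead of failing the whole DistCp task on the first miss.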
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)