Posted to common-issues@hadoop.apache.org by Samir J Patel <Sa...@notesdev.ibm.com> on 2011/01/14 23:51:06 UTC
Unable to copy data from S3 Block FileSystem (URI scheme: s3) using hadoop distcp
I have been trying to copy block data from S3 using the hadoop distcp command, but it does not work: distcp gets stuck in what looks like an infinite loop. I am building the S3 URI from the AWS access key ID, the AWS secret access key, and the bucket name, as specified in the Hadoop wiki (http://wiki.apache.org/hadoop/AmazonS3).
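If I understand the wiki correctly, the credentials can alternatively go in conf/core-site.xml instead of being embedded in the URI; a minimal sketch with placeholder values (the property names fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey come from that wiki page):

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>      <!-- placeholder -->
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>  <!-- placeholder -->
</property>

With the keys configured this way, the URI shortens to s3://blockDir/.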
Example:
Say I have an S3 bucket of block data called "blockDir", and I want to copy its contents into a top-level directory in my local HDFS called "myHadoopDir". From my Hadoop home directory, I use the following command to perform the distcp:
bin/hadoop distcp s3://<awsAccessKeyId>:<awsSecretAccessKey>@blockDir/ /myHadoopDir
This command causes distcp to hang, and the MapReduce job that would copy the data is never started. I am using the s3 URI scheme because the data I am trying to copy is stored in block format.
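If a thread dump of the hung client would help with diagnosis, I believe one can be captured with the standard JDK tools while the command is stuck (the PID below is just a placeholder):

jps -l                             # find the PID of the DistCp client JVM
jstack 12345 > distcp-threads.txt  # replace 12345 with the actual PID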
If I try to copy data from an S3 Native FileSystem directory instead, it works correctly (example: bin/hadoop distcp s3n://<awsAccessKeyId>:<awsSecretAccessKey>@fileDir/ /myHadoopDir). In this case I used the s3n URI scheme. Does anyone have an idea why the copy over the s3 URI scheme hangs?
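For what it's worth, I assume a plain directory listing over each scheme is a reasonable way to separate a distcp problem from a FileSystem problem (same placeholder credentials and bucket names as above):

bin/hadoop fs -ls s3://<awsAccessKeyId>:<awsSecretAccessKey>@blockDir/
bin/hadoop fs -ls s3n://<awsAccessKeyId>:<awsSecretAccessKey>@fileDir/

If the s3 listing hangs as well, that would point at the S3 block FileSystem itself rather than at distcp.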