Posted to common-issues@hadoop.apache.org by Samir J Patel <Sa...@notesdev.ibm.com> on 2011/01/14 23:51:06 UTC

Unable to copy data from S3 Block FileSystem (URI scheme: s3) using hadoop distcp

I have been trying to copy block data from S3 using the hadoop distcp 
command, but it doesn't seem to work: distcp gets stuck in an infinite 
loop.  I am building an S3 URI with the AWS access key ID, AWS secret 
access key, and bucket name, as specified in the Hadoop wiki 
(http://wiki.apache.org/hadoop/AmazonS3).
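(As an aside, the same wiki page mentions that the credentials can instead 
be set in conf/core-site.xml, which avoids URI-escaping problems if the 
secret key happens to contain a slash.  The property names below are the 
ones documented there for the s3 scheme; the values are placeholders:

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
)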

Example:

Say I have an S3 bucket with block data called "blockDir" and I want to 
copy its contents into my local HDFS under a top-level directory called 
"myHadoopDir".  From my Hadoop home directory, I use the following command 
to perform the distcp:

bin/hadoop distcp s3://<awsAccessKeyId>:<awsSecretAccessKey>@blockDir/ /myHadoopDir
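
As a sanity check, the bucket can also be listed with the same URI to see 
whether the S3 block FileSystem is reachable at all (same placeholder 
credentials and bucket name as above):

bin/hadoop fs -ls s3://<awsAccessKeyId>:<awsSecretAccessKey>@blockDir/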

This causes distcp to hang, and the MapReduce job that copies the data is 
never started.  I am using the s3 URI scheme since the data I am trying to 
copy is block-based.

If I try to copy data from an S3 Native FileSystem directory using the s3n 
URI scheme, it works correctly (example: 
bin/hadoop distcp s3n://<awsAccessKeyId>:<awsSecretAccessKey>@fileDir/ /myHadoopDir).
Does anyone have an idea why the s3 URI copy would fail?