Posted to common-user@hadoop.apache.org by Tom White <to...@gmail.com> on 2008/06/01 08:58:09 UTC

Re: distcp/ls fails on Hadoop-0.17.0 on ec2.

Hi Einar,

How did you put the data onto S3: using Hadoop's S3 FileSystem, or
other S3 tools? If it's the latter, it won't work, because the s3
scheme is for Hadoop's block-based S3 storage. Native S3 support is
coming (see https://issues.apache.org/jira/browse/HADOOP-930), but
it isn't integrated yet.
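
To make the distinction concrete, here is roughly what does work with
the s3 scheme (bucket name and paths below are just placeholders based
on your command):

  # write a directory from HDFS into Hadoop's block-based S3 store
  bin/hadoop distcp hdfs://localhost:9000/user/einar/data \
      s3://ID:SECRET@my-test-bucket-with-alot-of-data/data

  # data written this way is visible to the s3 scheme and can be
  # listed or copied back into HDFS
  bin/hadoop fs -ls s3://ID:SECRET@my-test-bucket-with-alot-of-data/data
  bin/hadoop distcp s3://ID:SECRET@my-test-bucket-with-alot-of-data/data input

Objects uploaded with other S3 clients are stored as ordinary files
rather than as Hadoop blocks, so the s3 filesystem can't see them,
which is why you get the "does not exist" / "No such file or
directory" errors.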

Tom

On Thu, May 29, 2008 at 10:15 PM, Einar Vollset
<ei...@somethingsimpler.com> wrote:
> Hi,
>
> I'm using the current Hadoop EC2 image (ami-ee53b687), and am having
> some trouble getting Hadoop to access S3. Specifically, I'm trying to
> copy files from my bucket into HDFS on the running cluster, so on the
> master of the booted cluster I do:
>
> hadoop-0.17.0 einar$ bin/hadoop distcp
> s3://ID:SECRET@my-test-bucket-with-alot-of-data/ input
> 08/05/29 14:10:44 INFO util.CopyFiles: srcPaths=[
> s3://ID:SECRET@my-test-bucket-with-alot-of-data/]
> 08/05/29 14:10:44 INFO util.CopyFiles: destPath=input
> 08/05/29 14:10:46 WARN fs.FileSystem: "localhost:9000" is a deprecated
> filesystem name. Use "hdfs://localhost:9000/" instead.
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input
> source  s3://ID:SECRET@my-test-bucket-with-alot-of-data/ does not
> exist.
>        at org.apache.hadoop.util.CopyFiles.checkSrcPath(CopyFiles.java:578)
>        at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:594)
>        at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)
>
> ...which clearly doesn't work. The ID:SECRET values are correct, because
> if I change them I get:
>
> org.jets3t.service.S3ServiceException: S3 HEAD request failed.
> ResponseCode=403, ResponseMessage=Forbidden
> ...etc.
>
> I suspect it might be a more general problem, because if I do:
>
> bin/hadoop fs -ls  s3://ID:SECRET@my-test-bucket-with-alot-of-data/
>
> I get:
> ls: Cannot access s3://ID:SECRET@my-test-bucket-with-alot-of-data/ :
> No such file or directory.
>
>
> ...even though the bucket is there and has a lot of data in it.
>
>
> Any thoughts?
>
> Cheers,
>
> Einar
>

Re: distcp/ls fails on Hadoop-0.17.0 on ec2.

Posted by Einar Vollset <ei...@somethingsimpler.com>.
Hi Tom.

Ah... From reading (your?) article:

http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873&categoryID=112

I got confused; it seems to suggest that distcp can be used to move
ordinary S3 objects into HDFS.

Thanks for the clarification.
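
If I've read the docs right, I can also keep the ID/SECRET out of the
URI by setting fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey in
conf/hadoop-site.xml, so that (once the bucket holds Hadoop-written
data) the commands become just something like:

  bin/hadoop fs -ls s3://my-test-bucket-with-alot-of-data/
  bin/hadoop distcp s3://my-test-bucket-with-alot-of-data/data input

(Untested on my end, but I'll give it a try.)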

Cheers,

Einar


On Sat, May 31, 2008 at 11:58 PM, Tom White <to...@gmail.com> wrote:
> Hi Einar,
>
> How did you put the data onto S3: using Hadoop's S3 FileSystem, or
> other S3 tools? If it's the latter, it won't work, because the s3
> scheme is for Hadoop's block-based S3 storage. Native S3 support is
> coming (see https://issues.apache.org/jira/browse/HADOOP-930), but
> it isn't integrated yet.
>
> Tom



-- 
Einar Vollset
Chief Scientist
Something Simpler Systems

690 - 220 Cambie St
Vancouver, BC V6B 2M9
Canada

ph: +1-778-987-4256
http://somethingsimpler.com