Posted to common-user@hadoop.apache.org by Siddhartha Reddy <si...@grok.in> on 2008/04/04 10:32:33 UTC

distcp fails when copying from s3 to hdfs

I am trying to run a Hadoop cluster on Amazon EC2 and back up all the data to
Amazon S3 between runs. I am using Hadoop 0.16.1 on a cluster made up of
CentOS 5 images (ami-08f41161).


I am able to copy from HDFS to S3 using the following command:

bin/hadoop distcp file.txt s3://id:secret@bucket-name/file.txt
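
Incidentally, if the AWS secret key contains a "/" character, the inline
id:secret@ form of the URI can break. The credentials can instead be set in
conf/hadoop-site.xml as the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey
properties, after which they can be dropped from the URI. A sketch, keeping
bucket-name as a placeholder:

# assumes fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey are set in conf/hadoop-site.xml
bin/hadoop distcp file.txt s3://bucket-name/file.txt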


But copying from S3 to HDFS with the following command fails:

bin/hadoop distcp s3://id:secret@bucket-name/file.txt file2.txt


with the following error:

With failures, global counters are inaccurate; consider running with -i
Copy failed: java.lang.IllegalArgumentException: Hook previously registered
    at
java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:45)
    at java.lang.Runtime.addShutdownHook(Runtime.java:192)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1194)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:148)
    at org.apache.hadoop.fs.s3.S3FileSystem.initialize(S3FileSystem.java:81)
    at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1180)
    at org.apache.hadoop.fs.FileSystem.access$400(FileSystem.java:53)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1197)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:148)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.util.CopyFiles.checkSrcPath(CopyFiles.java:482)
    at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:504)
    at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:580)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:596)
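
The first line of that output suggests rerunning with -i, which is distcp's
ignore-failures flag. That would only skip past the failing copy rather than
address the exception, but for reference (same placeholder URI):

bin/hadoop distcp -i s3://id:secret@bucket-name/file.txt file2.txt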


Can someone please point out what, if anything, I am doing wrong?

Thanks,
Siddhartha Reddy

Re: distcp fails when copying from s3 to hdfs

Posted by Siddhartha Reddy <si...@grok.in>.
Thanks for the quick response, Tom.
I have just switched to Hadoop 0.16.2 and tried this again. Now I am getting
the following error:

Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source
s3://id:secret@bucket-name/file.txt does not exist.


I copied the file to S3 using the following command:

bin/hadoop distcp file.txt s3://id:secret@bucket-name/file.txt


To check that the file actually exists on S3, I tried the following
commands:

bin/hadoop fs -fs s3://id:secret@bucket-name -ls
bin/hadoop fs -fs s3://id:secret@bucket-name -ls /

The first returned nothing, while the second returned the following:

Found 1 items
/_distcp_logs_5vzva5    <dir>           1969-12-31 19:00        rwxrwxrwx
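
Fully qualified s3:// URIs can also be passed to the fs shell directly,
without setting the default filesystem via -fs; a sketch, with the same
placeholder credentials:

bin/hadoop fs -ls s3://id:secret@bucket-name/
bin/hadoop fs -ls s3://id:secret@bucket-name/file.txt

Either way, the listing shows only the _distcp_logs directory, so file.txt is
not visible at the bucket root as the S3 filesystem sees it.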


And when I tried to copy it back to HDFS using the following command:

bin/hadoop distcp s3://id:secret@bucket-name/file.txt file2.txt


I got this error:

Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source
s3://id:secret@bucket-name/file.txt does not exist.
        at org.apache.hadoop.util.CopyFiles.checkSrcPath(CopyFiles.java:504)
        at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:520)
        at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:596)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:612)

Any pointers on why this could be happening?
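
For reference, a minimal round trip that should succeed on 0.16.2 (a sketch
with the same placeholder bucket; test.txt stands for any small file already
in HDFS):

bin/hadoop distcp test.txt s3://id:secret@bucket-name/test.txt
bin/hadoop fs -ls s3://id:secret@bucket-name/
bin/hadoop distcp s3://id:secret@bucket-name/test.txt test2.txt

If the -ls in the middle does not show /test.txt, the upload itself is
failing rather than the copy back.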

Thanks,
Siddhartha

On Fri, Apr 4, 2008 at 2:13 PM, Tom White <to...@gmail.com> wrote:

> Hi Siddhartha,
>
> This is a problem in 0.16.1
> (https://issues.apache.org/jira/browse/HADOOP-3027) that is fixed in
> 0.16.2, which was released yesterday.
>
> Tom
>
> On 04/04/2008, Siddhartha Reddy <si...@grok.in> wrote:
> > [...]
>



-- 
http://sids.in
"If you are not having fun, you are not doing it right."

Re: distcp fails when copying from s3 to hdfs

Posted by Tom White <to...@gmail.com>.
Hi Siddhartha,

This is a problem in 0.16.1
(https://issues.apache.org/jira/browse/HADOOP-3027) that is fixed in
0.16.2, which was released yesterday.
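
To double-check which release a node is actually running, the version
subcommand works on a standard install:

bin/hadoop version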

Tom

On 04/04/2008, Siddhartha Reddy <si...@grok.in> wrote:
> [...]
>