Posted to common-user@hadoop.apache.org by Phil Hagelberg <ph...@hagelb.org> on 2009/08/13 21:48:05 UTC

Failure distcping to S3

I'm trying to perform a distcp from HDFS to S3. I start it with something like:

  $ hadoop distcp /data s3n://my-bucket/packed/
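
(For reference, the AWS credentials for the bucket are supplied through the
standard fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey properties in
core-site.xml, roughly along these lines; the actual key values are omitted:)

  <!-- core-site.xml: s3n credentials, values omitted -->
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>...</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>...</value>
  </property>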

The output shows a lot of 404 warnings, but it never shows any errors
that would explain what is causing those 404s in the first place. My
assumption is that many of the uploads to S3 are failing, and that
Hadoop is only logging the subsequent attempts to read the files it
expects to exist rather than the original upload failures. I'm not
seeing errors in the logs of the individual nodes either.

In the end my job fails, but it gives precious little explanation as to
why. I've attached the output of distcp.

I've also tried increasing the retry max to 10 and the timeout to 2
minutes via jets3t.properties, but this doesn't seem to have an effect.
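
My jets3t.properties currently looks roughly like this (I'm assuming
httpclient.retry-max and the httpclient timeout keys are the relevant
JetS3t settings here):

  # jets3t.properties
  # retry failed requests up to 10 times instead of the default 5
  httpclient.retry-max=10
  # bump the socket/connection timeouts to 2 minutes (values in ms)
  httpclient.socket-timeout-ms=120000
  httpclient.connection-timeout-ms=120000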

Any ideas?

thanks,
Phil Hagelberg
http://technomancy.us