Posted to common-user@hadoop.apache.org by Gary Yngve <ga...@gmail.com> on 2008/09/25 10:31:00 UTC

fsck ok, distcp errors hdfs->S3

Hi all,

I just inherited a Hadoop/EC2/Nutch project that has been running for a few
weeks, and lo and behold, the task has magically entered a state that is
neither running, complete, nor failed.

I'd love to figure out how it got into this state, and to resume it without
losing the work it's already done, but that's a topic for later.

Right now I'm just trying to back up what's already done to S3.

I've already dealt with getting the key/secret passed in (embedding them in
the URI as s3://u:p@... didn't work; I had to specify them as properties) and
with using bucket names that don't contain underscores.
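
For reference, the property-based setup looks roughly like the following in
hadoop-site.xml. This is just a sketch: the property names are copied from the
distcp command further down, and I haven't double-checked them against the
0.17 docs.

  <!-- S3 credentials for the s3:// filesystem; property names as used on my
       command line below, unverified against the 0.17 documentation -->
  <property>
    <name>fs.s3.awsAccessKey</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>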

I'm running 0.17.0, but I don't think that's an issue; furthermore, I'm not
sure whether 0.18.x is ready for EC2 + Nutch.

Here's what fsck says:
Status: HEALTHY
 Total size:    13980327325 B
 Total dirs:    111
 Total files:    196
 Total blocks:    384 (avg. block size 36407102 B)
 Minimally replicated blocks:    384 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        3
 Number of racks:        1

And here's the distcp command:
[root@ip-... ~]#  hadoop distcp -D fs.s3.awsAccessKey=... -D
fs.s3.awsSecretAccessKey=... hdfs://ip-....ec2.internal:50001/
s3://com....hadoopbackup/2008-09-24/
It fails with:
...
08/09/24 15:33:42 INFO mapred.JobClient: Task Id :
task_200808301100_0041_m_000002_0, Status : FAILED
java.lang.NullPointerException
    at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:196)
    at
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.blockExists(Jets3tFileSystemStore.java:178)
...
08/09/24 15:34:34 INFO mapred.JobClient: Task Id :
task_200808301100_0041_m_000002_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 1 Failed: 3
    at
org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
...
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
    at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)

Does anyone have any idea what's going on?

I tried the distcp a second time and got very similar failures.
I haven't counted exactly how much got copied to S3, but it seems to be on the
order of a few hundred megs, not 13+ GB.

I'm brand new to this stuff, so I feel pretty clueless.

Thanks,
Gary