Posted to common-user@hadoop.apache.org by Saptarshi Guha <sa...@gmail.com> on 2009/02/13 03:50:55 UTC

Very large file copied to cluster, and the copy fails. All blocks bad

Hello,
I have a 42 GB file on the local filesystem (call this machine A) which
I need to copy to HDFS (replication 1); according to the HDFS web
tracker, the cluster has 208 GB of capacity across 7 machines.
Note that machine A has only about 80 GB of disk in total, so there is
no room to keep extra copies of the file locally.
The command bin/hadoop dfs -put /local/x /remote/tmp/ fails with all
blocks reported as bad. This is not surprising, since the file ends up
written entirely to the HDFS storage that resides on A. Had the blocks
been spread across all the machines, the copy would not have failed.
I have more experience with MapReduce and not much with the HDFS side
of things.
Is there a configuration option I'm missing that forces the file's
blocks to be spread across the machines while it is being copied?
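
For reference, a sketch of the relevant commands (the -D form for
pinning replication at write time is an assumption on my part, and the
paths are the ones from the example above):

  # per-datanode capacity/usage from the CLI (same numbers as the web tracker)
  bin/hadoop dfsadmin -report

  # put the file with replication explicitly set to 1 on the client side
  bin/hadoop dfs -D dfs.replication=1 -put /local/x /remote/tmp/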
-- 
Saptarshi Guha - saptarshi.guha@gmail.com

Re: Very large file copied to cluster, and the copy fails. All blocks bad

Posted by Saptarshi Guha <sa...@gmail.com>.
> Did you run the copy command from machine A?
Yes, exactly.
> I had to have the client doing the copy either on the master or on an "off-cluster" node.
Thanks! I uploaded it from an off-cluster machine (i.e. one not
participating in HDFS) and it worked splendidly.
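
Concretely, a sketch of what that looked like (the namenode host and
port here are placeholders; pointing the client at the cluster via the
-fs generic option, or via a copied hadoop-site.xml, should both work):

  # run from a machine that is not a datanode, pointing at the namenode
  bin/hadoop dfs -fs hdfs://namenode-host:9000 -put /local/x /remote/tmp/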

Regards
Saptarshi


-- 
Saptarshi Guha - saptarshi.guha@gmail.com

Re: Very large file copied to cluster, and the copy fails. All blocks bad

Posted by TCK <mo...@yahoo.com>.
Did you run the copy command from machine A? I believe that if you do the copy from an HDFS client that is on the same machine as a data node, then for each block the first replica always goes to that data node, and only the additional replicas get distributed among the other data nodes. I ran into this issue once -- I had to have the client doing the copy either on the master or on an "off-cluster" node.
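
A quick way to confirm where the blocks actually ended up is fsck
(a sketch; the path follows the example from the original mail):

  # list each block of the file and which datanodes hold it
  bin/hadoop fsck /remote/tmp/x -files -blocks -locations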
-TCK


