Posted to common-user@hadoop.apache.org by Alfonso Olias Sanz <al...@gmail.com> on 2008/03/24 15:35:12 UTC

[core] problems while copying files from local file system to dfs

Hi

I want to copy 1000 files (37GB) of data to the dfs.  I have a setup
of 9-10 nodes, each with between 5 and 15GB of free space.

While copying the files from the local file system on nodeA, the node
fills up with data and the process stalls.

I have another node with 80GB of free space. After adding that
datanode to the cluster, I ran the same copy process again:

hadoop dfs  -copyFromLocal ...

During the copy of these files to the DFS, I ran a Java application
to check where the data is located (the replication level is set to 2):

String [][] hostnames = dfs.getFileCacheHints(inFile, 0, 100L);
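
(In full, the check is roughly the small standalone program below. The
class name and the argument handling are just mine for illustration; it
only wraps the getFileCacheHints() call above.)

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();   // picks up the cluster config from the classpath
    FileSystem dfs = FileSystem.get(conf);      // the configured DFS

    // Each argument is a path that has already been copied into the DFS.
    for (String arg : args) {
      Path inFile = new Path(arg);
      // Ask the namenode which datanodes hold the block(s) covering
      // the first 100 bytes of the file.
      String[][] hostnames = dfs.getFileCacheHints(inFile, 0, 100L);
      System.out.println("File name = " + inFile.getName());
      for (String[] hosts : hostnames) {
        StringBuilder line = new StringBuilder("File cache hints =  ");
        for (String host : hosts) {
          line.append(" ").append(host);
        }
        System.out.println(line.toString());
      }
      System.out.println("############################################");
    }
  }
}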

The output I print is the following

File name = GASS.0011.63800-0011.63900.zip
File cache hints =   gaiawl07.net4.lan gaiawl02.net4.lan
############################################
File name = GASS.0011.53100-0011.53200.zip
File cache hints =   gaiawl03.net4.lan gaiawl02.net4.lan
############################################
File name = GASS.0011.23800-0011.23900.zip
File cache hints =   gaiawl08.net4.lan gaiawl02.net4.lan
############################################
File name = GASS.0011.18800-0011.18900.zip
File cache hints =   gaiawl02.net4.lan gaiawl06.net4.lan
....

In this small sample gaiawl02.net4.lan appears for every file, and
this is happening for every copied file.  I launch the copy process
from that machine, which is also the one with 80GB of free space.  I
did this because of the problem I pointed out previously of filling up
a node and stalling the copy operation.

Shouldn't the data be dispersed across all the nodes?  If that data
node crashes, only 1 replica of the data will exist in the cluster.
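(bin/hadoop dfsadmin -report should also list the capacity and used
space of each datanode, which would be another way to see whether one
node is ending up with a replica of every file.)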

During the "staging" phase I understand that that particular node
contains a local copy of the file being added to HDFS. But once a
block is filled, that doesn't mean the block also has to end up on
that node. Am I right?

Is it possible to spread the data among all the data nodes so that no
single node keeps 1 replica of every copied file?

thanks

Re: [core] problems while copying files from local file system to dfs

Posted by Alfonso Olias Sanz <al...@gmail.com>.
OK, it seems my file system is corrupted. How can I recover from this?
 bin/hadoop fsck /
....
/tmp/hadoop-aolias/mapred/system/job_200803241610_0001/job.jar:  Under
replicated blk_4445907956276011533. Target Replicas is 10 but found 7
replica(s).
...........
/user/aolias/IDT/tm/GASS.0011.98100-0011.98200.zip: MISSING 1 blocks
of total size 14684276 B.
Status: CORRUPT
 Total size:    16621314 B
 Total dirs:    13
 Total files:   15
 Total blocks:  5 (avg. block size 3324262 B)
  ********************************
  CORRUPT FILES:        1
  MISSING BLOCKS:       1
  MISSING SIZE:         14684276 B
  ********************************
 Minimally replicated blocks:   4 (80.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (20.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.6
 Missing replicas:              3 (23.076923 %)
 Number of data-nodes:          13
 Number of racks:               1


The filesystem under path '/' is CORRUPT
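
(If I read the fsck options right, bin/hadoop fsck / -move should move
the files with missing blocks into /lost+found, and bin/hadoop fsck /
-delete should drop them outright; either way the damaged zip has to be
copied in again from the local source.)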


On 24/03/2008, Alfonso Olias Sanz <al...@gmail.com> wrote:
> Hi Ted
>  Thanks for the info. But running distcp I got this exception:
>
>  bin/hadoop distcp -update
>  "file:///home2/mtlinden/simdata/GASS-RDS-3-G/tm" "/user/aolias/IDT"
>
>  With failures, global counters are inaccurate; consider running with -i
>  Copy failed: org.apache.hadoop.ipc.RemoteException:
>  org.apache.hadoop.dfs.SafeModeException: Cannot create
>  file/tmp/hadoop-aolias/mapred/system/distcp_idcrwx/_distcp_src_files.
>  Name node is in safe mode.
>  Safe mode will be turned off automatically.
>         at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:945)
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:929)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:280)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:512)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1928)
>         at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
>         at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:123)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
>         at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:827)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:379)
>         at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
>         at org.apache.hadoop.util.CopyFiles.setup(CopyFiles.java:686)
>         at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:475)
>         at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)
>
>
>
>  On 24/03/2008, Ted Dunning <td...@veoh.com> wrote:
>  >
>  >
>  >  Copy from a machine that is *not* running as a data node in order to get
>  >  better balancing.  Using distcp may also help because the nodes actually
>  >  doing the copying will be spread across the cluster.
>  >
>  >  You should probably be running a rebalancing script as well if your nodes
>  >  have differing sizes.
>  >
>  >
>  >  On 3/24/08 7:35 AM, "Alfonso Olias Sanz" <al...@gmail.com>
>  >  wrote:
>  >
>  >
>  >  > Hi
>  >  >
>  >  > I want to copy 1000 files (37GB) of data to the dfs.  I have a setup
>  >  > of 9-10 nodes, each with between 5 and 15GB of free space.
>  >  >
>  >  > While copying the files from the local file system on nodeA, the node
>  >  > fills up with data and the process stalls.
>  >  >
>  >  > I have another node with 80GB of free space. After adding that
>  >  > datanode to the cluster, I ran the same copy process again:
>  >  >
>  >  > hadoop dfs  -copyFromLocal ...
>  >  >
>  >  > During the copy of these files to the DFS, I ran a Java application
>  >  > to check where the data is located (the replication level is set to 2):
>  >  >
>  >  > String [][] hostnames = dfs.getFileCacheHints(inFile, 0, 100L);
>  >  >
>  >  > The output I print is the following
>  >  >
>  >  > File name = GASS.0011.63800-0011.63900.zip
>  >  > File cache hints =   gaiawl07.net4.lan gaiawl02.net4.lan
>  >  > ############################################
>  >  > File name = GASS.0011.53100-0011.53200.zip
>  >  > File cache hints =   gaiawl03.net4.lan gaiawl02.net4.lan
>  >  > ############################################
>  >  > File name = GASS.0011.23800-0011.23900.zip
>  >  > File cache hints =   gaiawl08.net4.lan gaiawl02.net4.lan
>  >  > ############################################
>  >  > File name = GASS.0011.18800-0011.18900.zip
>  >  > File cache hints =   gaiawl02.net4.lan gaiawl06.net4.lan
>  >  > ....
>  >  >
>  >  > In this small sample gaiawl02.net4.lan appears for every file, and
>  >  > this is happening for every copied file.  I launch the copy process
>  >  > from that machine, which is also the one with 80GB of free space.  I
>  >  > did this because of the problem I pointed out previously of filling up
>  >  > a node and stalling the copy operation.
>  >  >
>  >  > Shouldn't the data be dispersed across all the nodes?  If that data
>  >  > node crashes, only 1 replica of the data will exist in the cluster.
>  >  >
>  >  > During the "staging" phase I understand that that particular node
>  >  > contains a local copy of the file being added to HDFS. But once a
>  >  > block is filled, that doesn't mean the block also has to end up on
>  >  > that node. Am I right?
>  >  >
>  >  > Is it possible to spread the data among all the data nodes so that no
>  >  > single node keeps 1 replica of every copied file?
>  >  >
>  >  > thanks
>  >
>  >
>

Re: [core] problems while copying files from local file system to dfs

Posted by Alfonso Olias Sanz <al...@gmail.com>.
Hi Ted
Thanks for the info. But running distcp I got this exception:

bin/hadoop distcp -update
"file:///home2/mtlinden/simdata/GASS-RDS-3-G/tm" "/user/aolias/IDT"

With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.dfs.SafeModeException: Cannot create
file/tmp/hadoop-aolias/mapred/system/distcp_idcrwx/_distcp_src_files.
Name node is in safe mode.
Safe mode will be turned off automatically.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:945)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:929)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:280)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:910)

        at org.apache.hadoop.ipc.Client.call(Client.java:512)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1928)
        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
        at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:123)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:827)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:379)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
        at org.apache.hadoop.util.CopyFiles.setup(CopyFiles.java:686)
        at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:475)
        at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:550)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:563)
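
(Presumably the namenode was still in safe mode at this point; if I
understand the dfsadmin tool correctly, bin/hadoop dfsadmin -safemode get
reports the current state and bin/hadoop dfsadmin -safemode wait blocks
until safe mode is switched off automatically.)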


On 24/03/2008, Ted Dunning <td...@veoh.com> wrote:
>
>
>  Copy from a machine that is *not* running as a data node in order to get
>  better balancing.  Using distcp may also help because the nodes actually
>  doing the copying will be spread across the cluster.
>
>  You should probably be running a rebalancing script as well if your nodes
>  have differing sizes.
>
>
>  On 3/24/08 7:35 AM, "Alfonso Olias Sanz" <al...@gmail.com>
>  wrote:
>
>
>  > Hi
>  >
>  > I want to copy 1000 files (37GB) of data to the dfs.  I have a setup
>  > of 9-10 nodes, each with between 5 and 15GB of free space.
>  >
>  > While copying the files from the local file system on nodeA, the node
>  > fills up with data and the process stalls.
>  >
>  > I have another node with 80GB of free space. After adding that
>  > datanode to the cluster, I ran the same copy process again:
>  >
>  > hadoop dfs  -copyFromLocal ...
>  >
>  > During the copy of these files to the DFS, I ran a Java application
>  > to check where the data is located (the replication level is set to 2):
>  >
>  > String [][] hostnames = dfs.getFileCacheHints(inFile, 0, 100L);
>  >
>  > The output I print is the following
>  >
>  > File name = GASS.0011.63800-0011.63900.zip
>  > File cache hints =   gaiawl07.net4.lan gaiawl02.net4.lan
>  > ############################################
>  > File name = GASS.0011.53100-0011.53200.zip
>  > File cache hints =   gaiawl03.net4.lan gaiawl02.net4.lan
>  > ############################################
>  > File name = GASS.0011.23800-0011.23900.zip
>  > File cache hints =   gaiawl08.net4.lan gaiawl02.net4.lan
>  > ############################################
>  > File name = GASS.0011.18800-0011.18900.zip
>  > File cache hints =   gaiawl02.net4.lan gaiawl06.net4.lan
>  > ....
>  >
>  > In this small sample gaiawl02.net4.lan appears for every file, and
>  > this is happening for every copied file.  I launch the copy process
>  > from that machine, which is also the one with 80GB of free space.  I
>  > did this because of the problem I pointed out previously of filling up
>  > a node and stalling the copy operation.
>  >
>  > Shouldn't the data be dispersed across all the nodes?  If that data
>  > node crashes, only 1 replica of the data will exist in the cluster.
>  >
>  > During the "staging" phase I understand that that particular node
>  > contains a local copy of the file being added to HDFS. But once a
>  > block is filled, that doesn't mean the block also has to end up on
>  > that node. Am I right?
>  >
>  > Is it possible to spread the data among all the data nodes so that no
>  > single node keeps 1 replica of every copied file?
>  >
>  > thanks
>
>

Re: [core] problems while copying files from local file system to dfs

Posted by Ted Dunning <td...@veoh.com>.

Copy from a machine that is *not* running as a data node in order to get
better balancing.  Using distcp may also help because the nodes actually
doing the copying will be spread across the cluster.

You should probably be running a rebalancing script as well if your nodes
have differing sizes.
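
(If your Hadoop build is recent enough to include it, bin/hadoop balancer,
optionally with -threshold <percent>, is the stock way to do that
rebalancing.)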


On 3/24/08 7:35 AM, "Alfonso Olias Sanz" <al...@gmail.com>
wrote:

> Hi
> 
> I want to copy 1000 files (37GB) of data to the dfs.  I have a setup
> of 9-10 nodes, each with between 5 and 15GB of free space.
> 
> While copying the files from the local file system on nodeA, the node
> fills up with data and the process stalls.
> 
> I have another node with 80GB of free space. After adding that
> datanode to the cluster, I ran the same copy process again:
> 
> hadoop dfs  -copyFromLocal ...
> 
> During the copy of these files to the DFS, I ran a Java application
> to check where the data is located (the replication level is set to 2):
> 
> String [][] hostnames = dfs.getFileCacheHints(inFile, 0, 100L);
> 
> The output I print is the following
> 
> File name = GASS.0011.63800-0011.63900.zip
> File cache hints =   gaiawl07.net4.lan gaiawl02.net4.lan
> ############################################
> File name = GASS.0011.53100-0011.53200.zip
> File cache hints =   gaiawl03.net4.lan gaiawl02.net4.lan
> ############################################
> File name = GASS.0011.23800-0011.23900.zip
> File cache hints =   gaiawl08.net4.lan gaiawl02.net4.lan
> ############################################
> File name = GASS.0011.18800-0011.18900.zip
> File cache hints =   gaiawl02.net4.lan gaiawl06.net4.lan
> ....
> 
> In this small sample gaiawl02.net4.lan appears for every file, and
> this is happening for every copied file.  I launch the copy process
> from that machine, which is also the one with 80GB of free space.  I
> did this because of the problem I pointed out previously of filling up
> a node and stalling the copy operation.
> 
> Shouldn't the data be dispersed across all the nodes?  If that data
> node crashes, only 1 replica of the data will exist in the cluster.
> 
> During the "staging" phase I understand that that particular node
> contains a local copy of the file being added to HDFS. But once a
> block is filled, that doesn't mean the block also has to end up on
> that node. Am I right?
> 
> Is it possible to spread the data among all the data nodes so that no
> single node keeps 1 replica of every copied file?
> 
> thanks

