Posted to general@hadoop.apache.org by Yi Zhao <yi...@alibaba-inc.com> on 2008/07/04 15:47:14 UTC

how to disperse the data uniformly?

I have two datanodes, as below:
10.62.136.10
10.62.136.11
and one namenode:
10.62.136.10
master:
10.62.136.10
slaves:
10.62.136.10
10.62.136.11

When I put a 600MB file and a 20MB file into DFS, I found that all the
data ended up on a single datanode!

How can I disperse the data uniformly across all datanodes?

thanks.

my hadoop-site.xml is:
------------------------------
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
	<property>
		<name>fs.default.name</name>
		<value>10.62.142.62:7770</value>
		<description>The name of the default file system: either the literal
string "local" or a host:port for DFS.</description>
	</property>
	<property>
		<name>mapred.job.tracker</name>
		<value>10.62.142.62:7771</value>
		<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map and reduce
task.</description>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/kongming/devel/hadoop/tmp</value>
		<description>A base for other temporary directories.</description>
	</property>
	<property>
		<name>dfs.name.dir</name>
		<value>/home/kongming/devel/hadoop/fs/name</value>
		<description>Determines where on the local filesystem the DFS name
node should store the name table. If this is a comma-delimited list of
directories, then the name table is replicated in all of the directories,
for redundancy.</description>
	</property>
	<property>
		<name>dfs.data.dir</name>
		<value>/home/kongming/devel/hadoop/fs/data</value>
		<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited list of
directories, then data will be stored in all named directories,
typically on different devices. Directories that do not exist are
ignored.</description>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
		<description>Default block replication. The actual number of
replicas can be specified when the file is created. The default is
used if replication is not specified at create time.</description>
	</property>
</configuration>
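
(For reference, one way to see how the blocks actually ended up
distributed, assuming a 0.1x-era Hadoop install, is the dfsadmin
report, which prints capacity and used space for each datanode:

# print per-datanode capacity and usage for the whole cluster
$ bin/hadoop dfsadmin -report

On this cluster it should show nearly all of the ~620MB sitting on
one of the two datanodes.)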


Re: how to disperse the data uniformly?

Posted by gm...@cs.ucf.edu.
For starters, you should probably change your replication value to
something greater than 1; otherwise it rather defeats the purpose of a
parallel distributed file system. What version of Hadoop are you
running? You should be able to rebalance the nodes just by running the
command

$ bin/hadoop balancer

However, I don't remember what the tolerance value for the balancer is
in the generic hadoop-default.xml file. Since you only have 620MB on
the cluster, it might be within Hadoop's range of acceptable data
distribution. Run the balancer first and see what happens.
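
For example, after bumping dfs.replication to 2 in hadoop-site.xml (a
minimal sketch; 2 is just a sensible value for a two-datanode cluster),
something like the following should re-replicate the files already in
DFS and then rebalance. Exact flags may vary by Hadoop version:

# raise the replication factor of everything already in DFS to
# match the new dfs.replication value (-R recurses from /)
$ bin/hadoop dfs -setrep -R 2 /

# then rebalance; -threshold is the percent deviation from average
# utilization that the balancer tolerates (assumption: your version
# supports the -threshold option; 10 is the default)
$ bin/hadoop balancer -threshold 10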

 - Grant

On Jul 4 2008, Yi Zhao wrote:

>When I put a 600MB file and a 20MB file into DFS, I found that all the
>data ended up on a single datanode!
>
>How can I disperse the data uniformly across all datanodes?
>[snip]

-- 
Grant Mackey
UCF Researcher
Eng. III Rm238