Posted to mapreduce-user@hadoop.apache.org by Paul Rimba <pa...@gmail.com> on 2011/07/01 05:56:16 UTC

Dynamic Cluster Node addition

Hey there,

I am trying to add a new datanode/tasktracker to a currently running
cluster.

Is this feasible? If so, how do I change the masters, slaves, and
dfs.replication (in hdfs-site.xml) configuration?

Can I add the new slave to the slaves configuration file while the cluster
is running?

I found this command, ./bin/hadoop dfs -setrep -w 4 /path/to/file, to
change dfs.replication on the fly.
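
As I understand it (please correct me if I'm wrong), -setrep only changes
the replication of files that already exist, while dfs.replication only
sets the default for files created afterwards. Roughly, with placeholder
values:

    <!-- hdfs-site.xml: default replication for *new* files only -->
    <property>
      <name>dfs.replication</name>
      <value>4</value>
    </property>

    # raise replication of existing files recursively, waiting until done:
    ./bin/hadoop dfs -setrep -R -w 4 /path/to/dir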

Is there a better way to do it?



Thank you for your kind attention.


Kind Regards,
Paul

Re: Dynamic Cluster Node addition

Posted by Harsh J <ha...@cloudera.com>.
Paul,

You can inspect the data used by your new nodes after the balancer
operation runs. "hadoop dfsadmin -report" should tell you detailed
stats about each of the DNs, or you can run "hadoop fsck /".

(Note: by default, the balancer operation is bandwidth-limited for
performance reasons and may take a while to complete -- although
this is configurable.)
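
For example (the bandwidth property below is the name from the 0.20/1.x
line; treat the value as a placeholder for your own setup):

    # detailed per-DataNode usage stats:
    hadoop dfsadmin -report

    # block-level health, with block locations:
    hadoop fsck / -files -blocks -locations

    <!-- hdfs-site.xml: bytes/sec each DataNode may use for
         balancing; the default is 1048576 (1 MB/s) -->
    <property>
      <name>dfs.balance.bandwidthPerSec</name>
      <value>10485760</value>
    </property>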

On Fri, Jul 1, 2011 at 10:42 AM, Paul Rimba <pa...@gmail.com> wrote:
> Hey Matei,
> what if you just do bin/hadoop-daemon.sh start tasktracker and
> bin/hadoop-daemon.sh start datanode?
> Does it move the old data to the new slave?
> I ran that scenario a couple of times and then ran start-balancer.sh. It
> always says that the cluster is balanced. Does it mean that the data has
> been spread out?
> Thanks
> Paul
> On Fri, Jul 1, 2011 at 2:05 PM, Matei Zaharia <ma...@eecs.berkeley.edu>
> wrote:
>>
>> You can have a new TaskTracker or DataNode join the cluster by just
>> starting that daemon on the slave (e.g. bin/hadoop-daemon.sh start
>> tasktracker) and making sure it is configured to connect to the right
>> JobTracker or NameNode (through the mapred.job.tracker and fs.default.name
>> properties in the config files). The slaves file is only used for the
>> bin/start-* and bin/stop-* scripts, but Hadoop doesn't look at it at
>> runtime. There may be other similar files that it can look at though, such
>> as a blacklist, but I think that in the default configuration you can just
>> launch the daemon and it will work.
>> Note that if you add a new DataNode, Hadoop won't automatically move old
>> data to it (to spread the data out across the cluster) unless you run the
>> HDFS rebalancer, at least as far as I know.
>> Matei
>> On Jun 30, 2011, at 8:56 PM, Paul Rimba wrote:
>>
>> Hey there,
>> I am trying to add a new datanode/tasktracker to a currently running
>> cluster.
>> Is this feasible? If so, how do I change the masters, slaves, and
>> dfs.replication (in hdfs-site.xml) configuration?
>> Can I add the new slave to the slaves configuration file while the cluster
>> is running?
>> I found this command, ./bin/hadoop dfs -setrep -w 4 /path/to/file, to
>> change dfs.replication on the fly.
>> Is there a better way to do it?
>>
>>
>> Thank you for your kind attention.
>>
>> Kind Regards,
>> Paul
>
>



-- 
Harsh J

Re: Dynamic Cluster Node addition

Posted by Paul Rimba <pa...@gmail.com>.
Hey Matei,

what if you just do bin/hadoop-daemon.sh start tasktracker and
bin/hadoop-daemon.sh start datanode?

Does it move the old data to the new slave?

I ran that scenario a couple of times and then ran start-balancer.sh. It
always says that the cluster is balanced. Does it mean that the data has
been spread out?

Thanks
Paul

On Fri, Jul 1, 2011 at 2:05 PM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:

> You can have a new TaskTracker or DataNode join the cluster by just
> starting that daemon on the slave (e.g. bin/hadoop-daemon.sh start
> tasktracker) and making sure it is configured to connect to the right
> JobTracker or NameNode (through the mapred.job.tracker and fs.default.name
> properties in the config files). The slaves file is only used for the
> bin/start-* and bin/stop-* scripts, but Hadoop doesn't look at it at
> runtime. There may be other similar files that it can look at though, such
> as a blacklist, but I think that in the default configuration you can just
> launch the daemon and it will work.
>
> Note that if you add a new DataNode, Hadoop won't automatically move old
> data to it (to spread the data out across the cluster) unless you run the
> HDFS rebalancer, at least as far as I know.
>
> Matei
>
> On Jun 30, 2011, at 8:56 PM, Paul Rimba wrote:
>
> Hey there,
>
> I am trying to add a new datanode/tasktracker to a currently running
> cluster.
>
> Is this feasible? If so, how do I change the masters, slaves, and
> dfs.replication (in hdfs-site.xml) configuration?
>
> Can I add the new slave to the slaves configuration file while the cluster
> is running?
>
> I found this command, ./bin/hadoop dfs -setrep -w 4 /path/to/file, to
> change dfs.replication on the fly.
>
> Is there a better way to do it?
>
>
>
> Thank you for your kind attention.
>
>
> Kind Regards,
> Paul
>
>
>

Re: Dynamic Cluster Node addition

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.
You can have a new TaskTracker or DataNode join the cluster by just starting that daemon on the slave (e.g. bin/hadoop-daemon.sh start tasktracker) and making sure it is configured to connect to the right JobTracker or NameNode (through the mapred.job.tracker and fs.default.name properties in the config files). The slaves file is only used for the bin/start-* and bin/stop-* scripts, but Hadoop doesn't look at it at runtime. There may be other similar files that it can look at though, such as a blacklist, but I think that in the default configuration you can just launch the daemon and it will work.
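
For example, on the new slave, something like the following (the host names and ports here are placeholders for your own setup):

    <!-- core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:9000</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker-host:9001</value>
    </property>

    # then start the daemons on that slave:
    bin/hadoop-daemon.sh start datanode
    bin/hadoop-daemon.sh start tasktracker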

Note that if you add a new DataNode, Hadoop won't automatically move old data to it (to spread the data out across the cluster) unless you run the HDFS rebalancer, at least as far as I know.
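
For instance (if I remember right, the threshold defaults to 10% when you omit it):

    # moves blocks until every DataNode's utilization is within the
    # given percentage of the cluster-wide average:
    bin/start-balancer.sh -threshold 5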

Matei

On Jun 30, 2011, at 8:56 PM, Paul Rimba wrote:

> Hey there,
> 
> I am trying to add a new datanode/tasktracker to a currently running cluster.
> 
> Is this feasible? If so, how do I change the masters, slaves, and dfs.replication (in hdfs-site.xml) configuration?
> 
> Can I add the new slave to the slaves configuration file while the cluster is running?
> 
> I found this command, ./bin/hadoop dfs -setrep -w 4 /path/to/file, to change dfs.replication on the fly.
> 
> Is there a better way to do it?
> 
> 
> 
> Thank you for your kind attention.
> 
> 
> Kind Regards,
> Paul