Posted to common-user@hadoop.apache.org by openresearch <Qi...@openresearchinc.com> on 2009/06/18 21:57:03 UTC

HDFS is not loading evenly across all nodes.

Hi all

I "dfs put" a large dataset onto a 10-node cluster.

When I watch the load progress (via the web UI on port 50070) and each node's
local file system (via df -k),
I notice that my master node is hit 5-10 times harder than the others, so its
hard drive fills up much faster. During last night's load, it actually crashed
when the hard drive was full.

My understanding was that the data should be spread evenly across all nodes
(roughly round-robin, in 64 MB block units).

Is this expected behavior for Hadoop? Can anyone suggest a good way to
troubleshoot it?

Thanks


-- 
View this message in context: http://www.nabble.com/HDFS-is-not-loading-evenly-across-all-nodes.-tp24099585p24099585.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: HDFS is not loading evenly across all nodes.

Posted by Aaron Kimball <aa...@cloudera.com>.
As an addendum, running a DataNode on the same machine as a NameNode is
generally considered a bad idea because it hurts the NameNode's ability to
maintain high throughput.
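
For example (hostnames here are made up): start-dfs.sh only starts DataNodes
on the hosts listed in conf/slaves, so leaving the master out of that file
keeps the NameNode's machine free of a DataNode:

  # conf/slaves -- a DataNode is started on each host listed here;
  # the master host is deliberately absent
  worker01
  worker02
  worker03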

- Aaron

On Thu, Jun 18, 2009 at 1:26 PM, Aaron Kimball <aa...@cloudera.com> wrote:

> Did you run the dfs put commands from the master node?  If you're inserting
> into HDFS from a machine running a DataNode, the local datanode will always
> be chosen as one of the three replica targets. For more balanced loading,
> you should use an off-cluster machine as the point of origin.
>
> If you experience uneven block distribution, you should also periodically
> rebalance your cluster by running bin/start-balancer.sh. It will work in the
> background to move blocks from heavily-laden nodes to underutilized ones.
>
> - Aaron

Re: HDFS is not loading evenly across all nodes.

Posted by Taeho Kang <tk...@gmail.com>.
Yes. With a replication factor of 1, the whole file will be kept on the
machine from which you issue the "dfs -put" command, provided it has a
DataNode running. Otherwise, a random DataNode will be chosen to store the
data blocks.
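
If you want to verify where the blocks of a file actually landed, fsck can
show you (the path here is just an example):

  # lists every block of the file and the datanode(s) holding each replica
  bin/hadoop fsck /user/hadoop/dataset -files -blocks -locations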


On Fri, Jun 19, 2009 at 10:41 AM, Rajeev Gupta <gr...@in.ibm.com> wrote:

> "If you're inserting
> into HDFS from a machine running a DataNode, the local datanode will always
> be chosen as one of the three replica targets."
> Does that mean that if replication factor is 1, whole file will be kept on
> one node only?
>
> Thanks and regards.
> -Rajeev Gupta

Re: HDFS is not loading evenly across all nodes.

Posted by Rajeev Gupta <gr...@in.ibm.com>.
"If you're inserting
into HDFS from a machine running a DataNode, the local datanode will always
be chosen as one of the three replica targets."
Does that mean that if the replication factor is 1, the whole file will be
kept on one node only?

Thanks and regards.
-Rajeev Gupta

Re: HDFS is not loading evenly across all nodes.

Posted by Aaron Kimball <aa...@cloudera.com>.
Did you run the dfs put commands from the master node?  If you're inserting
into HDFS from a machine running a DataNode, the local datanode will always
be chosen as one of the three replica targets. For more balanced loading,
you should use an off-cluster machine as the point of origin.
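
For instance (hostname and port are examples; use whatever your
fs.default.name says): from a machine that has the Hadoop client installed but
no DataNode running, you can point the shell at the cluster explicitly:

  # uploading from an off-cluster client lets the namenode scatter the
  # first replica of each block instead of always writing it locally
  bin/hadoop dfs -fs hdfs://master01:9000 -put /local/disk/dataset /user/hadoop/dataset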

If you experience uneven block distribution, you should also periodically
rebalance your cluster by running bin/start-balancer.sh. It will work in the
background to move blocks from heavily-laden nodes to underutilized ones.
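
For example (the threshold value here is just an illustration; the default
is 10):

  # iterates until every datanode is within 5% of the cluster's average
  # utilization; stop it at any time with bin/stop-balancer.sh
  bin/start-balancer.sh -threshold 5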

- Aaron
