Posted to mapreduce-user@hadoop.apache.org by SF Hadoop <sf...@gmail.com> on 2014/10/09 08:01:00 UTC

Hadoop / HBase hotspotting / overloading specific nodes

I'm not sure if this is an HBase issue or a Hadoop issue, so please forgive
me if this is "off-topic".

I am having a problem with Hadoop maxing out drive space on a select few
nodes when I am running an HBase job. The scenario is this:

- The job is a data import using MapReduce and HBase
- The data is being imported into one table
- The table only has a couple of regions
- As the job runs, HBase (or Hadoop?) begins placing the data in HDFS on the
datanode / regionserver that is hosting the regions
- As the job progresses (and more data is imported), the two datanodes
hosting the regions start to fill up; eventually their drive space hits 100%
utilization while the other nodes in the cluster are at 40% or less
- The job in Hadoop then begins to hang with multiple "out of space" errors
and eventually fails.

I have tried running the hadoop balancer during the job run; it helped a
little, but only really postponed the eventual job failure.
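Roughly, the commands involved look like this (illustrative; the dfsadmin
check is just one way to see the skew, and the threshold is only an example):

# region-hosting nodes sit near 100% DFS used, the rest around 40%
hadoop dfsadmin -report | grep -E 'Name:|DFS Used%'
# moves existing blocks around, but new writes keep landing on the same nodes
hadoop balancer -threshold 10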

How can I get Hadoop / HBase to distribute the data to HDFS more evenly
when it is favoring the nodes that the regions are on?

Am I missing something here?

Thanks for any help.

Re: Hadoop / HBase hotspotting / overloading specific nodes

Posted by SF Hadoop <sf...@gmail.com>.
Haven't tried this. I'll give it a shot.

Thanks

On Thursday, October 9, 2014, Ted Yu <yu...@gmail.com> wrote:

> Looks like the number of regions is lower than the number of nodes in the
> cluster.
>
> Can you split the table such that, after the hbase balancer is run, there
> is a region hosted by every node?
>
> Cheers
>
> On Oct 8, 2014, at 11:01 PM, SF Hadoop <sfhadoop@gmail.com> wrote:
>
> > I'm not sure if this is an HBase issue or an Hadoop issue so if this is
> "off-topic" please forgive.
> >
> > I am having a problem with Hadoop maxing out drive space on a select few
> nodes when I am running an HBase job.  The scenario is this:
> >
> > - The job is a data import using Map/Reduce / HBase
> > - The data is being imported to one table
> > - The table only has a couple of regions
> > - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on
> the datanode / regionserver that is hosting  the regions
> > - As the job progresses (and more data is imported) the two datanodes
> hosting the regions start to get full and eventually drive space hits 100%
> utilization whilst the other nodes in the cluster are at 40% or less drive
> space utilization
> > - The job in Hadoop then begins to hang with multiple "out of space"
> errors and eventually fails.
> >
> > I have tried running hadoop balancer during the job run and this helped
> but only really succeeded in prolonging the eventual job failure.
> >
> > How can I get Hadoop / HBase to distribute the data to HDFS more evenly
> when it is favoring the nodes that the regions are on?
> >
> > Am I missing something here?
> >
> > Thanks for any help.
>

Re: Hadoop / HBase hotspotting / overloading specific nodes

Posted by Ted Yu <yu...@gmail.com>.
Looks like the number of regions is lower than the number of nodes in the cluster. 

Can you split the table such that, after the hbase balancer is run, there is a region hosted by every node?
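
For instance, from the hbase shell, something along these lines (the table
name and split keys are made up for illustration):

hbase shell
# add a split at row key 'm'; repeat until there are at least as many
# regions as datanodes
split 'import_table', 'm'
# then let the HBase balancer spread the regions across the regionservers
balancer

(Pre-splitting the table at create time, e.g.
create 'import_table', 'cf', SPLITS => ['g', 'n', 't'], would avoid the
hotspot for future imports.)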

Cheers

On Oct 8, 2014, at 11:01 PM, SF Hadoop <sf...@gmail.com> wrote:

> I'm not sure if this is an HBase issue or an Hadoop issue so if this is "off-topic" please forgive.
> 
> I am having a problem with Hadoop maxing out drive space on a select few nodes when I am running an HBase job.  The scenario is this:
> 
> - The job is a data import using Map/Reduce / HBase
> - The data is being imported to one table
> - The table only has a couple of regions
> - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on the datanode / regionserver that is hosting  the regions
> - As the job progresses (and more data is imported) the two datanodes hosting the regions start to get full and eventually drive space hits 100% utilization whilst the other nodes in the cluster are at 40% or less drive space utilization
> - The job in Hadoop then begins to hang with multiple "out of space" errors and eventually fails.
> 
> I have tried running hadoop balancer during the job run and this helped but only really succeeded in prolonging the eventual job failure.
> 
> How can I get Hadoop / HBase to distribute the data to HDFS more evenly when it is favoring the nodes that the regions are on?
> 
> Am I missing something here?
> 
> Thanks for any help.

Re: Hadoop / HBase hotspotting / overloading specific nodes

Posted by SF Hadoop <sf...@gmail.com>.
This doesn't help because the space is simply reserved for the OS. Hadoop
still maxes out its quota and spits out "out of space" errors.

Thanks

On Wednesday, October 8, 2014, Bing Jiang <ji...@gmail.com> wrote:

> Could you set some reserved space for non-DFS usage, just to avoid the
> disk getting full? In hdfs-site.xml:
>
> <property>
>   <name>dfs.datanode.du.reserved</name>
>   <value></value>
>   <description>Reserved space in bytes per volume. Always leave this much
>   space free for non dfs use.</description>
> </property>
>
> 2014-10-09 14:01 GMT+08:00 SF Hadoop <sfhadoop@gmail.com>:
>
>> I'm not sure if this is an HBase issue or an Hadoop issue so if this is
>> "off-topic" please forgive.
>>
>> I am having a problem with Hadoop maxing out drive space on a select few
>> nodes when I am running an HBase job.  The scenario is this:
>>
>> - The job is a data import using Map/Reduce / HBase
>> - The data is being imported to one table
>> - The table only has a couple of regions
>> - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on
>> the datanode / regionserver that is hosting  the regions
>> - As the job progresses (and more data is imported) the two datanodes
>> hosting the regions start to get full and eventually drive space hits 100%
>> utilization whilst the other nodes in the cluster are at 40% or less drive
>> space utilization
>> - The job in Hadoop then begins to hang with multiple "out of space"
>> errors and eventually fails.
>>
>> I have tried running hadoop balancer during the job run and this helped
>> but only really succeeded in prolonging the eventual job failure.
>>
>> How can I get Hadoop / HBase to distribute the data to HDFS more evenly
>> when it is favoring the nodes that the regions are on?
>>
>> Am I missing something here?
>>
>> Thanks for any help.
>>
>
>
>
> --
> Bing Jiang
>
>

Re: Hadoop / HBase hotspotting / overloading specific nodes

Posted by Bing Jiang <ji...@gmail.com>.
Could you set some reserved space for non-DFS usage, just to avoid the disk
getting full? In hdfs-site.xml:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value></value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>
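
For example, a value of 10737418240 keeps 10 GB per volume free for the OS
and other non-DFS use (the number is only an illustration; size it to your
disks, and the datanodes pick up the change on restart).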

2014-10-09 14:01 GMT+08:00 SF Hadoop <sf...@gmail.com>:

> I'm not sure if this is an HBase issue or an Hadoop issue so if this is
> "off-topic" please forgive.
>
> I am having a problem with Hadoop maxing out drive space on a select few
> nodes when I am running an HBase job.  The scenario is this:
>
> - The job is a data import using Map/Reduce / HBase
> - The data is being imported to one table
> - The table only has a couple of regions
> - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on the
> datanode / regionserver that is hosting  the regions
> - As the job progresses (and more data is imported) the two datanodes
> hosting the regions start to get full and eventually drive space hits 100%
> utilization whilst the other nodes in the cluster are at 40% or less drive
> space utilization
> - The job in Hadoop then begins to hang with multiple "out of space"
> errors and eventually fails.
>
> I have tried running hadoop balancer during the job run and this helped
> but only really succeeded in prolonging the eventual job failure.
>
> How can I get Hadoop / HBase to distribute the data to HDFS more evenly
> when it is favoring the nodes that the regions are on?
>
> Am I missing something here?
>
> Thanks for any help.
>



-- 
Bing Jiang
