Posted to user@hbase.apache.org by Thanasis Naskos <an...@csd.auth.gr> on 2013/10/04 11:07:32 UTC

Newly added regionserver is not serving requests

I'm setting up an HBase cluster on a cloud infrastructure.
HBase version: 0.94.11
Hadoop version: 1.0.4

Currently I have 4 nodes in my cluster (1 master, 3 regionservers) and 
I'm using YCSB (Yahoo benchmarks) to create a table (500,000 rows) and 
send requests (asynchronous requests). Everything works fine with this 
setup (I'm monitoring the whole process with Ganglia and I'm getting 
lambda, throughput, and latency combined with YCSB's output), but the 
problem occurs when I add a new regionserver on-the-fly: it doesn't 
get any requests.

What "on-the-fly" means:
While the YCSB is sending request to the cluster, I'm adding new 
regionservers using python scripts.

Addition Process (while the cluster is serving requests):

 1. I'm creating a new VM which will act as the new regionserver and
    configuring every needed aspect (hbase, hadoop, /etc/hosts, connecting
    to the private network, etc.)
 2. Stopping the **hbase** balancer
 3. Configuring every node in the cluster with the new node's information
      * adding the hostname to the regionservers file
      * adding the hostname to hadoop's slaves file
      * adding the hostname and IP to the /etc/hosts file of every node
      * etc.
 4. Executing on the master node:
      * `hadoop/bin/start-dfs.sh`
      * `hadoop/bin/start-mapred.sh`
      * `hbase/bin/start-hbase.sh`
        (I've also tried to run `hbase-daemon.sh start regionserver` on the
        newly added node, and it does exactly the same as the last command -
        starts the regionserver)
 5. Once the newly added node is up and running, I'm executing the **hadoop**
    load balancer
 6. When the hadoop load balancer stops, I'm starting the **hbase** load
    balancer again (a command sketch of steps 2-6 follows below)
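
For reference, here is roughly what steps 2-6 look like as commands (a 
sketch; the paths assume the hadoop/ and hbase/ install dirs used in 
step 4):

    # step 2: disable the hbase balancer before touching the cluster (on the master)
    echo "balance_switch false" | hbase/bin/hbase shell

    # step 4 alternative: start only the regionserver daemon, on the new node itself
    hbase/bin/hbase-daemon.sh start regionserver

    # step 5: rebalance HDFS blocks across all datanodes
    hadoop/bin/hadoop balancer -threshold 2

    # step 6: re-enable the hbase balancer and trigger a balancing run
    echo "balance_switch true" | hbase/bin/hbase shell
    echo "balancer" | hbase/bin/hbase shell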

I'm connecting over ssh to the master node and checking that the load 
balancers (hbase/hadoop) did their job, as both the blocks and regions 
are uniformly spread across all the regionservers/slaves, including the 
new one.
But when I run status 'simple' in the hbase shell I see that the new 
regionservers are not getting any requests. (Below is the output of the 
command after adding 2 new regionservers, "okeanos-nodes-4/5".)

|hbase(main):008:0> status 'simple'
5 live servers
     okeanos-nodes-1:60020 1380865800330
         requestsPerSecond=5379, numberOfOnlineRegions=4, usedHeapMB=175, maxHeapMB=3067
     okeanos-nodes-2:60020 1380865800738
         requestsPerSecond=5674, numberOfOnlineRegions=4, usedHeapMB=161, maxHeapMB=3067
     okeanos-nodes-5:60020 1380867725605
         requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=27, maxHeapMB=3067
     okeanos-nodes-3:60020 1380865800162
         requestsPerSecond=3871, numberOfOnlineRegions=5, usedHeapMB=162, maxHeapMB=3067
     okeanos-nodes-4:60020 1380866702216
         requestsPerSecond=0, numberOfOnlineRegions=3, usedHeapMB=29, maxHeapMB=3067
0 dead servers
Aggregate load: 14924, regions: 19|

The fact that they don't serve any requests is also evidenced by the CPU 
usage: on a serving regionserver it is about 70%, while on these 2 
regionservers it is about 2%.

Below is the output of |hadoop dfsadmin -report|; as you can see, the 
blocks are evenly distributed (according to |hadoop balancer -threshold 2|).

|root@okeanos-nodes-master:~# /opt/hadoop-1.0.4/bin/hadoop dfsadmin -report
Configured Capacity: 105701683200 (98.44 GB)
Present Capacity: 86440648704 (80.5 GB)
DFS Remaining: 84188446720 (78.41 GB)
DFS Used: 2252201984 (2.1 GB)
DFS Used%: 2.61%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 5 (5 total, 0 dead)

Name: 10.0.0.11:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 309166080 (294.84 MB)
Non DFS Used: 3851579392 (3.59 GB)
DFS Remaining: 16979591168(15.81 GB)
DFS Used%: 1.46%
DFS Remaining%: 80.32%
Last contact: Fri Oct 04 11:30:31 EEST 2013


Name: 10.0.0.3:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 531652608 (507.02 MB)
Non DFS Used: 3852300288 (3.59 GB)
DFS Remaining: 16756383744(15.61 GB)
DFS Used%: 2.51%
DFS Remaining%: 79.26%
Last contact: Fri Oct 04 11:30:32 EEST 2013


Name: 10.0.0.5:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 502910976 (479.61 MB)
Non DFS Used: 3853029376 (3.59 GB)
DFS Remaining: 16784396288(15.63 GB)
DFS Used%: 2.38%
DFS Remaining%: 79.4%
Last contact: Fri Oct 04 11:30:32 EEST 2013


Name: 10.0.0.4:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 421974016 (402.43 MB)
Non DFS Used: 3852365824 (3.59 GB)
DFS Remaining: 16865996800(15.71 GB)
DFS Used%: 2%
DFS Remaining%: 79.78%
Last contact: Fri Oct 04 11:30:29 EEST 2013


Name: 10.0.0.10:50010
Decommission Status : Normal
Configured Capacity: 21140336640 (19.69 GB)
DFS Used: 486498304 (463.96 MB)
Non DFS Used: 3851759616 (3.59 GB)
DFS Remaining: 16802078720(15.65 GB)
DFS Used%: 2.3%
DFS Remaining%: 79.48%
Last contact: Fri Oct 04 11:30:29 EEST 2013|

I've tried stopping YCSB, restarting the hbase master and restarting YCSB, 
but with no luck... these 2 nodes don't serve any requests!

As there are many log and conf files, I have created a zip file with the 
logs and confs (both hbase and hadoop) of the master, a healthy 
regionserver serving requests, and a regionserver not serving requests:
https://dl.dropboxusercontent.com/u/13480502/hbase_hadoop_logs__conf.zip

Thank you in advance!!


Re: Newly added regionserver is not serving requests

Posted by Thanasis Naskos <an...@csd.auth.gr>.
[SOLVED] I found what was going on and it had nothing to do with HBase... 
I had forgotten to add the hostname and IP of the new RS to the YCSB 
server VM... :-(
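
For anyone hitting the same thing: as I understand it, the client resolves 
the regionserver hostnames it reads from .META., so every client machine 
needs the new entries too, not just the cluster nodes. The fix was adding 
lines like these to /etc/hosts on the YCSB VM (the IPs below are 
placeholders, not the real ones):

    10.0.0.x   okeanos-nodes-4
    10.0.0.y   okeanos-nodes-5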

Thanks again Bharath for your interest



Re: Newly added regionserver is not serving requests

Posted by Thanasis Naskos <an...@csd.auth.gr>.
> One possibility could be that the regions got balanced after the write load
> was complete. That means, when the regions were being written they were with
> one RS, and once that was done, the regions got assigned to the idle RS.

I think that this is the case, but why is this wrong? I write the data 
to the database with 3 RS's, and when the write load is finished I add 
one more RS and run the hadoop and hbase load balancers to assign some 
data and regions (respectively) to this new node (without adding new 
data)... Shouldn't this work?

> Are you sure that YCSB writes to the regions after balancing too?

I should have mentioned that once the data is written to the RS's (3 
RS's), YCSB sends only READ requests and doesn't write/insert/update 
anything else to the database even after new nodes (RS's) are added.

> Also you can run your benchmark now (after regions are balanced) and write
> some data to the regions on the idle RS and see if it increases the request
> count.

I've tried to add (put) a new row to the database from inside the idle 
RS (shell) and the row was inserted properly (I've checked it with 
"get")... but, as expected, nothing changed; I still have 2 RS's idle.

Thank you for your interest!!



Re: Newly added regionserver is not serving requests

Posted by Bharath Vissapragada <bh...@cloudera.com>.
One possibility could be that the regions got balanced after the write load
was complete. That means, when the regions were being written they were with
one RS, and once that was done, the regions got assigned to the idle RS.

Are you sure that YCSB writes to the regions after balancing too?
Also, you can run your benchmark now (after the regions are balanced) and
write some data to the regions on the idle RS and see if it increases the
request count.
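
For example, something like this (a sketch; it assumes the YCSB defaults 
of table 'usertable' and column family 'family', and a row key that falls 
in a region hosted on the idle RS):

    hbase(main):001:0> put 'usertable', 'user1234567890', 'family:field0', 'somevalue'
    hbase(main):002:0> get 'usertable', 'user1234567890'
    hbase(main):003:0> status 'simple'   # see if requestsPerSecond moved on the idle RS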




-- 
Bharath Vissapragada
<http://www.cloudera.com>