Posted to user@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2010/03/01 19:46:59 UTC

duplicate regionserver entries

Hi,
We use HBase 0.20.1.
On http://snv-it-lin-006:60010/master.jsp, I see two rows for the same
region server:
snv-it-lin-010.projectrialto.com:60030  1267038448430  requests=0, regions=25, usedHeap=1280, maxHeap=6127
snv-it-lin-010.projectrialto.com:60030  1267466540070  requests=0, regions=2, usedHeap=1278, maxHeap=6127
But in the regionservers file on the master server, snv-it-lin-010 is specified only once:
snv-it-lin-010
snv-it-lin-011
snv-it-lin-012

Has anyone seen something similar before?

Thanks

RE: duplicate regionserver entries

Posted by Michael Segel <mi...@hotmail.com>.



> Date: Mon, 1 Mar 2010 10:46:59 -0800
> Subject: duplicate regionserver entries
> From: yuzhihong@gmail.com
> To: hbase-user@hadoop.apache.org


Funny you should mention this.
I was about to post something on the same topic...

What do you see when you run status 'simple' in the HBase shell?

On our Dev Cloud I see the following:

hbase(main):003:0> status 'simple'
6 live servers
    dchilcmsdn03[redacted]com:60020 1267351339661
        requests=0, regions=0, usedHeap=48, maxHeap=1777
    dchilcmsdn01[redacted]com:60020 1267216506258
        requests=0, regions=2, usedHeap=96, maxHeap=1777
    dchilcmsdn02[redacted]com:60020 1267466817617
        requests=0, regions=0, usedHeap=26, maxHeap=1777
    dchilcmsdn01[redacted]com:60020 1267351329701
        requests=0, regions=0, usedHeap=71, maxHeap=1777
    dchilcmsdn03.[redacted]com:60020 1267216506597
        requests=0, regions=1, usedHeap=43, maxHeap=1777
    dchilcmsdn02[redacted]com:60020 1267216506428
        requests=0, regions=2, usedHeap=82, maxHeap=1777
0 dead servers
hbase(main):004:0>

Here I have 3 region servers and one master, yet each server is listed twice with a different start code. It looks like the three servers restarted themselves over the weekend, when there were no real users.
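Both messages show the same symptom: one host:port listed more than once, each time with a different long number after the port. A minimal Python sketch for spotting this in pasted status 'simple' output or master.jsp rows follows. It assumes, without confirmation from this thread, that the number after host:port is the region server's start code, i.e. its startup time in milliseconds since the Unix epoch, so decoding it shows when each instance came up. The function name find_duplicate_servers is hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Unix epoch in UTC; start codes are added to it as millisecond offsets.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# A server entry: "host:port", whitespace, then a run of digits.
# Assumption: that number is the region server's start code (startup time in
# milliseconds since the epoch). Anything after it on the same line, such as
# "requests=0, regions=25, ...", is ignored.
SERVER_ENTRY = re.compile(r"^\s*(\S+:\d+)\s+(\d+)\b")


def find_duplicate_servers(text: str) -> Dict[str, List[datetime]]:
    """Map each host:port listed more than once to its sorted start times (UTC)."""
    starts: Dict[str, List[datetime]] = defaultdict(list)
    for line in text.splitlines():
        match = SERVER_ENTRY.match(line)
        if match:
            host_port, start_code = match.group(1), int(match.group(2))
            # Integer millisecond arithmetic avoids float rounding.
            starts[host_port].append(EPOCH + timedelta(milliseconds=start_code))
    return {hp: sorted(ts) for hp, ts in starts.items() if len(ts) > 1}


if __name__ == "__main__":
    rows = """\
snv-it-lin-010.projectrialto.com:60030 1267038448430 requests=0, regions=25, usedHeap=1280, maxHeap=6127
snv-it-lin-010.projectrialto.com:60030 1267466540070 requests=0, regions=2, usedHeap=1278, maxHeap=6127
"""
    for host_port, times in find_duplicate_servers(rows).items():
        print(host_port, [t.isoformat() for t in times])
```

Given the two rows from the first message, it reports snv-it-lin-010.projectrialto.com:60030 with start times of 2010-02-24 19:07 UTC and 2010-03-01 18:02 UTC, which would fit a restart less than two hours before that message was posted. Host names are compared as exact strings, so redacted names that differ in punctuation (dchilcmsdn03[redacted]com vs dchilcmsdn03.[redacted]com) are not grouped together.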

When I run the list command in the HBase shell, I get the following:

hbase(main):002:0> list
NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 7 attempts.
Exceptions:
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0


We saw the same problems in our sandbox environment, but there we were using VMware to split a bunch of 8-core blades into two virtual servers with 3 cores each (giving us 10 nodes instead of 5). Since we're seeing the same type of problem here, we can now rule out VMware as a possible culprit.

I saw some of the posts by St. Ack and others in the mail archives, and I think what we may be experiencing are problems caused by periodic bursts of heavy network traffic. That's just my guess, since these issues happen at times when there is no load on the system itself.

So I have to wonder what role network latency plays. Normally we see sub-millisecond response times, but we can also see bursts of network latency of 100-200 ms or longer.
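One way to check whether those latency bursts line up with the restarts is to sample connect latency to a region server port on a schedule and record when it spikes. A minimal Python sketch, assuming a plain TCP connect time is a reasonable proxy for the latency described (it is not an HBase RPC; it measures only the network round trip plus the accept path). connect_latency_ms and find_bursts are hypothetical helpers, and the 100 ms default threshold is taken from the bursts mentioned above.

```python
import socket
import time
from typing import List, Optional, Sequence, Tuple


def connect_latency_ms(host: str, port: int, timeout_s: float = 2.0) -> Optional[float]:
    """Time a single TCP connect to host:port in ms; None on failure or timeout."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError:  # covers refused connections, DNS errors and socket timeouts
        return None
    return (time.perf_counter() - start) * 1000.0


def find_bursts(
    samples_ms: Sequence[Optional[float]], threshold_ms: float = 100.0
) -> List[Tuple[int, int]]:
    """Return (first, last) index pairs for each run of consecutive samples
    that failed (None) or took longer than threshold_ms."""
    bursts: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, sample in enumerate(samples_ms):
        slow = sample is None or sample > threshold_ms
        if slow and run_start is None:
            run_start = i  # a slow run begins
        elif not slow and run_start is not None:
            bursts.append((run_start, i - 1))  # the run ended on the previous sample
            run_start = None
    if run_start is not None:  # still inside a slow run at the end of the data
        bursts.append((run_start, len(samples_ms) - 1))
    return bursts
```

For example, call connect_latency_ms("dchilcmsdn01.example.com", 60020) once a second from another host (the host name is a placeholder), keep each sample with its wall-clock time, and pass the list to find_bursts. A burst that falls shortly before a region server restart would support the network-latency theory; bursts with no matching restart would weaken it.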

Is there something I can tune to account for these? Or am I missing something?

Thx

-Mike


 		 	   		  