You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Yves Langisch <yv...@langisch.ch> on 2011/12/13 08:52:47 UTC
NPE while obtaining row lock
Hi,
from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock:
---
2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB
2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2010)
at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:1996)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
2011-12-13 02:00:20,529 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.56 MB of total=335.72 MB
2011-12-13 02:00:20,532 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.2 MB, single=177.89 MB, multi=154.62 MB, memory=42.16 KB
2011-12-13 02:00:21,433 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.49 MB of total=335.66 MB
2011-12-13 02:00:21,435 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.52 MB, total=296.14 MB, single=172.92 MB, multi=159.46 MB, memory=42.16 KB
---
Any ideas?
Regards
Yves
Re: NPE while obtaining row lock
Posted by yu...@gmail.com.
Thanks for the analysis.
Do you mind opening a Jira ?
On Jan 10, 2012, at 7:51 AM, Yves Langisch <yv...@langisch.ch> wrote:
> Still happens with HBase 0.90.5/Hadoop 1.0.0. But I think I have some more insights on this topic. Following an up to date stack trace:
>
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:986)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2008)
> at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> Caused by: java.lang.NullPointerException
> at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2018)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2004)
> ... 5 more
>
> After checking the source code I've noticed that the value which is going to be put into the HashMap can be null in the case where the waitForLock flag is true or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock, line 2111ff). The latter I think happens in our case as we have heavy load hitting the server.
>
> IMHO this case should be handled somehow and must not lead to a NPE.
>
> -
> Yves
>
> On Dec 30, 2011, at 12:12 PM, Yves Langisch wrote:
>
>> Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5.
>>
>> -
>> Yves
>>
>> On Dec 18, 2011, at 12:08 AM, Stack wrote:
>>
>>> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <yv...@langisch.ch> wrote:
>>>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>>>>
>>>> <snippet>
>>>> return database.client().lockRow(
>>>> new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>>>>
>>>> </snippet>
>>>>
>>>> Any ideas what else could be wrong here?
>>>>
>>>
>>> Looking at the code on regionserver side,
>>> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
>>> down around line 1994, its unlikely the region is null since we should
>>> throw NotServingRegionException if can't find region (and we check for
>>> null region name a few lines up) so maybe its something in the way we
>>> do obtainRowLock on line 1995?
>>>
>>> Any chance of your instrumenting the regionserver? Adding a bit of
>>> debugging and deploying the debugging regionserver?
>>>
>>> My guess is we haven't seen this before because not many use rowlocks
>>> (rowlocks if long-lived and lots of contending clients could freeze
>>> you out of the server; each client blocked waiting on rowlock to clear
>>> occupies a handler of which there are a bounded number).
>>>
>>> St.Ack
>>>
>>
>
Re: NPE while obtaining row lock
Posted by Yves Langisch <yv...@langisch.ch>.
I already filed https://issues.apache.org/jira/browse/HBASE-5171. I thought that I've posted the filed issue back to this thread, Sorry about that.
-
Yves
On Jan 22, 2012, at 11:44 PM, Stack wrote:
> On Tue, Jan 10, 2012 at 10:07 AM, Stack <st...@duboce.net> wrote:
>
>> On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <yv...@langisch.ch> wrote:
>>> IMHO this case should be handled somehow and must not lead to a NPE.
>>>
>>
>>
> I filed HBASE-5249.
> St.Ack
Re: NPE while obtaining row lock
Posted by Stack <st...@duboce.net>.
On Tue, Jan 10, 2012 at 10:07 AM, Stack <st...@duboce.net> wrote:
> On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <yv...@langisch.ch> wrote:
> > IMHO this case should be handled somehow and must not lead to a NPE.
> >
>
>
I filed HBASE-5249.
St.Ack
Re: NPE while obtaining row lock
Posted by Stack <st...@duboce.net>.
On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <yv...@langisch.ch> wrote:
> IMHO this case should be handled somehow and must not lead to a NPE.
>
Agree. Thanks for filing issue.
St.Ack
Re: NPE while obtaining row lock
Posted by Yves Langisch <yv...@langisch.ch>.
Still happens with HBase 0.90.5/Hadoop 1.0.0. But I think I have some more insights on this topic. Following an up to date stack trace:
java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:986)
at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2008)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2018)
at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2004)
... 5 more
After checking the source code I've noticed that the value which is going to be put into the HashMap can be null in the case where the waitForLock flag is true or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock, line 2111ff). The latter I think happens in our case as we have heavy load hitting the server.
IMHO this case should be handled somehow and must not lead to a NPE.
-
Yves
On Dec 30, 2011, at 12:12 PM, Yves Langisch wrote:
> Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5.
>
> -
> Yves
>
> On Dec 18, 2011, at 12:08 AM, Stack wrote:
>
>> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <yv...@langisch.ch> wrote:
>>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>>>
>>> <snippet>
>>> return database.client().lockRow(
>>> new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>>>
>>> </snippet>
>>>
>>> Any ideas what else could be wrong here?
>>>
>>
>> Looking at the code on regionserver side,
>> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
>> down around line 1994, its unlikely the region is null since we should
>> throw NotServingRegionException if can't find region (and we check for
>> null region name a few lines up) so maybe its something in the way we
>> do obtainRowLock on line 1995?
>>
>> Any chance of your instrumenting the regionserver? Adding a bit of
>> debugging and deploying the debugging regionserver?
>>
>> My guess is we haven't seen this before because not many use rowlocks
>> (rowlocks if long-lived and lots of contending clients could freeze
>> you out of the server; each client blocked waiting on rowlock to clear
>> occupies a handler of which there are a bounded number).
>>
>> St.Ack
>>
>
Re: NPE while obtaining row lock
Posted by Yves Langisch <yv...@langisch.ch>.
Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5.
-
Yves
On Dec 18, 2011, at 12:08 AM, Stack wrote:
> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <yv...@langisch.ch> wrote:
>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>>
>> <snippet>
>> return database.client().lockRow(
>> new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>>
>> </snippet>
>>
>> Any ideas what else could be wrong here?
>>
>
> Looking at the code on regionserver side,
> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
> down around line 1994, its unlikely the region is null since we should
> throw NotServingRegionException if can't find region (and we check for
> null region name a few lines up) so maybe its something in the way we
> do obtainRowLock on line 1995?
>
> Any chance of your instrumenting the regionserver? Adding a bit of
> debugging and deploying the debugging regionserver?
>
> My guess is we haven't seen this before because not many use rowlocks
> (rowlocks if long-lived and lots of contending clients could freeze
> you out of the server; each client blocked waiting on rowlock to clear
> occupies a handler of which there are a bounded number).
>
> St.Ack
>
Re: NPE while obtaining row lock
Posted by Stack <st...@duboce.net>.
On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <yv...@langisch.ch> wrote:
> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>
> <snippet>
> return database.client().lockRow(
> new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>
> </snippet>
>
> Any ideas what else could be wrong here?
>
Looking at the code on regionserver side,
http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
down around line 1994, its unlikely the region is null since we should
throw NotServingRegionException if can't find region (and we check for
null region name a few lines up) so maybe its something in the way we
do obtainRowLock on line 1995?
Any chance of your instrumenting the regionserver? Adding a bit of
debugging and deploying the debugging regionserver?
My guess is we haven't seen this before because not many use rowlocks
(rowlocks if long-lived and lots of contending clients could freeze
you out of the server; each client blocked waiting on rowlock to clear
occupies a handler of which there are a bounded number).
St.Ack
Re: NPE while obtaining row lock
Posted by Yves Langisch <yv...@langisch.ch>.
I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
<snippet>
return database.client().lockRow(
new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
</snippet>
Any ideas what else could be wrong here?
Yves
On Dec 14, 2011, at 7:10 AM, Stack wrote:
> On Mon, Dec 12, 2011 at 11:52 PM, Yves Langisch <yv...@langisch.ch> wrote:
>> Hi,
>>
>> from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock:
>>
>> ---
>> 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB
>> 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
>> java.lang.NullPointerException
>> at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>
> Any chance you are passing a null lockid Yves?
>
> See below:
>
> 2005 protected long addRowLock(Integer r, HRegion region)
> 2006 throws LeaseStillHeldException {
> 2007 long lockId = -1L;
> 2008 lockId = rand.nextLong();
> 2009 String lockName = String.valueOf(lockId);
> 2010 rowlocks.put(lockName, r);
> 2011 this.leases.createLease(lockName, new
> RowLockListener(lockName, region));
> 2012 return lockId;
> 2013 }
>
>
> St.Ack
>
Re: NPE while obtaining row lock
Posted by Stack <st...@duboce.net>.
On Mon, Dec 12, 2011 at 11:52 PM, Yves Langisch <yv...@langisch.ch> wrote:
> Hi,
>
> from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock:
>
> ---
> 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB
> 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
> java.lang.NullPointerException
> at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
Any chance you are passing a null lockid Yves?
See below:
2005 protected long addRowLock(Integer r, HRegion region)
2006 throws LeaseStillHeldException {
2007 long lockId = -1L;
2008 lockId = rand.nextLong();
2009 String lockName = String.valueOf(lockId);
2010 rowlocks.put(lockName, r);
2011 this.leases.createLease(lockName, new
RowLockListener(lockName, region));
2012 return lockId;
2013 }
St.Ack
Re: NPE while obtaining row lock
Posted by Yves Langisch <yv...@langisch.ch>.
Forgot to mention the versions: I'm running HBase 0.90.4 with Hadoop 0.20.205.0.
On Dec 13, 2011, at 8:52 AM, Yves Langisch wrote:
> Hi,
>
> from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock:
>
> ---
> 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB
> 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
> java.lang.NullPointerException
> at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2010)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:1996)
> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> 2011-12-13 02:00:20,529 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.56 MB of total=335.72 MB
> 2011-12-13 02:00:20,532 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.2 MB, single=177.89 MB, multi=154.62 MB, memory=42.16 KB
> 2011-12-13 02:00:21,433 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.49 MB of total=335.66 MB
> 2011-12-13 02:00:21,435 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.52 MB, total=296.14 MB, single=172.92 MB, multi=159.46 MB, memory=42.16 KB
> ---
>
> Any ideas?
>
> Regards
> Yves