You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Yves Langisch <yv...@langisch.ch> on 2011/12/13 08:52:47 UTC

NPE while obtaining row lock

Hi,

from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock: 

---
2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB
2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2010)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:1996)
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
2011-12-13 02:00:20,529 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.56 MB of total=335.72 MB
2011-12-13 02:00:20,532 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.2 MB, single=177.89 MB, multi=154.62 MB, memory=42.16 KB
2011-12-13 02:00:21,433 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.49 MB of total=335.66 MB
2011-12-13 02:00:21,435 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.52 MB, total=296.14 MB, single=172.92 MB, multi=159.46 MB, memory=42.16 KB
---

Any ideas?

Regards
Yves

Re: NPE while obtaining row lock

Posted by yu...@gmail.com.
Thanks for the analysis. 

Do you mind opening a Jira ?



On Jan 10, 2012, at 7:51 AM, Yves Langisch <yv...@langisch.ch> wrote:

> Still happens with HBase 0.90.5/Hadoop 1.0.0. But I think I have some more insights on this topic. Following an up to date stack trace:
> 
> java.lang.NullPointerException
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:986)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2008)
>        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> Caused by: java.lang.NullPointerException
>        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2018)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2004)
>        ... 5 more
> 
> After checking the source code I've noticed that the value which is going to be put into the HashMap can be null in the case where the waitForLock flag is true or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock, line 2111ff). The latter I think happens in our case as we have heavy load hitting the server.
> 
> IMHO this case should be handled somehow and must not lead to a NPE.
> 
> -
> Yves
> 
> On Dec 30, 2011, at 12:12 PM, Yves Langisch wrote:
> 
>> Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5.
>> 
>> -
>> Yves
>> 
>> On Dec 18, 2011, at 12:08 AM, Stack wrote:
>> 
>>> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <yv...@langisch.ch> wrote:
>>>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>>>> 
>>>> <snippet>
>>>> return database.client().lockRow(
>>>>                  new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>>>> 
>>>> </snippet>
>>>> 
>>>> Any ideas what else could be wrong here?
>>>> 
>>> 
>>> Looking at the code on regionserver side,
>>> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
>>> down around line 1994, its unlikely the region is null since we should
>>> throw NotServingRegionException if can't find region (and we check for
>>> null region name a few lines up) so maybe its something in the way we
>>> do obtainRowLock on line 1995?
>>> 
>>> Any chance of your instrumenting the regionserver?  Adding a bit of
>>> debugging and deploying the debugging regionserver?
>>> 
>>> My guess is we haven't seen this before because not many use rowlocks
>>> (rowlocks if long-lived and lots of contending clients could freeze
>>> you out of the server; each client blocked waiting on rowlock to clear
>>> occupies a handler of which there are a bounded number).
>>> 
>>> St.Ack
>>> 
>> 
> 

Re: NPE while obtaining row lock

Posted by Yves Langisch <yv...@langisch.ch>.
I already filed https://issues.apache.org/jira/browse/HBASE-5171. I thought that I've posted the filed issue back to this thread, Sorry about that.

-
Yves

On Jan 22, 2012, at 11:44 PM, Stack wrote:

> On Tue, Jan 10, 2012 at 10:07 AM, Stack <st...@duboce.net> wrote:
> 
>> On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <yv...@langisch.ch> wrote:
>>> IMHO this case should be handled somehow and must not lead to a NPE.
>>> 
>> 
>> 
> I filed HBASE-5249.
> St.Ack


Re: NPE while obtaining row lock

Posted by Stack <st...@duboce.net>.
On Tue, Jan 10, 2012 at 10:07 AM, Stack <st...@duboce.net> wrote:

> On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <yv...@langisch.ch> wrote:
> > IMHO this case should be handled somehow and must not lead to a NPE.
> >
>
>
I filed HBASE-5249.
St.Ack

Re: NPE while obtaining row lock

Posted by Stack <st...@duboce.net>.
On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <yv...@langisch.ch> wrote:
> IMHO this case should be handled somehow and must not lead to a NPE.
>

Agree.  Thanks for filing issue.
St.Ack

Re: NPE while obtaining row lock

Posted by Yves Langisch <yv...@langisch.ch>.
Still happens with HBase 0.90.5/Hadoop 1.0.0. But I think I have some more insights on this topic. Following an up to date stack trace:

java.lang.NullPointerException
        at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:986)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2008)
        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Caused by: java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2018)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2004)
        ... 5 more

After checking the source code I've noticed that the value which is going to be put into the HashMap can be null in the case where the waitForLock flag is true or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock, line 2111ff). The latter I think happens in our case as we have heavy load hitting the server.

IMHO this case should be handled somehow and must not lead to a NPE.

-
Yves

On Dec 30, 2011, at 12:12 PM, Yves Langisch wrote:

> Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5.
> 
> -
> Yves
> 
> On Dec 18, 2011, at 12:08 AM, Stack wrote:
> 
>> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <yv...@langisch.ch> wrote:
>>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>>> 
>>> <snippet>
>>> return database.client().lockRow(
>>>                   new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>>> 
>>> </snippet>
>>> 
>>> Any ideas what else could be wrong here?
>>> 
>> 
>> Looking at the code on regionserver side,
>> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
>> down around line 1994, its unlikely the region is null since we should
>> throw NotServingRegionException if can't find region (and we check for
>> null region name a few lines up) so maybe its something in the way we
>> do obtainRowLock on line 1995?
>> 
>> Any chance of your instrumenting the regionserver?  Adding a bit of
>> debugging and deploying the debugging regionserver?
>> 
>> My guess is we haven't seen this before because not many use rowlocks
>> (rowlocks if long-lived and lots of contending clients could freeze
>> you out of the server; each client blocked waiting on rowlock to clear
>> occupies a handler of which there are a bounded number).
>> 
>> St.Ack
>> 
> 


Re: NPE while obtaining row lock

Posted by Yves Langisch <yv...@langisch.ch>.
Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5.

-
Yves

On Dec 18, 2011, at 12:08 AM, Stack wrote:

> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <yv...@langisch.ch> wrote:
>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>> 
>> <snippet>
>>  return database.client().lockRow(
>>                    new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>> 
>> </snippet>
>> 
>> Any ideas what else could be wrong here?
>> 
> 
> Looking at the code on regionserver side,
> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
> down around line 1994, its unlikely the region is null since we should
> throw NotServingRegionException if can't find region (and we check for
> null region name a few lines up) so maybe its something in the way we
> do obtainRowLock on line 1995?
> 
> Any chance of your instrumenting the regionserver?  Adding a bit of
> debugging and deploying the debugging regionserver?
> 
> My guess is we haven't seen this before because not many use rowlocks
> (rowlocks if long-lived and lots of contending clients could freeze
> you out of the server; each client blocked waiting on rowlock to clear
> occupies a handler of which there are a bounded number).
> 
> St.Ack
> 


Re: NPE while obtaining row lock

Posted by Stack <st...@duboce.net>.
On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <yv...@langisch.ch> wrote:
> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>
> <snippet>
>  return database.client().lockRow(
>                    new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>
> </snippet>
>
> Any ideas what else could be wrong here?
>

Looking at the code on regionserver side,
http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
down around line 1994, its unlikely the region is null since we should
throw NotServingRegionException if can't find region (and we check for
null region name a few lines up) so maybe its something in the way we
do obtainRowLock on line 1995?

Any chance of your instrumenting the regionserver?  Adding a bit of
debugging and deploying the debugging regionserver?

My guess is we haven't seen this before because not many use rowlocks
(rowlocks if long-lived and lots of contending clients could freeze
you out of the server; each client blocked waiting on rowlock to clear
occupies a handler of which there are a bounded number).

St.Ack

Re: NPE while obtaining row lock

Posted by Yves Langisch <yv...@langisch.ch>.
I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:

<snippet>
  return database.client().lockRow(
                    new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();

</snippet>

Any ideas what else could be wrong here? 

Yves

On Dec 14, 2011, at 7:10 AM, Stack wrote:

> On Mon, Dec 12, 2011 at 11:52 PM, Yves Langisch <yv...@langisch.ch> wrote:
>> Hi,
>> 
>> from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock:
>> 
>> ---
>> 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB
>> 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
>> java.lang.NullPointerException
>>        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
> 
> Any chance you are passing a null lockid Yves?
> 
> See below:
> 
> 2005	  protected long addRowLock(Integer r, HRegion region)
> 2006	      throws LeaseStillHeldException {
> 2007	    long lockId = -1L;
> 2008	    lockId = rand.nextLong();
> 2009	    String lockName = String.valueOf(lockId);
> 2010	    rowlocks.put(lockName, r);
> 2011	    this.leases.createLease(lockName, new
> RowLockListener(lockName, region));
> 2012	    return lockId;
> 2013	  }
> 
> 
> St.Ack
> 


Re: NPE while obtaining row lock

Posted by Stack <st...@duboce.net>.
On Mon, Dec 12, 2011 at 11:52 PM, Yves Langisch <yv...@langisch.ch> wrote:
> Hi,
>
> from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock:
>
> ---
> 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB
> 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
> java.lang.NullPointerException
>        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)

Any chance you are passing a null lockid Yves?

See below:

2005	  protected long addRowLock(Integer r, HRegion region)
2006	      throws LeaseStillHeldException {
2007	    long lockId = -1L;
2008	    lockId = rand.nextLong();
2009	    String lockName = String.valueOf(lockId);
2010	    rowlocks.put(lockName, r);
2011	    this.leases.createLease(lockName, new
RowLockListener(lockName, region));
2012	    return lockId;
2013	  }


St.Ack

Re: NPE while obtaining row lock

Posted by Yves Langisch <yv...@langisch.ch>.
Forgot to mention the versions: I'm running HBase 0.90.4 with Hadoop 0.20.205.0.

On Dec 13, 2011, at 8:52 AM, Yves Langisch wrote:

> Hi,
> 
> from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock: 
> 
> ---
> 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB
> 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true)
> java.lang.NullPointerException
>        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2010)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:1996)
>        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> 2011-12-13 02:00:20,529 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.56 MB of total=335.72 MB
> 2011-12-13 02:00:20,532 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.2 MB, single=177.89 MB, multi=154.62 MB, memory=42.16 KB
> 2011-12-13 02:00:21,433 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.49 MB of total=335.66 MB
> 2011-12-13 02:00:21,435 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.52 MB, total=296.14 MB, single=172.92 MB, multi=159.46 MB, memory=42.16 KB
> ---
> 
> Any ideas?
> 
> Regards
> Yves