Posted to dev@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2009/12/01 19:30:20 UTC

[jira] Created: (HBASE-2022) NPE in housekeeping kills RS

NPE in housekeeping kills RS
----------------------------

                 Key: HBASE-2022
                 URL: https://issues.apache.org/jira/browse/HBASE-2022
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Jean-Daniel Cryans
            Assignee: Jean-Daniel Cryans
            Priority: Critical
             Fix For: 0.20.3, 0.21.0


Saw this on Zhenyu's 0.20.1 cluster (which for some weird reason seems to have many issues):

{code}
2009-11-30 16:44:48,170 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception. Aborting...
java.lang.NullPointerException
	at org.apache.hadoop.hbase.regionserver.HRegionServer.housekeeping(HRegionServer.java:1280)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:590)
	at java.lang.Thread.run(Thread.java:619)
{code}

This reminds me of HBASE-1386, and in fact this could be the same issue (but I can't confirm it). Searching the web gives some hits, and this one is particularly interesting: http://forums.sun.com/thread.jspa?threadID=5379669

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-2022) NPE in housekeeping kills RS

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2022:
--------------------------------------

    Attachment: HBASE-2022-v2.patch

I just added a log that reports the null value. Adding logging in Worker makes it very chatty: when the master doesn't give it anything to do, the list is empty, so we get a null (which is expected). Also, I don't think we should synchronize every access, since getting a null in housekeeping won't break anything.

Also, thinking about it, it's normal for the list to return null every time after the first one, because the list is still empty.
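A small illustration of why repeated nulls are expected. It assumes the to-do list behaves like a `java.util.concurrent.LinkedBlockingQueue`, which the thread does not confirm. Under that assumption, a timed `poll` on an empty queue returns `null` on every call, not only the first, so a null is just the normal "nothing to do" answer. `EmptyPollDemo` and `countNullPolls` are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EmptyPollDemo {
    // Polls an always-empty queue `attempts` times and counts the null results.
    static int countNullPolls(int attempts, long timeoutMs) {
        LinkedBlockingQueue<String> toDo = new LinkedBlockingQueue<>();
        int nulls = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                // Waits up to timeoutMs; returns null if nothing was enqueued meanwhile.
                if (toDo.poll(timeoutMs, TimeUnit.MILLISECONDS) == null) {
                    nulls++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        System.out.println(countNullPolls(3, 10)); // prints 3
    }
}
```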


[jira] Commented: (HBASE-2022) NPE in housekeeping kills RS

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784331#action_12784331 ] 

Jean-Daniel Cryans commented on HBASE-2022:
-------------------------------------------

It seems that in HRS.Worker.run() we already handle that case:

{code}
// poll() waits up to threadWakeFrequency ms and returns null if nothing arrived
e = toDo.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
if (e == null || stopRequested.get()) {
  continue;
}
{code}

So we should just check for null in housekeeping as well.
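For context, here is a self-contained sketch of a worker loop built around that fragment. Only `toDo`, `threadWakeFrequency`, `stopRequested` and the poll-then-`continue` shape come from the fragment. The `WorkerSketch` class, the `String` entries, the 10 ms wake frequency and the bounded iteration count (added so the sketch terminates) are assumptions, not HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for HRS.Worker; only the poll/null-check shape mirrors the fragment.
public class WorkerSketch {
    final LinkedBlockingQueue<String> toDo = new LinkedBlockingQueue<>();
    final AtomicBoolean stopRequested = new AtomicBoolean(false);
    final long threadWakeFrequency = 10; // ms; arbitrary value for the sketch
    final List<String> processed = new ArrayList<>();

    // Runs at most maxIterations passes, processing whatever is queued.
    void run(int maxIterations) {
        for (int i = 0; i < maxIterations && !stopRequested.get(); i++) {
            String e;
            try {
                e = toDo.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
            // A null here only means the poll timed out with nothing to do.
            if (e == null || stopRequested.get()) {
                continue;
            }
            processed.add(e);
        }
    }

    public static void main(String[] args) {
        WorkerSketch w = new WorkerSketch();
        w.toDo.add("MSG_REGION_OPEN");
        w.run(3);
        System.out.println(w.processed); // prints [MSG_REGION_OPEN]
    }
}
```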


[jira] Commented: (HBASE-2022) NPE in housekeeping kills RS

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787195#action_12787195 ] 

Todd Lipcon commented on HBASE-2022:
------------------------------------

Patch seems fine to me; we got this error on a cluster as well.


[jira] Updated: (HBASE-2022) NPE in housekeeping kills RS

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2022:
--------------------------------------

    Attachment: HBASE-2022.patch

Patch that breaks from the loop if an element is null. 
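A sketch of the idea behind this patch. It assumes housekeeping walks `toDo` oldest-first and that the walk may yield a null element. Under that assumption, housekeeping stops at the first null instead of dereferencing it. `HousekeepingSketch`, the `String` messages and the `MSG_REGION_OPEN` prefix check are hypothetical stand-ins for the real HRegionServer types, and the warning line only illustrates logging the null value. It is not the patch text. A `java.util.LinkedList` is used because it accepts null elements, so the failure case can be reproduced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.logging.Logger;

public class HousekeepingSketch {
    private static final Logger LOG = Logger.getLogger(HousekeepingSketch.class.getName());

    // Hypothetical housekeeping: scan queued messages oldest-first and collect the
    // region-open ones, breaking out at the first null instead of dereferencing it.
    static List<String> housekeeping(Iterable<String> toDo) {
        List<String> opening = new ArrayList<>();
        for (String msg : toDo) {
            if (msg == null) {
                LOG.warning("toDo gave a null entry during iteration");
                break;
            }
            if (msg.startsWith("MSG_REGION_OPEN")) {
                opening.add(msg);
            }
        }
        return opening;
    }

    public static void main(String[] args) {
        LinkedList<String> toDo = new LinkedList<>(Arrays.asList(
            "MSG_REGION_OPEN r1", null, "MSG_REGION_OPEN r2"));
        System.out.println(housekeeping(toDo)); // prints [MSG_REGION_OPEN r1]
    }
}
```

Without the null check, `msg.startsWith(...)` on the null entry throws a `NullPointerException`, which is the same kind of failure as the one in the stack trace above.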

I'm beginning to wonder whether housekeeping is really useful, since the Worker does the processing, not the main HRS loop (where we call housekeeping).


[jira] Commented: (HBASE-2022) NPE in housekeeping kills RS

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784939#action_12784939 ] 

Jean-Daniel Cryans commented on HBASE-2022:
-------------------------------------------

I guess a synchronized block would help; I'll try that and add more logging.


[jira] Commented: (HBASE-2022) NPE in housekeeping kills RS

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784547#action_12784547 ] 

stack commented on HBASE-2022:
------------------------------

Patch seems fine.

Reading the forum thread you posted, the reporter says that once the list returns null, it does so ever after. I wonder whether that will happen here. Perhaps log it if we get a null out of the LinkedList? Do it here and in HRS.Worker.run, since it's not supposed to happen.

Could it be a synchronization issue? I haven't looked. Maybe the linked list needs synchronizing, all access to it?
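A sketch of what "synchronizing all access to the list" could look like. Every add, poll and traversal of a shared `java.util.LinkedList` holds the same monitor, so no thread ever sees the list mid-update. The class and method names are hypothetical. The thread does not confirm that the HBase list was a plain `LinkedList`.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: every read, write and traversal of the shared list
// holds the same monitor (the list itself).
public class SynchronizedToDoSketch {
    private final LinkedList<String> toDo = new LinkedList<>();

    void add(String msg) {
        synchronized (toDo) {
            toDo.addLast(msg);
        }
    }

    // Returns the oldest entry, or null only when the list is actually empty.
    String poll() {
        synchronized (toDo) {
            return toDo.poll();
        }
    }

    // A traversal (as housekeeping does) holds the lock for the whole walk.
    List<String> snapshot() {
        synchronized (toDo) {
            return new ArrayList<>(toDo);
        }
    }

    // One producer adds n messages while one consumer drains and traverses the
    // list concurrently; returns what the consumer received, in order.
    static List<String> transfer(int n) {
        SynchronizedToDoSketch q = new SynchronizedToDoSketch();
        List<String> received = new ArrayList<>();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) q.add("m" + i);
        });
        Thread consumer = new Thread(() -> {
            while (received.size() < n) {
                String e = q.poll();
                if (e != null) received.add(e); // null just means "nothing yet"
                q.snapshot();                   // concurrent traversal, like housekeeping
            }
        });
        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(1000).size()); // prints 1000
    }
}
```

The trade-off is contention on every access. A `java.util.concurrent` queue such as `LinkedBlockingQueue` gives thread-safe add/poll without hand-written locking.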


[jira] Resolved: (HBASE-2022) NPE in housekeeping kills RS

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-2022.
---------------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

I committed this to branch and trunk (Stack also told me it was OK).