Posted to issues@hbase.apache.org by "Matt Davies (JIRA)" <ji...@apache.org> on 2011/07/14 23:54:59 UTC

[jira] [Created] (HBASE-4101) Regionserver Deadlock

Regionserver Deadlock
---------------------

                 Key: HBASE-4101
                 URL: https://issues.apache.org/jira/browse/HBASE-4101
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.90.3
         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
            Reporter: Matt Davies
         Attachments: jstack.txt

We periodically see a situation where the regionserver process exists in the process list and its ZooKeeper thread keeps sending keepalives, so the master won't remove it from the active list, yet the regionserver will not serve data.

Hadoop (cdh3u0), HBase 0.90.3 (Apache version), under load from an internal testing tool.

Attached is the full jstack output.
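When jstack finds a Java-level deadlock it reports it at the end of the thread dump. As an aside, roughly the same check can be run from inside a JVM through the standard `java.lang.management.ThreadMXBean` API. The sketch below is a hypothetical helper, not part of HBase or of the patch on this issue; `DeadlockProbe` and the `worker-*` thread names are made up, and `main` deliberately deadlocks two daemon threads to show what a detected cycle looks like. Note that it only detects lock cycles: a thread waiting forever on a condition or queue that nobody signals will not show up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {

    /** Names of threads the JVM currently reports as deadlocked; empty if none. */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Detects cycles on object monitors and on ownable synchronizers
        // (e.g. ReentrantLock); returns null when no such cycle exists.
        long[] ids = mx.findDeadlockedThreads();
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // null if the thread terminated in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        final Object lockA = new Object();
        final Object lockB = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Two threads take the same pair of locks in opposite order.
        Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirstLock), "worker-1");
        Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirstLock), "worker-2");
        t1.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();

        // Poll for up to ~5 seconds until the cycle has formed.
        for (int i = 0; i < 50 && deadlockedThreadNames().isEmpty(); i++) {
            Thread.sleep(100);
        }
        System.out.println("Deadlocked: " + deadlockedThreadNames());
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: each thread waits for the lock the other one holds.
            }
        }
    }
}
```

Running `main` should list both `worker-1` and `worker-2` as deadlocked; the order of the names is not guaranteed.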

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "Matt Davies (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067793#comment-13067793 ] 

Matt Davies commented on HBASE-4101:
------------------------------------

Many thanks!

[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183866#comment-13183866 ] 

stack commented on HBASE-4101:
------------------------------

@Ram Yes.
                
[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Attachment: HBASE-4101_0.90.patch

[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4101:
-------------------------

         Priority: Blocker  (was: Major)
    Fix Version/s: 0.90.4

[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Status: Patch Available  (was: Open)

[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065854#comment-13065854 ] 

ramkrishna.s.vasudevan commented on HBASE-4101:
-----------------------------------------------

Is it fine if we change the Date object in the PriorityCompactionQueue.CompactionRequest class to a Long, set it from System.nanoTime(), and use that value for comparison?
Can I submit a patch with this change?
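For illustration only, here is a minimal Java sketch of the ordering proposed above: a request ordered first by priority and then by a `long` timestamp taken from `System.nanoTime()` instead of a `java.util.Date`. It is not the actual HBase patch; `CompactionOrderSketch`, `Request`, `region` and `priority` are hypothetical names, and it assumes a lower priority value means a more urgent compaction. The timestamp field is called `time`, following Ted Yu's review comment on this issue. One relevant difference: `Date` only has millisecond resolution and follows the wall clock, whereas `System.nanoTime()` is finer grained and not tied to wall-clock time, but its Javadoc says values should be compared by subtraction (`t1 - t0 < 0`) rather than with `<`, because of possible numerical overflow.

```java
import java.util.concurrent.PriorityBlockingQueue;

public class CompactionOrderSketch {

    // Hypothetical stand-in for a compaction request; not the real HBase class.
    static final class Request implements Comparable<Request> {
        final String region;
        final int priority; // assumption: lower value = more urgent
        final long time;    // System.nanoTime() when the request was created

        Request(String region, int priority) {
            this(region, priority, System.nanoTime());
        }

        Request(String region, int priority, long time) {
            this.region = region;
            this.priority = priority;
            this.time = time;
        }

        @Override
        public int compareTo(Request other) {
            if (priority != other.priority) {
                return Integer.compare(priority, other.priority);
            }
            // Compare nanoTime values by their difference, as the System.nanoTime()
            // Javadoc recommends; valid while timestamps are < ~292 years apart.
            return Long.signum(time - other.time);
        }
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();
        queue.add(new Request("b", 1, 200L));
        queue.add(new Request("a", 1, 100L));
        queue.add(new Request("c", 0, 300L));
        // poll() returns the smallest element according to compareTo.
        while (!queue.isEmpty()) {
            System.out.println(queue.poll().region);
        }
    }
}
```

Running `main` prints `c`, `a`, `b`: the priority-0 request first, then the two priority-1 requests in the order they were created.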

[jira] [Reopened] (HBASE-4101) Regionserver Deadlock

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reopened HBASE-4101:
---------------------------


TestScannerTimeout hangs in the 0.90 build.

[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Fix Version/s:     (was: 0.90.4)
                   0.92.0
    
[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068197#comment-13068197 ] 

ramkrishna.s.vasudevan commented on HBASE-4101:
-----------------------------------------------

@Ted, I tried running TestScannerTimeout locally.
With or without the patch the test runs fine. Some problems occasionally come up in the teardown, but all assertions pass.

Also, the change pertains to the PriorityCompactionQueue, whereas the test case, as far as I can see, deals with scanning.
Do you see any other cause for the test failure?

[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067596#comment-13067596 ] 

Ted Yu commented on HBASE-4101:
-------------------------------

Java is a strongly typed language:
{code}
+    private final Long timeInLong;
{code}
Calling the field time would suffice; the type is already evident from the declaration.

Other than the above, +1 on patch.

[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Attachment: HBASE-4101_trunk.patch

[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Fix Version/s: 0.92.0
           Status: Patch Available  (was: Open)

[jira] [Assigned] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-4101:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Attachment: HBASE-4101_trunk_1.patch

[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Status: Open  (was: Patch Available)

[jira] [Resolved] (HBASE-4101) Regionserver Deadlock

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu resolved HBASE-4101.
---------------------------

    Resolution: Fixed

The current test failure is likely caused by another change.

[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Attachment: HBASE-4101_0.90_1.patch

[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065575#comment-13065575 ] 

stack commented on HBASE-4101:
------------------------------

From up on the list:

{code}
We aren't profiling right now.  Here's what is in the hbase-env.sh

export TZ="US/Mountain"
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
export HBASE_MANAGES_ZK=false
export HBASE_PID_DIR=/home/hadoop
export HBASE_HEAPSIZE=10240

Java is
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

We were planning an upgrade to 1.6.0_25 before we ran into this issue.
{code}

[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067756#comment-13067756 ] 

stack commented on HBASE-4101:
------------------------------

Patch looks great (as does your analysis above, Ram).

[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065855#comment-13065855 ] 

ramkrishna.s.vasudevan commented on HBASE-4101:
-----------------------------------------------

This issue is similar to HBASE-3830.

[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183857#comment-13183857 ] 

ramkrishna.s.vasudevan commented on HBASE-4101:
-----------------------------------------------

I see that in 0.90 only CHANGES.txt was updated. Should we remove that entry when we commit HBASE-5178 (the backport JIRA for this issue)?
                
> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-4101_0.90.patch, HBASE-4101_0.90_1.patch, HBASE-4101_trunk.patch, HBASE-4101_trunk_1.patch, jstack.txt


[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "Matt Davies (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Davies updated HBASE-4101:
-------------------------------

    Attachment: jstack.txt

> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>         Attachments: jstack.txt


[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067900#comment-13067900 ] 

Hudson commented on HBASE-4101:
-------------------------------

Integrated in HBase-TRUNK #2041 (See [https://builds.apache.org/job/HBase-TRUNK/2041/])
    HBASE-4101 Regionserver Deadlock

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionRequest.java
* /hbase/trunk/CHANGES.txt


> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.90.4
>
>         Attachments: HBASE-4101_0.90.patch, HBASE-4101_0.90_1.patch, HBASE-4101_trunk.patch, HBASE-4101_trunk_1.patch, jstack.txt


[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4101:
-------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed to branch and trunk. Thank you for the patch, Ram.

> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.90.4
>
>         Attachments: HBASE-4101_0.90.patch, HBASE-4101_0.90_1.patch, HBASE-4101_trunk.patch, HBASE-4101_trunk_1.patch, jstack.txt


[jira] [Updated] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4101:
------------------------------------------

    Fix Version/s:     (was: 0.92.0)

> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>            Priority: Blocker
>             Fix For: 0.90.4
>
>         Attachments: HBASE-4101_0.90.patch, HBASE-4101_trunk.patch, jstack.txt


[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183849#comment-13183849 ] 

ramkrishna.s.vasudevan commented on HBASE-4101:
-----------------------------------------------

This change has not been committed to the 0.90 branch, but the issue's fix version says it is present in 0.90.
I will update this JIRA and create a new JIRA for the fix in 0.90.
                
> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-4101_0.90.patch, HBASE-4101_0.90_1.patch, HBASE-4101_trunk.patch, HBASE-4101_trunk_1.patch, jstack.txt


[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065746#comment-13065746 ] 

ramkrishna.s.vasudevan commented on HBASE-4101:
-----------------------------------------------

The problem here is due to the use of the Date class in PriorityCompactionQueue.
While the Date is being formatted, ResourceBundle tries to synchronize on the current thread.

Please see JD's comment:
"
I see what you are saying, and I understand the deadlock, but what escapes me is why ResourceBundle has to go touch all the classes every time to find the locale, as I see 2 threads doing the same. Maybe my understanding of what it does is just poor, but I also see that you are using the YourKit profiler, so it's one more variable in the equation.

In any case, using a Date strikes me as odd. Using a long representing System.currentTimeMillis is usually what we do."
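JD's suggestion can be illustrated with a small sketch. CompactionRequestSketch is a hypothetical class, not the actual HBase CompactionRequest: it keeps the enqueue time as a long from System.currentTimeMillis() instead of a Date, so building the log string only appends a number and never goes through Date.toString() or its locale lookup. The ordering in compareTo (lower priority value first, then earlier enqueue time) is an assumption made for the example.

```java
import java.util.PriorityQueue;

// Hypothetical stand-in for a compaction request; not HBase's real class.
public class CompactionRequestSketch implements Comparable<CompactionRequestSketch> {
    private final String regionName; // stand-in for r.getRegionNameAsString()
    private final int priority;      // assumption: lower value = more urgent
    private final long timestamp;    // enqueue time in ms, replaces the Date field

    public CompactionRequestSketch(String regionName, int priority, long timestamp) {
        this.regionName = regionName;
        this.priority = priority;
        this.timestamp = timestamp;
    }

    public CompactionRequestSketch(String regionName, int priority) {
        this(regionName, priority, System.currentTimeMillis());
    }

    @Override
    public int compareTo(CompactionRequestSketch other) {
        int byPriority = Integer.compare(priority, other.priority);
        if (byPriority != 0) {
            return byPriority;
        }
        // Ties are broken by enqueue time, oldest first.
        return Long.compare(timestamp, other.timestamp);
    }

    @Override
    public String toString() {
        // Appending a long needs no Date or locale formatting, so no
        // ResourceBundle is loaded on the calling thread.
        return "regionName=" + regionName + ", priority=" + priority
            + ", timestamp=" + timestamp;
    }

    public static void main(String[] args) {
        PriorityQueue<CompactionRequestSketch> queue = new PriorityQueue<>();
        queue.add(new CompactionRequestSketch("regionB", 1, 200L));
        queue.add(new CompactionRequestSketch("regionA", 1, 100L));
        queue.add(new CompactionRequestSketch("regionC", 0, 300L));
        while (!queue.isEmpty()) {
            System.out.println(queue.poll());
        }
    }
}
```

Polling the queue in main yields regionC first (lower priority value), then regionA and regionB in timestamp order.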

> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>            Priority: Blocker
>             Fix For: 0.90.4
>
>         Attachments: jstack.txt


[jira] [Commented] (HBASE-4101) Regionserver Deadlock

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065868#comment-13065868 ] 

ramkrishna.s.vasudevan commented on HBASE-4101:
-----------------------------------------------

Before submitting the patch, I would like to explain my analysis of why the Date object is the problem.


MemStoreFlusher.flushOneForGlobalPressure() and MemStoreFlusher.reclaimMemStoreMemory() are the
two APIs where the problem occurs.

As part of flushOneForGlobalPressure(), flushRegions() gets called. Here the lock is obtained:
{noformat}
lock.lock();
{noformat}
Then the flow goes into server.compactSplitThread.requestCompaction(), where the region is added to the CompactionQueue.
In PriorityCompactionQueue.addToRegionsInQueue() we log newRequest.
Building that log message invokes the request's toString(), which concatenates the Date object and therefore calls Date.toString():
{noformat}
public String toString() {
  return "regionName=" + r.getRegionNameAsString() +
    ", priority=" + p + ", date=" + date;
}
{noformat}

Internally, Date.toString() uses ResourceBundle, whose endLoading() method does the following:
{noformat}
	Thread me = Thread.currentThread();
	assert (underConstruction.get(constKey) == me);
	underConstruction.remove(constKey);
	synchronized (me) {
	    me.notifyAll();
	}
{noformat}
That is, it synchronizes on the current thread object, which here is the MemStoreFlusher itself. So synchronized (me) needs the same monitor that MemStoreFlusher's synchronized methods take.
Meanwhile, another thread calls MemStoreFlusher.reclaimMemStoreMemory(), which is itself synchronized.
That thread has therefore obtained the MemStoreFlusher monitor, and it now waits to obtain the lock:
{noformat}
    if (isAboveHighWaterMark()) {
      lock.lock();
{noformat}

The ResourceBundle code in the flusher thread, which already holds the lock, in turn waits for the MemStoreFlusher monitor. Each thread holds what the other needs, so this leads to a deadlock.
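The lock cycle in this analysis can be reproduced in miniature. This is a minimal sketch, not HBase code: Flusher, reclaim() and the two latches are hypothetical names, and it assumes, as the analysis states, that the flusher object is itself the thread whose monitor ResourceBundle locks. To keep the demo from hanging, the writer side uses ReentrantLock.tryLock with a timeout instead of lock(), so a false result means a plain lock() call would have deadlocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FlusherDeadlockSketch {

    // Hypothetical stand-in for MemStoreFlusher. Because it extends Thread,
    // synchronized (Thread.currentThread()) inside run() takes the same
    // monitor as its synchronized methods.
    static class Flusher extends Thread {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch flusherHoldsLock = new CountDownLatch(1);
        final CountDownLatch writerHoldsMonitor = new CountDownLatch(1);

        @Override
        public void run() {
            lock.lock();                           // like flushRegions(): lock.lock()
            try {
                flusherHoldsLock.countDown();
                awaitQuietly(writerHoldsMonitor);  // wait until the writer owns the monitor
                // Like ResourceBundle.endLoading(): synchronized (me) { me.notifyAll(); }
                synchronized (Thread.currentThread()) {
                    Thread.currentThread().notifyAll();
                }
            } finally {
                lock.unlock();
            }
        }

        // Like reclaimMemStoreMemory(): a synchronized method that then wants `lock`.
        synchronized boolean reclaim() throws InterruptedException {
            writerHoldsMonitor.countDown();
            flusherHoldsLock.await();              // the flusher now owns `lock`
            boolean acquired = lock.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.unlock();
            }
            return acquired;                       // false: the cycle was hit
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns whether the writer managed to get the flusher's lock.
    public static boolean demonstrate() {
        Flusher flusher = new Flusher();
        flusher.start();
        try {
            boolean writerGotLock = flusher.reclaim(); // runs on the calling thread
            flusher.join();
            return writerGotLock;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        boolean got = demonstrate();
        System.out.println(got ? "no lock cycle"
                               : "lock cycle detected: a plain lock() would deadlock");
    }
}
```

Removing the synchronized (Thread.currentThread()) block from run(), which is roughly what dropping the Date from the logged request achieves, breaks the cycle and lets tryLock succeed.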

> Regionserver Deadlock
> ---------------------
>
>                 Key: HBASE-4101
>                 URL: https://issues.apache.org/jira/browse/HBASE-4101
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.3
>         Environment: CentOS 5.5, CDH3 u0 Hadoop, HBase 0.90.3
>            Reporter: Matt Davies
>            Priority: Blocker
>             Fix For: 0.90.4
>
>         Attachments: jstack.txt
