You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/09/10 10:15:09 UTC

[jira] [Created] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
----------------------------------------------------------------------------------

                 Key: HBASE-4367
                 URL: https://issues.apache.org/jira/browse/HBASE-4367
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.90.4
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Critical
             Fix For: 0.92.0


We observed a deadlock in production between the following threads:
- IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
- cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102063#comment-13102063 ] 

Andrew Purtell edited comment on HBASE-4367 at 9/10/11 3:45 PM:
----------------------------------------------------------------

I think option 1 is fine but if the larger issue is a deadlock due to bundle loading, perhaps we can remove all uses of Date in the codebase?

      was (Author: apurtell):
    I think option 1 is fine but if the larger issue is a deadlock in bundle loading, perhaps we can remove all uses of Date in the codebase?
  
> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-4367:
-----------------------------

    Assignee: Todd Lipcon  (was: Ted Yu)

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112194#comment-13112194 ] 

Todd Lipcon commented on HBASE-4367:
------------------------------------

Yes, it's apparently fixed in 6u19, see http://www.oracle.com/technetwork/java/javase/6u19-141078.html ("(rb) ResourceBundle and/or SimpleDateFormat not thread safe (hangs JVM)")

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4367:
--------------------------

    Attachment: 4367.txt

Here is patch for option 1 in my comment.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4367:
-------------------------------

    Attachment: hbase-4367.txt

Here's another way to solve this. I added a new HasThread class which has a private thread member, but otherwise extending from it acts just like extending from Thread.

I haven't run the full test suite yet, but I started a server and poked around at a thread dump, looks moderately OK.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103335#comment-13103335 ] 

Todd Lipcon commented on HBASE-4367:
------------------------------------

bq. Should we apply both patches (loading bundles to do a Date seems silly).

Yet I think it's nice to print out dates in human-readable forms in some places. I don't know about you, but I can't tell you what time 12349582034 is :)

bq. Is ThreadWrapper a better name for HasThread (which sounds like a method name) ?

Sure, I'll rename and post a new patch tomorrow.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106957#comment-13106957 ] 

stack commented on HBASE-4367:
------------------------------

We can do without the removal of the Date from PriorityCompactionQueue (There is no PCQ in TRUNK as it happens).

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102121#comment-13102121 ] 

Todd Lipcon commented on HBASE-4367:
------------------------------------

I don't think removing Date is the right solution. There are lots of places where bundles might get loaded. The better solution is to not synchronize on Thread objects, and I don't think it's much harder to do. I will take a crack at a patch

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102014#comment-13102014 ] 

Todd Lipcon commented on HBASE-4367:
------------------------------------

This was once reported on the mailing list here:
http://mail-archives.apache.org/mod_mbox/hbase-user/201104.mbox/%3CBANLkTinCmfarwvqw8aF1DidZwD7kpiEH5A@mail.gmail.com%3E

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135008#comment-13135008 ] 

Hudson commented on HBASE-4367:
-------------------------------

Integrated in HBase-TRUNK #2366 (See [https://builds.apache.org/job/HBase-TRUNK/2366/])
    HBASE-4367 Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/Chore.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Leases.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HasThread.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java

                
> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102071#comment-13102071 ] 

Ted Yu commented on HBASE-4367:
-------------------------------

After applying my patch:
{code}
tyu-mbp:trunk tyu$ find src/main -name '*.java' -exec grep ' Date ' {} \; -print
tyu-mbp:trunk tyu$ find src/main -name '*.java' -exec grep 'Date ' {} \; -print
  private MetricsString hdfsDate = new MetricsString("hdfsDate", registry,
src/main/java/org/apache/hadoop/hbase/metrics/HBaseInfo.javatyu-mbp:trunk tyu$ 
{code}
i.e. there is no other class using Date.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102486#comment-13102486 ] 

ramkrishna.s.vasudevan commented on HBASE-4367:
-----------------------------------------------

HBASE-4101 does the same thing of removing the Date object.  Perhaps could have found all similar cases and arrived at the solution.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103309#comment-13103309 ] 

stack commented on HBASE-4367:
------------------------------

+1 on your fancy dancing Todd.

Should we apply both patches (loading bundles to do a Date seems silly).

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102096#comment-13102096 ] 

stack commented on HBASE-4367:
------------------------------

+1 on patch and applying to 0.90.  There is one other new Date in AssignmentManager used by MasterDumpServlet.... should we purge that too?  Its for a human to read and its in the master rather than RS so probably not too bad... can wait see?

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-4367:
-----------------------------

    Assignee: Ted Yu  (was: Todd Lipcon)

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102010#comment-13102010 ] 

Todd Lipcon commented on HBASE-4367:
------------------------------------

Thread dumps (line numbers from 0.90.4)

{noformat}
"IPC Server handler 37 on 60020":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aaab584cee0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2061)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:444)
        - locked <0x00002aaab5519648> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2586)


"regionserver60020.cacheFlusher":
        at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
        - waiting to lock <0x00002aaab5519648> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
        at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
        at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
        at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
        at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
        at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
        at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
        at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
        at java.util.TimeZone.getDisplayName(TimeZone.java:350)
        at java.util.Date.toString(Date.java:1025)
        at java.lang.String.valueOf(String.java:2826)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
        at java.lang.String.valueOf(String.java:2826)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
        - locked <0x00002aaab55aa2a8> (a java.util.HashMap)
        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
        - locked <0x00002aaab555c870> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
        - locked <0x00002aaab555c870> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:395)
{noformat}

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102074#comment-13102074 ] 

Andrew Purtell commented on HBASE-4367:
---------------------------------------

+1

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102063#comment-13102063 ] 

Andrew Purtell commented on HBASE-4367:
---------------------------------------

I think option 1 is fine but if the larger issue is a deadlock in bundle loading, perhaps we can remove all uses of Date in the codebase?

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102075#comment-13102075 ] 

Andrew Purtell commented on HBASE-4367:
---------------------------------------

This also needs to be applied to 0.90 branch, please.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103320#comment-13103320 ] 

Ted Yu commented on HBASE-4367:
-------------------------------

Is ThreadWrapper a better name for HasThread (which sounds like a method name) ?

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112176#comment-13112176 ] 

Eli Collins commented on HBASE-4367:
------------------------------------

Any idea which java versions have this bug?

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102012#comment-13102012 ] 

Todd Lipcon commented on HBASE-4367:
------------------------------------

I think the solution is straightforward: make MemStoreFlusher contain a Thread object rather than extend Thread itself. Or, have it extend Runnable, and create a thread to run it in when it's created.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103345#comment-13103345 ] 

stack commented on HBASE-4367:
------------------------------

bq. Yet I think it's nice to print out dates in human-readable forms in some places. I don't know about you, but I can't tell you what time 12349582034 is 

For sure, but it looked like a silly compare of timestamps was all that was going on in the patch (perhaps there a toString in here somewhere...).

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4367:
--------------------------

    Status: Patch Available  (was: Open)

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135116#comment-13135116 ] 

Hudson commented on HBASE-4367:
-------------------------------

Integrated in HBase-0.92 #79 (See [https://builds.apache.org/job/HBase-0.92/79/])
    HBASE-4367 Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/Chore.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Leases.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HasThread.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java

                
> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106956#comment-13106956 ] 

stack commented on HBASE-4367:
------------------------------

OK to commit this?

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103459#comment-13103459 ] 

Ted Yu commented on HBASE-4367:
-------------------------------

bq. but I can't tell you what time 12349582034 is 
It is Sat May 23 1970 15:26:22 GMT-0700 (PST)
I use http://www.ruddwire.com/handy-code/date-to-millisecond-calculators/ quite often.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4367:
-------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Committed to 0.92 branch and trunk.  Committed Todd's patch only -- hbase-4367.txt.
                
> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4367.txt, hbase-4367.txt
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4367) Deadlock in MemStore flusher due to JDK internally synchronizing on current thread

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102020#comment-13102020 ] 

Ted Yu commented on HBASE-4367:
-------------------------------

According to the line number above, CompactionRequest.toString() is from this line:
{code}
        LOG.trace("Inserting region in queue. " + newRequest);
{code}
I think the above code was for debugging purpose. Meaning, we can change what CompactionRequest.toString() returns.

The Date field in CompactionRequest is used in its compareTo() method.
By examining the code, the following is always executed (starting line 61):
{code}
      if (d == null) {
        d = new Date();
      }
{code}
I think we can replace Date field with a long field which carries timestamp.
This would satisfy its usage in compareTo() method and avoid Date.toString() altogether.

Option 2 is to omit the date component in CompactionRequest.toString().

If any of the above options is chosen, please allow me to provide a patch.

> Deadlock in MemStore flusher due to JDK internally synchronizing on current thread
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-4367
>                 URL: https://issues.apache.org/jira/browse/HBASE-4367
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> We observed a deadlock in production between the following threads:
> - IPC handler thread holding the monitor lock on MemStoreFlusher inside reclaimMemStoreMemory, waiting to obtain MemStoreFlusher.lock (the reentrant lock member)
> - cacheFlusher thread inside flushRegion holds MemStoreFlusher.lock, and then calls PriorityCompactionQueue.add, which calls PriorityCompactionQueue.addToRegionsInQueue, which calls CompactionRequest.toString(), which calls Date.toString. If this occurs just after a GC under memory pressure, Date.toString needs to reload locale information (stored in a soft reference), so it calls ResourceBundle.loadBundle, which uses Thread.currentThread() as a synchronizer (see sun bug http://bugs.sun.com/view_bug.do?bug_id=6915621). Since the current thread is the MemStoreFlusher itself, we have a lock order inversion and a deadlock.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira