Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/03/11 01:35:59 UTC

[jira] Created: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
-------------------------------------------------------------------------------------------

                 Key: HBASE-3621
                 URL: https://issues.apache.org/jira/browse/HBASE-3621
             Project: HBase
          Issue Type: Bug
            Reporter: stack
             Fix For: 0.90.2


J-D found this while debugging a failure on Dmitriy's cluster; we're RPC'ing under a synchronized(regionsInTransition).  Fix.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Assigned: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-3621:
-----------------------------

    Assignee: Ted Yu


[jira] [Commented] (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009958#comment-13009958 ] 

Hudson commented on HBASE-3621:
-------------------------------

Integrated in HBase-TRUNK #1803 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1803/])
    


[jira] Updated: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3621:
--------------------------

    Attachment: hbase-3621.txt

Initial attempt.
I moved the processing of regions in PENDING_CLOSE state outside of the synchronized block.


[jira] [Resolved] (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3621.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to branch and trunk.  Thanks for the patch Ted.


[jira] [Updated] (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3621:
-------------------------

    Attachment: 3621-v2.txt

This is what I committed.  It's Ted's patch, plus I do the same thing for assign. Ted's patch covered unassigns happening under locks; I noticed that we are also assigning under locks, so I copied Ted's pattern, moving assign out to run outside of the lock on regionsInTransition.
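
For illustration only, the pattern described above can be sketched as follows. This is not the committed patch: the class name, the State enum, and the sendClose/sendOpen callbacks are hypothetical stand-ins for the AssignmentManager internals. The idea is to decide what needs doing while holding the lock on regionsInTransition, then make the RPC-bearing calls only after the lock has been released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names are the real HBase classes.
public class TimeoutSketch {
    // Simplified stand-in for the states a region in transition can be in.
    enum State { PENDING_OPEN, PENDING_CLOSE, OPEN }

    final ConcurrentSkipListMap<String, State> regionsInTransition =
        new ConcurrentSkipListMap<>();

    /**
     * One pass of a timeout check. The callbacks stand in for unassign()
     * and assign(), which may block on an RPC to a region server.
     */
    void chore(Consumer<String> sendClose, Consumer<String> sendOpen) {
        List<String> toUnassign = new ArrayList<>();
        List<String> toAssign = new ArrayList<>();

        // Under the lock: only inspect state and record what to do.
        synchronized (regionsInTransition) {
            for (Map.Entry<String, State> e : regionsInTransition.entrySet()) {
                switch (e.getValue()) {
                    case PENDING_CLOSE:
                        toUnassign.add(e.getKey());
                        break;
                    case PENDING_OPEN:
                        toAssign.add(e.getKey());
                        break;
                    default:
                        break;
                }
            }
        }

        // Lock released: the potentially slow calls happen here.
        for (String region : toUnassign) {
            sendClose.accept(region);
        }
        for (String region : toAssign) {
            sendOpen.accept(region);
        }
    }
}
```

One trade-off of this shape is that a region's state may change between the scan and the call, so the called method has to re-check state itself under its own, short-lived lock.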


[jira] Commented: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005441#comment-13005441 ] 

Ted Yu commented on HBASE-3621:
-------------------------------

unassign() has the following block, which comes before it makes the RPC call:
{code}
    synchronized (regionsInTransition) {
    }
{code}
so we don't need to enclose unassign() in the synchronized block of chore().


[jira] Commented: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005438#comment-13005438 ] 

Jean-Daniel Cryans commented on HBASE-3621:
-------------------------------------------

For example:

{code}
"somenode.prod.twitter.com:60000.timeoutMonitor" daemon prio=10 tid=0x00002aacb8567800 nid=0x772 in Object.wait() [0x0000000045bf1000]
   java.lang.Thread.State: WAITING (on object monitor)
  at java.lang.Object.wait(Native Method)
  at java.lang.Object.wait(Object.java:485)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
  - locked <0x00002aaab2a10da8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
  at $Proxy6.closeRegion(Unknown Source)
  at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
  at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1093)
  at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1672)
  - locked <0x00002aaabf759858> (a java.util.concurrent.ConcurrentSkipListMap)
  at org.apache.hadoop.hbase.Chore.run(Chore.java:66
...

"main-EventThread" daemon prio=10 tid=0x00002aacb850b000 nid=0x761 waiting for monitor entry [0x00000000455eb000]
   java.lang.Thread.State: BLOCKED (on object monitor)
  at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
  - waiting to lock <0x00002aaabf759858> (a java.util.concurrent.ConcurrentSkipListMap)
  at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
{code}

The ZK event thread is blocked by the timeoutMonitor thread, which holds the lock on regionsInTransition while it waits on an RPC to a RS that doesn't answer. All ZK events get severely delayed.
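
A minimal, self-contained sketch of the same hazard, not HBase code: one thread takes a monitor and then waits for a "reply" that does not arrive, while a second thread needs the same monitor. All names here (LockHeldDuringRpc, observeEventThread, rit, rpcReply) are made up for the illustration; the latch stands in for the unanswered closeRegion RPC.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; it only mimics the shape of the thread dump.
public class LockHeldDuringRpc {
    /**
     * Returns the state of the "event" thread observed while the "monitor"
     * thread is still waiting for its reply inside the synchronized block.
     */
    static Thread.State observeEventThread() {
        final Object rit = new Object();                        // stand-in for regionsInTransition
        final CountDownLatch lockTaken = new CountDownLatch(1);
        final CountDownLatch rpcReply = new CountDownLatch(1);  // the "RPC" answer

        Thread monitor = new Thread(() -> {
            synchronized (rit) {
                lockTaken.countDown();
                try {
                    rpcReply.await();                           // waiting on the RPC under the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "timeoutMonitor");

        Thread event = new Thread(() -> {
            synchronized (rit) {
                // A ZK event would be handled here.
            }
        }, "main-EventThread");

        try {
            monitor.start();
            lockTaken.await();                                  // monitor now owns the lock
            event.start();
            // Poll (for at most 5 seconds) until the event thread is stuck on the monitor.
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (event.getState() != Thread.State.BLOCKED
                    && System.nanoTime() < deadline) {
                Thread.sleep(1);
            }
            Thread.State seen = event.getState();
            rpcReply.countDown();                               // let the "RPC" finish
            monitor.join();
            event.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling LockHeldDuringRpc.observeEventThread() should report the event thread as BLOCKED for as long as the reply is outstanding, which is the situation the dump above shows for main-EventThread.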


[jira] Commented: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007974#comment-13007974 ] 

Ted Yu commented on HBASE-3621:
-------------------------------

assign() calls the following method:
{code}
        serverManager.sendRegionOpen(plan.getDestination(), state.getRegion());
{code}
I think we should move the call to assign() outside of the
{code}
synchronized (regionsInTransition) {
}
{code}
block.

I noticed the call assign(regionState, false, true) at line 1790 doesn't obtain a lock on regionState, which is inconsistent with the other calls to:
{code}
  private void assign(final RegionState state, final boolean setOfflineInZK,
      final boolean forceNewPlan) {
{code}

