Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/09/18 20:10:16 UTC

[jira] Created: (HBASE-1853) Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them

Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them
--------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-1853
                 URL: https://issues.apache.org/jira/browse/HBASE-1853
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
            Reporter: stack


At the head of the regionserver run loop we do this:
{code}
          synchronized(this.outboundMsgs) {
            outboundArray =
              this.outboundMsgs.toArray(new HMsg[outboundMsgs.size()]);
            this.outboundMsgs.clear();
          }
{code}

We do this even if we then fail to deliver the messages to the master (Connection refused or whatever), so the undelivered messages are lost.
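
To make the failure mode concrete, here is a minimal standalone sketch of the clear-before-send pattern. It is not the HRegionServer code: the class name, `sendToMaster`, and the use of `String` in place of `HMsg` are stand-ins invented for illustration, and the send is hard-wired to fail.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; String stands in for HMsg.
class ClearBeforeSendDemo {
  /** Hypothetical stand-in for the call to the master; always fails here. */
  static void sendToMaster(String[] msgs) throws IOException {
    throw new IOException("Connection refused");
  }

  /** Runs one pass of the loop and returns how many messages remain queued. */
  static int runOnePass(List<String> outboundMsgs) {
    String[] outboundArray;
    synchronized (outboundMsgs) {
      outboundArray = outboundMsgs.toArray(new String[outboundMsgs.size()]);
      outboundMsgs.clear(); // queue emptied before we know delivery worked
    }
    try {
      sendToMaster(outboundArray);
    } catch (IOException e) {
      // In this simplified sketch nothing puts outboundArray back on the
      // queue, so its contents are gone once this pass ends.
    }
    return outboundMsgs.size();
  }
}
```

In this sketch a queue holding one message is empty after a single failed pass, which is the loss described above.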


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1853) Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757670#action_12757670 ] 

stack commented on HBASE-1853:
------------------------------

I ran a load overnight with extra debug logging that counts how many messages the regionserver had to send to the master.  Below is the filtered grep output:

{code}
2009-09-19 01:00:31,760 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:34,787 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:37,798 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:40,807 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:43,815 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:46,825 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:49,845 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:52,856 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:55,866 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:58,875 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:01,885 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:04,911 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 3; SIZE OF msgs 3
2009-09-19 01:01:07,927 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 4; SIZE OF msgs 4
2009-09-19 01:01:10,936 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:14,050 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:17,098 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:20,107 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:23,120 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:26,137 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
{code}

It looks like this patch is doing the right thing.

The other item of note is that under high load, hardly any messages are passed between the regionserver and the master.



[jira] Commented: (HBASE-1853) Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757589#action_12757589 ] 

Jonathan Gray commented on HBASE-1853:
--------------------------------------

Patch looks sane to me.  +1 for commit after you test.



[jira] Updated: (HBASE-1853) Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1853:
-------------------------

    Fix Version/s: 0.20.1

Bringing into 0.20.1.

Helps with the case where the master is down for a while and we have a split to deliver.  Without a fix for this issue, the split is dropped on the ground.



[jira] Updated: (HBASE-1853) Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1853:
-------------------------

    Assignee: stack
      Status: Patch Available  (was: Open)

Review?  I'm testing at the moment.



[jira] Updated: (HBASE-1853) Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1853:
-------------------------

    Attachment: rs.patch

Patch that doesn't clear the outbound messages array until after we've delivered them to the master.
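
For readers without rs.patch to hand, the idea can be sketched roughly as follows. This is not the patch itself, only an illustration of "clear after delivery": `OutboundQueueSketch`, `MasterLink`, `deliverOnce`, and the use of `String` in place of `HMsg` are all hypothetical names, and the sketch assumes a single loop thread is the only caller that removes messages.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; String stands in for HMsg.
class OutboundQueueSketch {
  /** Hypothetical stand-in for the RPC to the master. */
  interface MasterLink {
    void report(String[] msgs) throws IOException;
  }

  private final List<String> outboundMsgs = new ArrayList<>();

  /** Queues a message for the next report to the master. */
  void add(String msg) {
    synchronized (outboundMsgs) {
      outboundMsgs.add(msg);
    }
  }

  /** Number of messages still waiting to be delivered. */
  int pending() {
    synchronized (outboundMsgs) {
      return outboundMsgs.size();
    }
  }

  /** One pass of the loop: snapshot, try to deliver, remove only on success. */
  boolean deliverOnce(MasterLink master) {
    String[] outboundArray;
    synchronized (outboundMsgs) {
      // Take a copy but leave the queue untouched for now.
      outboundArray = outboundMsgs.toArray(new String[0]);
    }
    try {
      master.report(outboundArray);
    } catch (IOException e) {
      return false; // delivery failed; messages stay queued for the next pass
    }
    synchronized (outboundMsgs) {
      // Drop only the delivered prefix; anything appended while the call
      // was in flight stays queued. Assumes this is the only remover.
      outboundMsgs.subList(0, outboundArray.length).clear();
    }
    return true;
  }
}
```

With this shape, a pass that hits "Connection refused" leaves the split message in the queue, and a later successful pass delivers and then removes it.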



[jira] Updated: (HBASE-1853) Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1853:
-------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Thanks for review.

Committed to branch and trunk.

