Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/09/18 20:10:16 UTC
[jira] Created: (HBASE-1853) Each time around the regionserver core
loop, we clear the messages to pass master, even if we failed to deliver
them
Each time around the regionserver core loop, we clear the messages to pass master, even if we failed to deliver them
--------------------------------------------------------------------------------------------------------------------
Key: HBASE-1853
URL: https://issues.apache.org/jira/browse/HBASE-1853
Project: Hadoop HBase
Issue Type: Bug
Components: regionserver
Reporter: stack
At the head of the regionserver run loop we do this:
{code}
synchronized(this.outboundMsgs) {
  outboundArray =
    this.outboundMsgs.toArray(new HMsg[outboundMsgs.size()]);
  this.outboundMsgs.clear();
}
{code}
We do this even if we failed to deliver the messages to the master (Connection refused or whatever), so the undelivered messages are lost.
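To make the failure mode concrete, here is a minimal standalone sketch; it is not the HRegionServer code. The class name ClearBeforeSendSketch, the String messages standing in for HMsg, and the simulated failed send are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical names, String stands in for HMsg.
class ClearBeforeSendSketch {

    /** Runs two passes of a clear-before-send loop; returns how many messages the second pass sees. */
    static int messagesSeenOnRetry() {
        List<String> outboundMsgs = new ArrayList<>();
        outboundMsgs.add("split report"); // queued while the master is unreachable

        // Pass 1: snapshot and clear, as in the snippet above.
        String[] outboundArray;
        synchronized (outboundMsgs) {
            outboundArray = outboundMsgs.toArray(new String[outboundMsgs.size()]);
            outboundMsgs.clear();
        }
        // Assume the send of outboundArray fails here (e.g. Connection refused).

        // Pass 2: the snapshot is rebuilt from the now-empty list,
        // so the message from pass 1 is no longer available to resend.
        synchronized (outboundMsgs) {
            outboundArray = outboundMsgs.toArray(new String[outboundMsgs.size()]);
            outboundMsgs.clear();
        }
        return outboundArray.length;
    }

    public static void main(String[] args) {
        System.out.println("messages available on retry: " + messagesSeenOnRetry());
    }
}
```

In this sketch the second pass has nothing to send, even though the first send never reached the master.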
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-1853) Each time around the regionserver
core loop, we clear the messages to pass master, even if we failed to
deliver them
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757670#action_12757670 ]
stack commented on HBASE-1853:
------------------------------
I ran a load overnight with extra debug logging that counts how many messages the regionserver had to send to the master. Below is the filtered grep output:
{code}
2009-09-19 01:00:31,760 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:34,787 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:37,798 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:40,807 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:43,815 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:46,825 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:49,845 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:52,856 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:55,866 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:00:58,875 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:01,885 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:04,911 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 3; SIZE OF msgs 3
2009-09-19 01:01:07,927 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 4; SIZE OF msgs 4
2009-09-19 01:01:10,936 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:14,050 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:17,098 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:20,107 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:23,120 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
2009-09-19 01:01:26,137 [regionserver/208.76.44.140:60020] DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: SIZE OF OUTBOUNDMSGS: 0, WAS 0; SIZE OF msgs 0
{code}
It looks like this patch is doing the right thing.
The other item of note is that under high load, hardly any messages are passed between the regionserver and the master.
[jira] Commented: (HBASE-1853) Each time around the regionserver
core loop, we clear the messages to pass master, even if we failed to
deliver them
Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757589#action_12757589 ]
Jonathan Gray commented on HBASE-1853:
--------------------------------------
Patch looks sane to me. +1 for commit after you test.
[jira] Updated: (HBASE-1853) Each time around the regionserver core
loop, we clear the messages to pass master, even if we failed to deliver
them
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1853:
-------------------------
Fix Version/s: 0.20.1
Bringing into 0.20.1.
Helps with the case where the master is down for a while and we have a split to deliver. Without a fix for this issue, the split is dropped on the ground.
[jira] Updated: (HBASE-1853) Each time around the regionserver core
loop, we clear the messages to pass master, even if we failed to deliver
them
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1853:
-------------------------
Assignee: stack
Status: Patch Available (was: Open)
Review? I'm testing at the moment.
[jira] Updated: (HBASE-1853) Each time around the regionserver core
loop, we clear the messages to pass master, even if we failed to deliver
them
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1853:
-------------------------
Attachment: rs.patch
Patch that doesn't clear the outbound messages array until after we've delivered them to the master.
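For readers without the attachment, the idea can be sketched roughly as follows. This is not the contents of rs.patch: the class OutboundQueueSketch, the Master interface, and the use of String in place of HMsg are hypothetical stand-ins. The sketch also assumes that messages are only ever appended to the list and that only the reporting loop removes them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical names, String stands in for HMsg.
class OutboundQueueSketch {

    /** Hypothetical stand-in for the call that reports messages to the master. */
    interface Master {
        void report(String[] msgs) throws IOException;
    }

    private final List<String> outboundMsgs = new ArrayList<>();

    void add(String msg) {
        synchronized (outboundMsgs) {
            outboundMsgs.add(msg);
        }
    }

    int pending() {
        synchronized (outboundMsgs) {
            return outboundMsgs.size();
        }
    }

    /** One pass of the loop: snapshot, try to send, remove only after a successful send. */
    boolean reportOnce(Master master) {
        String[] snapshot;
        synchronized (outboundMsgs) {
            snapshot = outboundMsgs.toArray(new String[outboundMsgs.size()]);
        }
        try {
            master.report(snapshot);
        } catch (IOException e) {
            return false; // delivery failed: leave the messages queued for the next pass
        }
        synchronized (outboundMsgs) {
            // Remove only what was sent; anything appended during the send stays queued.
            outboundMsgs.subList(0, snapshot.length).clear();
        }
        return true;
    }

    public static void main(String[] args) {
        OutboundQueueSketch queue = new OutboundQueueSketch();
        queue.add("split report");

        // Master is down: the send fails and the message stays queued.
        boolean sent = queue.reportOnce(msgs -> { throw new IOException("Connection refused"); });
        System.out.println("sent=" + sent + " pending=" + queue.pending());

        // Master is back: the send succeeds and the message is removed.
        sent = queue.reportOnce(msgs -> { });
        System.out.println("sent=" + sent + " pending=" + queue.pending());
    }
}
```

Removing only the sent prefix, rather than calling clear(), keeps any message that was queued while the send was in flight.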
[jira] Updated: (HBASE-1853) Each time around the regionserver core
loop, we clear the messages to pass master, even if we failed to deliver
them
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1853:
-------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Thanks for review.
Committed to branch and trunk.