You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/01/01 00:08:46 UTC

[jira] Created: (HBASE-1104) Doubly-assigned regions redux

Doubly-assigned regions redux
-----------------------------

                 Key: HBASE-1104
                 URL: https://issues.apache.org/jira/browse/HBASE-1104
             Project: Hadoop HBase
          Issue Type: Bug
         Environment: pset cluster with TRUNK.
            Reporter: stack
             Fix For: 0.19.0


Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.

{code}
2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
{code}

See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1104:
---------------------------------

    Assignee: stack  (was: Jim Kellerman)
      Status: Patch Available  (was: Open)

This patch should address HBASE-1104, HBASE-1098 and HBASE-1096.

Please try it out, and review it.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660165#action_12660165 ] 

stack commented on HBASE-1104:
------------------------------

Seems pretty easy to reproduce here on my end.  I don't see any change in the new state transition code other than improved logging and addition of setting closing to false inside in setClose.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660521#action_12660521 ] 

Jim Kellerman commented on HBASE-1104:
--------------------------------------

Closing used to mean "marked to close".
Closed was set when the master told the region server to close the region.
When the master received the confirmation from the region server, 
ProcessRegionClose either set the state to unassigned (if it was to be reassigned)
or removed it from the map if it was being offlined.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1104:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch, 1104.patch.1
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660537#action_12660537 ] 

stack commented on HBASE-1104:
------------------------------

Thanks Jim.  So, how would you fix the 'hole' where we set region to 'closed' state when we return to the regionserver the CLOSE message but if we clear the 'closing' flag -- because surely a region can't be 'closed' and 'closing' -- then the region becomes assignable though it hasn't been closed on the regionserver?

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660163#action_12660163 ] 

Andrew Purtell commented on HBASE-1104:
---------------------------------------

I was going to work up an issue but I think this covers something I've been seeing: Trying to disable a table, the master will give out the regions again as soon as they are closed.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1104:
---------------------------------

    Status: Patch Available  (was: Open)

Running tests now.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch, 1104.patch.1
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1104:
---------------------------------

    Attachment: 1104.patch.1

> stack - 07/Jan/09 08:42 PM
> Did you mean to add in changes to Index: src/webapps/master/WEB-INF/web.xml?

No, and I'm not sure how it got changed. Reverted.

> Want to add more javadoc to the @return in below (Not important...)
{code}
Index: src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
===================================================================
--- src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java (revision 732591)
+++ src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java (working copy)
@@ -126,6 +126,7 @@

    * @param regionName name of the region to update
    * @param b BatchUpdate
    * @param expectedValues map of column names to expected data values.
      + * @return true if
{code}

Done. It was missing the @return altogether, and I just forgot to finish the
comment.

Tell me about this change:

{code}
storedInfo = this.master.serverManager.getServerInfo(serverName);
deadServer = this.master.serverManager.isDead(serverName);

    * deadServerAndLogsSplit =
    * this.master.serverManager.isDeadServerLogsSplit(serverName);

and...

    * if ((deadServerAndLogsSplit ||
    * (!deadServer && (storedInfo == null ||
    * (storedInfo.getStartCode() != startCode)))) &&
    * this.regionManager.assignable(info)) {
      + if ((deadServer ||
      + (storedInfo == null || storedInfo.getStartCode() != startCode))) {
      +
{code}

> It don't look right. Changes I made for 1099 were "allow assigning if
> its a dead server and its commit logs HAVE been split" or "if NOT a
> dead server....because if a dead server and didn't pass first test,
> then its logs are being split.." ... We don't want BaseScanner
> assigning to servers on dead list. If regions are assigned to server
> on dead list, when dead server runs its scan in shutdown handler,
> we'll reassign these regions as though they'd been on crashed server;
> makes for double assignment and a mess.

You're right. It was a half finished change. What I meant to do was
not assign regions that are offline, in transition or were assigned to
a dead server since ProcessServerShutdown does that.

> You also remove the new method assignable. Don't we want to check if
> region is 'assignable' before dropping into this assigning code block?
> (Not sure... so asking).

If we get this far, we know the region is assignable because of the
test above.

> Your patch does this which as discussed on IRC is not whats wanted:
{code}
@@ -1088,12 +1088,8 @@
       byte [] closestKey = store.getRowKeyAtOrBefore(row);
       // If it happens to be an exact match, we can stop looping.
       // Otherwise, we need to check if it's the max and move to the next
-      if (HStoreKey.equalsTwoRowKeys(regionInfo, row, closestKey)) {
+      if (closestKey != null) {
         key = new HStoreKey(closestKey, this.regionInfo);
-      } else if (closestKey != null &&
-          (key == null || HStoreKey.compareTwoRowKeys(
-              regionInfo,closestKey, key.getRow()) > 0) ) {
-        key = new HStoreKey(closestKey, this.regionInfo);
       } else {
         return null;
       }
{code}

After some discussion with Stack, we determined that neither
implementation was correct. The new code is:
{code}
      // get the closest key. (HStore.getRowKeyAtOrBefore can return null)
      byte [] closestKey = store.getRowKeyAtOrBefore(row);
      // If it happens to be an exact match, we can stop.
      // Otherwise, we need to check if it's the max and move to the next
      if (closestKey != null) {
        if (HStoreKey.equalsTwoRowKeys(regionInfo, row, closestKey)) {
          key = new HStoreKey(closestKey, this.regionInfo);
        }
        if (key == null) {
          key = new HStoreKey(closestKey, this.regionInfo);
        }
      }
      if (key == null) {
        return null;
      }
{code}

> Do you think this safe Jim in below?
{code}
@@ -564,9 +566,10 @@
       //       the messages we've received. In this case, a close could be
       //       processed before an open resulting in the master not agreeing on
       //       the region's state.
+      master.regionManager.setClosed(region.getRegionName());
{code}

> Will we have the problem where state changes are processed out of
> order? Thinking on it, it doesn't seem so but asking just to check.

No, I don't think it is a problem, because the region is still in
transition and cannot be reassigned until the RegionState is removed
from the map.



> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch, 1104.patch.1
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662161#action_12662161 ] 

Andrew Purtell commented on HBASE-1104:
---------------------------------------

Testing with the latest patch now.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch, 1104.patch.1
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662192#action_12662192 ] 

stack commented on HBASE-1104:
------------------------------

+1 on patch and my testing fails to lose data or reproduce doubly-assign regions, at least to date.  It used to be easy to do.  Thats at least changed.  Will open new issue if I come across a problem.  Also tested killing the server carrying -ROOT-.  It took too long to come up -- another issue hbase-1111 -- but it came up.  How's your testing going apurtell?

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch, 1104.patch.1
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HBASE-1104:
------------------------------------

    Assignee: Jim Kellerman  (was: stack)

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1104:
---------------------------------

    Status: Open  (was: Patch Available)

Addressing Stack's issues

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HBASE-1104) Doubly-assigned regions redux

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662211#action_12662211 ] 

apurtell edited comment on HBASE-1104 at 1/8/09 5:35 PM:
---------------------------------------------------------------

+1

Looking good here. I shut down some HRS to provoke, but everything recovered. I was especially pleased that manual close_region invocations were not needed, unlike before. Recovery is a little slow as mentioned in HBASE-1111, but prior situation often was no recovery (missing regions) or permanent failure (region corruption). 

      was (Author: apurtell):
    Looking good here. I shut down some HRS to provoke, but everything recovered. I was especially pleased that manual close_region invocations were not needed, unlike before. Recovery is a little slow as mentioned in HBASE-1111, but prior situation often was no recovery (missing regions) or permanent failure (region corruption). 
  
> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch, 1104.patch.1
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1104:
---------------------------------

    Attachment: 1104.patch

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661000#action_12661000 ] 

Jim Kellerman commented on HBASE-1104:
--------------------------------------

Ok, I can see where this could be confusing. In HBASE-543, a newly discovered region would be
set to 'unassigned'. 

- if a region is 'unassigned' it is a candidate to be opened by the next region server that checks in.
- if the region is assigned to a region server, it is marked as assigned.
- when the region server reports that it has opened the region, it is marked as pending

Once ProcessRegionOpen runs and the HRS is has been stored in the META table, it is removed
from the Map of regionsInTransition

- When it is determined that a region should be closed, the region is marked as 'closing'
- When the master sends the close message to the HRS, the region's status is set as
  closing + closed (and if the region is being off-lined in the process, the status is: closing +
  closed + offlined)

Once the HRS reports that a region is closed, ProcessRegionClose is called. If the region 
should be reassigned (i.e., offlined == false), then the region status is set to unassigned
so that it will get picked up and assigned to the first region server that reports in that is not
overloaded.

If the region has been offlined, ProcessRegionClose will remove the region from the
regionsInTransition Map. 

Ok, so what does this boil down to? There are three states for getting a region served: 
1) unassigned 
2) assigned 
3) pending

However, for regions being closed it is more complex:
- closing means the region is in the process of being closed
- closing + closed means that the master has told the HRS to close the region.
- closing + offline means that the master wants to close the region and have it offlined
- closing + closed + offline means that the master has told the HRS to close the region,
  and that it will be offlined once the HRS reports that it has closed the region.

The reason for this approach was that if a region was closing, it could not be marked
as unassigned. Only ProcessRegionClose would know if the region should be reassigned,
and if not, it would remove the region from the regionsInTransition Map. If the region was
to be reassigned, it would stay in the map and its status would be changed to "unassigned"

As opening a region requires three states (unassigned, assigned, pending), closing a region
should be similar:
- close -- region server should be told that region is to be closed when the HRS reports in
- closing -- the HRS has been told to close the region
- closed - HRS reports that the region is closed.

When a region has a status of closing, it also has a substatus of closing and/or offlined.
If offlined, and the status == closed, then the master should remove the region from the 
regionsInTransition Map. If not offlined, the region should have its status set to unassigned.

So that is how it should work, but because starting up a region requires three state transitions
and closing one down currently only requires two, it is confusing.

Changing region close to be symmetrical with region open should clarify (and simplify) how
regions get reassigned.



> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660177#action_12660177 ] 

stack commented on HBASE-1104:
------------------------------

We set region as 'closed' when master sends over the CLOSE message to the regionserver.  My adding the clearing of the 'closing' when state goes to 'close' unearthed this 'hole' (least seems like a hole that region could be 'closing' and 'closed' at same time).  I think state should be set to 'closing' when master sends message out to regionserver and then on receipt of the close confirmation, then it should move region to 'closed' and 'unassigned'.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660255#action_12660255 ] 

Andrew Purtell commented on HBASE-1104:
---------------------------------------

Got this on the JP cluster today:

>From master log:

2009-01-02 00:05:09,671 DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region content,b1d86dfc01925c3b891285b12d3bf452,1230853204947
2009-01-02 00:05:12,765 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region content,b1d86dfc01925c3b891285b12d3bf452,1230853204947 to server 10.3.134.207:60020
2009-01-02 00:05:15,778 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: content,b1d86dfc01925c3b891285b12d3bf452,1230853204947 from 10.3.134.207:60020
2009-01-02 00:05:15,779 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: content,b1d86dfc01925c3b891285b12d3bf452,1230853204947 open on 10.3.134.207:60020
2009-01-02 00:05:27,699 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: content,b1d86dfc01925c3b891285b12d3bf452,1230853204947 from 10.3.134.221:60020
2009-01-02 00:05:28,688 INFO org.apache.hadoop.hbase.master.RegionManager: assigning region content,b1d86dfc01925c3b891285b12d3bf452,1230853204947 to server 10.3.134.222:60020
2009-01-02 00:05:31,697 INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: content,b1d86dfc01925c3b891285b12d3bf452,1230853204947 from 10.3.134.222:60020
2009-01-02 00:05:31,697 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: content,b1d86dfc01925c3b891285b12d3bf452,1230853204947 open on 10.3.134.222:60020
2009-01-02 00:05:31,698 INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row content,b1d86dfc01925c3b891285b12d3bf452,1230853204947 in region .META.,,1 with startcode 1230848433306 and server 10.3.134.222:60020

There are three HRS involved here.

This leads to file system trouble:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.3.134.225:60020 for region content,b1d86dfc01925c3b891285b12d3bf452,1230853204947, row 'b1d86dfc01925c3b891285b12d3bf452', but failed after 10 attempts.
Exceptions:
java.io.IOException: java.io.IOException: HStoreScanner failed construction
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.(StoreFileScanner.java:70)
	at org.apache.hadoop.hbase.regionserver.HStoreScanner.(HStoreScanner.java:84)
	at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2119)
	at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.(HRegion.java:1878)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1162)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1673)
	at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:894)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://jdc2-atr-dc-1.atr.trendmicro.com:50000/data/hbase/content/485774435/info/mapfiles/1743031531971685847/data
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
	at org.apache.hadoop.hbase.io.SequenceFile$Reader.(SequenceFile.java:1431)
	at org.apache.hadoop.hbase.io.SequenceFile$Reader.(SequenceFile.java:1426)
	at org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:310)
	at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:96)
	at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:292)
	at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.(HBaseMapFile.java:79)
	at org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.(BloomFilterMapFile.java:65)
	at org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:443)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:96)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.(StoreFileScanner.java:67)
	... 10 more


> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662211#action_12662211 ] 

Andrew Purtell commented on HBASE-1104:
---------------------------------------

Looking good here. I shut down some HRS to provoke, but everything recovered. I was especially pleased that manual close_region invocations were not needed, unlike before. Recovery is a little slow as mentioned in HBASE-1111, but prior situation often was no recovery (missing regions) or permanent failure (region corruption). 

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch, 1104.patch.1
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1104:
---------------------------------

    Attachment: 1104.patch

This one has no <cr><lf>'s

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HBASE-1104:
------------------------------------

    Assignee: Jim Kellerman

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661832#action_12661832 ] 

Andrew Purtell commented on HBASE-1104:
---------------------------------------

Thanks Jim. I'll give it a shot right now. After restart my cluster should split like crazy. Want the master log for anything, please ask stack. He can show you how to get it.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1104) Doubly-assigned regions redux

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1104:
---------------------------------

    Attachment:     (was: 1104.patch)

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661840#action_12661840 ] 

stack commented on HBASE-1104:
------------------------------

Did you mean to add in changes to Index: src/webapps/master/WEB-INF/web.xml?

Want to add more javadoc to the @return in below (Not important...)

Index: src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
===================================================================
--- src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java  (revision 732591)
+++ src/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java  (working copy)
@@ -126,6 +126,7 @@
    * @param regionName name of the region to update
    * @param b BatchUpdate
    * @param expectedValues map of column names to expected data values.
+   * @return true if 

Tell me about this change:

         storedInfo = this.master.serverManager.getServerInfo(serverName);
         deadServer = this.master.serverManager.isDead(serverName);
-        deadServerAndLogsSplit =
-          this.master.serverManager.isDeadServerLogsSplit(serverName);


and...


-      if ((deadServerAndLogsSplit ||
-          (!deadServer && (storedInfo == null ||
-            (storedInfo.getStartCode() != startCode)))) &&
-          this.regionManager.assignable(info)) {
+      if ((deadServer ||
+          (storedInfo == null || storedInfo.getStartCode() != startCode))) {
+

It don't look right.  Changes I made for 1099 were "allow assigning if its a dead server and its commit logs HAVE been split" or "if NOT a dead server....because if a dead server and didn't pass first test, then its logs are being split.."  ... We don't want BaseScanner assigning to servers on dead list.  If regions are assigned to server on dead list, when dead server runs its scan in shutdown handler, we'll reassign these regions as though they'd been on crashed server; makes for double assignment and a mess.

You also remove the new method assignable.  Don't we want to check if region is 'assignable' before dropping into this assigning code block? (Not sure... so asking).

Your patch does this which as discussed on IRC is not whats wanted:

{code}
@@ -1088,12 +1088,8 @@
       byte [] closestKey = store.getRowKeyAtOrBefore(row);
       // If it happens to be an exact match, we can stop looping.
       // Otherwise, we need to check if it's the max and move to the next
-      if (HStoreKey.equalsTwoRowKeys(regionInfo, row, closestKey)) {
+      if (closestKey != null) {
         key = new HStoreKey(closestKey, this.regionInfo);
-      } else if (closestKey != null &&
-          (key == null || HStoreKey.compareTwoRowKeys(
-              regionInfo,closestKey, key.getRow()) > 0) ) {
-        key = new HStoreKey(closestKey, this.regionInfo);
       } else {
         return null;
       }
{code}

Do you think this safe Jim in below?

{code}
@@ -564,9 +566,10 @@
       //       the messages we've received. In this case, a close could be
       //       processed before an open resulting in the master not agreeing on
       //       the region's state.
+      master.regionManager.setClosed(region.getRegionName());
{code}

Will we have the problem where state changes are processed out of order?  Thinking on it, it doesn't seem so but asking just to check.

I'll hold on testing the patch until answer on above.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660161#action_12660161 ] 

stack commented on HBASE-1104:
------------------------------

This part from above log extract is where we've gone wrong:

{code}
2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
{code}

We've set regionstate to 'closing' but then six seconds later we give out the region.   Looking at code, if its state is 'closing', we shouldn't be giving it out.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661844#action_12661844 ] 

stack commented on HBASE-1104:
------------------------------

Other comments, I like the clean up of state namings.  Much clearer now whats going on (Was super confusing before -- now I'll have a chance at grokking whats going on).  Also, I liked removal of passing RegionManager when its accessible off the Master.  Good cleanup.

> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>         Attachments: 1104.patch
>
>
> Testing, I see doubly assigned regions.  Below is from master log for TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; daughters: TestTable,0000116170,1230761605500, TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG org.apache.hadoop.hbase.master.RegionManager: Going to close region TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.hbase.master.RegionManager: assigning region TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG org.apache.hadoop.hbase.master.BaseScanner: TestTable,0000135598,1230761605500 no longer has references to TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.