Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2011/07/05 05:27:22 UTC

[jira] [Created] (HBASE-4060) ServerShutdownHandler.FindDaughterVisitor doesn't detect whether daughter region is assigned to some server

ServerShutdownHandler.FindDaughterVisitor doesn't detect whether daughter region is assigned to some server
-----------------------------------------------------------------------------------------------------------

                 Key: HBASE-4060
                 URL: https://issues.apache.org/jira/browse/HBASE-4060
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.3
            Reporter: Ted Yu
            Assignee: Ted Yu
             Fix For: 0.90.4

[jira] [Updated] (HBASE-4060) Making region assignment more robust

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4060:
-------------------------

    Fix Version/s:     (was: 0.96.0)

Moving out of 0.96.0. This needs to be worked through more and won't be done for 0.96.0.

[jira] [Updated] (HBASE-4060) Making region assignment more robust

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4060:
---------------------------------

    Fix Version/s:     (was: 0.94.0)
                   0.96.0

Moving out of 0.94.

[jira] [Commented] (HBASE-4060) Making region assignment more robust

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060036#comment-13060036 ] 

Ted Yu commented on HBASE-4060:
-------------------------------

On top of HBASE-3789, we should consider the following two remedies for the 0.90 branch:
1. How to speed up enabling a table with a large number of regions (12K in Eran's case).
2. AM.TimeoutMonitor.chore() may reassign a region that has just completed OpenedRegionHandler.process().

For #2 above, better coordination between OpenedRegionHandler and AM.TimeoutMonitor should be devised.
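
A minimal sketch of one way such coordination could work, assuming the region's in-transition state is kept in a single unassigned znode whose payload is the state name; the class, method, and path handling here are hypothetical, and the real ZKAssign encoding differs:
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical guard for remedy #2: before TimeoutMonitor reassigns, it must
// atomically flip the region's unassigned znode from OPENING back to OFFLINE.
// The versioned setData fails if OpenedRegionHandler has already moved the
// node past OPENING, so a region that just finished opening is left alone.
public final class TimeoutGuard {
  private final ZooKeeper zk;

  public TimeoutGuard(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Returns true only if the caller may proceed with reassignment. */
  public boolean tryMarkOffline(String unassignedNode) throws InterruptedException {
    try {
      Stat stat = new Stat();
      byte[] data = zk.getData(unassignedNode, false, stat);
      if (!"OPENING".equals(new String(data))) {
        return false; // already OPENED (or gone): do not reassign
      }
      // Conditional write: succeeds only if nobody changed the node since the read.
      zk.setData(unassignedNode, "OFFLINE".getBytes(), stat.getVersion());
      return true;
    } catch (KeeperException e) {
      return false; // BadVersion/NoNode: lost the race to OpenedRegionHandler
    }
  }
}
{code}
Under such a scheme, TimeoutMonitor.chore() would call tryMarkOffline() and skip the region whenever it returns false.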


[jira] [Updated] (HBASE-4060) Making region assignment more robust

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4060:
--------------------------

      Description: 
From Eran Kutner:
My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine, you can't know for sure whether it did or did not receive the request; however, there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run hbck it knows that some regions are multiply assigned; the master could do the same and try to resolve the conflict. Another approach would be to handle late responses: even if the response from the remote machine arrives after it was assumed to be dead, the master should have enough information to know it had created a conflict by assigning the region to another server. An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not.
Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. The system should be able to actively protect itself against such a scenario. It probably doesn't need saying, but there is really nothing worse for a data storage system than data loss.

In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions.
    Fix Version/s:     (was: 0.90.4)
          Summary: Making region assignment more robust  (was: ServerShutdownHandler.FindDaughterVisitor doesn't detect whether daughter region is assigned to some server)


[jira] [Updated] (HBASE-4060) Making region assignment more robust

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4060:
--------------------------

    Description: 
From Eran Kutner:
My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine, you can't know for sure whether it did or did not receive the request; however, there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run hbck it knows that some regions are multiply assigned; the master could do the same and try to resolve the conflict. Another approach would be to handle late responses: even if the response from the remote machine arrives after it was assumed to be dead, the master should have enough information to know it had created a conflict by assigning the region to another server. An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not.
Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. The system should be able to actively protect itself against such a scenario. It probably doesn't need saying, but there is really nothing worse for a data storage system than data loss.

In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions.

For more background information, see the 'Errors after major compaction' discussion on user@hbase.apache.org.


[jira] [Updated] (HBASE-4060) Making region assignment more robust

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4060:
--------------------------

    Assignee:     (was: Ted Yu)


[jira] [Commented] (HBASE-4060) Making region assignment more robust

Posted by "Eran Kutner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070154#comment-13070154 ] 

Eran Kutner commented on HBASE-4060:
------------------------------------

I will try to elaborate a bit on what I had in mind; I think it is not very far from what Andrew suggested earlier.
First I should say that I am not familiar enough with the current implementation, so my understanding may not be correct or accurate. However, based on what I understand, the current implementation doesn't seem to be robust enough, because it is based on active communication between the master and RSs, which leaves room for timeouts and failures.
My suggestion is to be more proactive about monitoring the assignment of regions and to allow the RSs themselves to know which regions are assigned to them at any time.
I suggest adding a new znode hierarchy in ZK listing the regions and their assignment. It could be something like /hbase/regions/<table>/<region>, so each region has a znode; under that would be a znode for the assigned RS.
When the master assigns a region to an RS, it should delete the old owner record from the list and add the new one.
When an RS gets an assignment command from the master, it should list the children of the znode corresponding to the assigned region and set a watcher on it. The RS should verify it is indeed the owner registered in ZK; if it is not, it should immediately refuse to accept the region assignment command.
If the RS receives an event trigger from one of the watchers it has set, it should re-check that region's assignment and validate that it is still the owner of the region. If it's not, it should relinquish control over the region.
The process so far should guarantee that there are never doubly assigned regions; however, it may create orphan regions which are not assigned to any RS. To resolve that, the master should periodically check for unassigned regions and reassign them.
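
A rough sketch of the RS-side check described above, under the proposed layout /hbase/regions/<table>/<region> with a single child znode naming the owning server; every path, class, and method here is hypothetical and only illustrates the proposal:
{code}
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical ownership check for the proposed /hbase/regions/<table>/<region>
// layout: the single child znode names the region server that owns the region.
public final class RegionOwnershipChecker implements Watcher {
  private final ZooKeeper zk;
  private final String serverName; // e.g. "host,60020,startcode"

  public RegionOwnershipChecker(ZooKeeper zk, String serverName) {
    this.zk = zk;
    this.serverName = serverName;
  }

  /** Returns true if this RS is the registered owner; also (re)arms a watcher. */
  public boolean ownsRegion(String table, String region) throws Exception {
    String regionZnode = "/hbase/regions/" + table + "/" + region;
    try {
      // Listing with a watcher means we are notified when ownership changes.
      List<String> owners = zk.getChildren(regionZnode, this);
      return owners.size() == 1 && owners.get(0).equals(serverName);
    } catch (KeeperException.NoNodeException e) {
      return false; // region znode gone: we are certainly not the owner
    }
  }

  @Override
  public void process(WatchedEvent event) {
    // Watch fired: re-run ownsRegion() for event.getPath() and close the
    // region if this server is no longer the registered owner.
  }
}
{code}
The same check would cover both cases in the proposal: refusing an assignment command when ZK disagrees, and relinquishing a region when a watch event shows the ownership record changed.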




[jira] [Commented] (HBASE-4060) Making region assignment more robust

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070218#comment-13070218 ] 

Jonathan Gray commented on HBASE-4060:
--------------------------------------

The primary difference between the suggestion by Eran and what is currently implemented is that the per-region znodes are never deleted in Eran's design.  The existing implementation uses znodes to track regions that are currently in transition.  An assigned and open region doesn't have a znode (nor would an unassigned and closed region of a disabled table).

Check out ZKAssign and AssignmentManager for details on how that works.


[jira] [Commented] (HBASE-4060) Making region assignment more robust

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060193#comment-13060193 ] 

Ted Yu commented on HBASE-4060:
-------------------------------

Here is a related log snippet:
{noformat}
2011-06-29 16:39:54,326 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
on hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
2011-06-29 16:40:00,598 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x13004a31d7804c4 Creating (or updating) unassigned node for
584dac5cc70d8682f71c4675a843c309 with OFFLINE state
2011-06-29 16:40:00,877 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: No previous transition
plan was found (or we are ignoring an existing plan) for
gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
so generated a random one;
hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.,
src=, dest=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076; 5 (online=5,
exclude=null) available servers
{noformat}
The log indicates that the following was executed:
{code}
  private void assign(final RegionState state, final boolean setOfflineInZK,
      final boolean forceNewPlan) {
    for (int i = 0; i < this.maximumAssignmentAttempts; i++) {
      if (setOfflineInZK && !setOfflineInZooKeeper(state)) return;
{code}
The above would have been called from either:
{code}
  public void assign(HRegionInfo region, boolean setOfflineInZK) {
    assign(region, setOfflineInZK, false);
  }
{code}
or TimeoutMonitor.chore().


[jira] [Commented] (HBASE-4060) Making region assignment more robust

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060183#comment-13060183 ] 

Jonathan Gray commented on HBASE-4060:
--------------------------------------

Andrew, we are already doing something like what you describe. It seems the issue is what Ted describes in #2, but it's not clear to me how this bug is being triggered.

In TimeoutMonitor, we attempt an atomic change of state from OPENING to OFFLINE. If this fails, we don't do anything. If it succeeds, we attempt a reassign.

In OpenRegionHandler (in the RS), we attempt an atomic change of state from OPENING to OPENED. If this fails, we roll back our open. If it succeeds, we are opened and the node is at OPENED.

In OpenedRegionHandler (in the master), the first thing we do is delete the node, but only if it is in OPENED state. If the TimeoutMonitor had done anything, it would have switched the state to OFFLINE.

What am I missing?
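
A hedged sketch of the delete-only-if-OPENED step, under the same assumed state-in-payload encoding as the sketch in Ted's earlier comment (the actual ZKAssign code differs in detail); the point is that the versioned delete fails if TimeoutMonitor flipped the node to OFFLINE in between:
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical master-side step: delete the region's unassigned znode only
// if it is still in OPENED state. The versioned delete fails if any other
// actor (e.g. TimeoutMonitor) transitioned the node after our read.
final class DeleteIfOpened {
  static boolean deleteOpenedNode(ZooKeeper zk, String node)
      throws InterruptedException {
    try {
      Stat stat = new Stat();
      String state = new String(zk.getData(node, false, stat));
      if (!"OPENED".equals(state)) {
        return false; // TimeoutMonitor (or someone else) got here first
      }
      zk.delete(node, stat.getVersion()); // conditional on the version we read
      return true;
    } catch (KeeperException e) {
      return false; // BadVersion/NoNode: raced with a concurrent transition
    }
  }
}
{code}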


[jira] [Assigned] (HBASE-4060) Making region assignment more robust

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-4060:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan


[jira] [Updated] (HBASE-4060) Making region assignment more robust

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-4060:
----------------------------------

    Affects Version/s:     (was: 0.90.3)
        Fix Version/s: 0.92.0

Pulling into 0.92. We can push it out as determined by the RM.

Some initial steps:

- After opening a region, the RegionServer should claim ownership of the region in ZooKeeper with an ephemeral znode. If the znode already exists, refuse to open the region. (Pardon if we already do something like this... but then it seems not to be working correctly.)

- In the master, watch for double assignments in the region lists reported by the RegionServers when they check in. If a double assignment is observed, issue close commands to both RSs ASAP and yell about it in the logs.
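
A sketch of the first step under stated assumptions: the ownership path is made up for illustration, and the claim being ephemeral means it vanishes with the RS's ZooKeeper session, so a dead RS cannot block reassignment forever.
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical ephemeral ownership claim: created under the RS's session, the
// znode disappears automatically if the RS dies, releasing the claim.
final class RegionClaim {
  /** Returns true if this RS won the claim and may open the region. */
  static boolean claim(ZooKeeper zk, String regionName, String serverName)
      throws KeeperException, InterruptedException {
    String path = "/hbase/region-owners/" + regionName; // made-up path
    try {
      // Assumes the parent znode already exists.
      zk.create(path, serverName.getBytes(), Ids.OPEN_ACL_UNSAFE,
          CreateMode.EPHEMERAL);
      return true; // claim created: we are the sole owner
    } catch (KeeperException.NodeExistsException e) {
      return false; // another RS holds the claim: refuse to open the region
    }
  }
}
{code}
The second step needs no new machinery: the region lists the RegionServers report at check-in already give the master what it needs to flag any region that appears in two lists.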


[jira] [Updated] (HBASE-4060) Making region assignment more robust

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4060:
--------------------------

    Fix Version/s:     (was: 0.92.0)
                   0.94.0


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira