Posted to issues@hbase.apache.org by "Maryann Xue (Created) (JIRA)" <ji...@apache.org> on 2012/04/19 10:35:44 UTC

[jira] [Created] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Inconsistency between the "regions" map and the "servers" map in AssignmentManager
----------------------------------------------------------------------------------

                 Key: HBASE-5829
                 URL: https://issues.apache.org/jira/browse/HBASE-5829
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.92.1, 0.90.6
            Reporter: Maryann Xue


There are places in AssignmentManager (AM) where this.servers is not kept consistent with this.regions. This can cause the balancer to try to offline a region from a region server (RS) that already returned NotServingRegionException on a previous offline attempt.

In AssignmentManager.unassign(HRegionInfo, boolean)
    try {
      // TODO: We should consider making this look more like it does for the
      // region open where we catch all throwables and never abort
      if (serverManager.sendRegionClose(server, state.getRegion(),
        versionOfClosingNode)) {
        LOG.debug("Sent CLOSE to " + server + " for region " +
          region.getRegionNameAsString());
        return;
      }
      // This never happens. Currently regionserver close always return true.
      LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
        region.getRegionNameAsString());
    } catch (NotServingRegionException nsre) {
      LOG.info("Server " + server + " returned " + nsre + " for " +
        region.getRegionNameAsString());
      // Presume that master has stale data.  Presume remote side just split.
      // Presume that the split message when it comes in will fix up the master's
      // in memory cluster state.
    } catch (Throwable t) {
      if (t instanceof RemoteException) {
        t = ((RemoteException)t).unwrapRemoteException();
        if (t instanceof NotServingRegionException) {
          if (checkIfRegionBelongsToDisabling(region)) {
            // Remove from the regionsinTransition map
            LOG.info("While trying to recover the table "
                + region.getTableNameAsString()
                + " to DISABLED state the region " + region
                + " was offlined but the table was in DISABLING state");
            synchronized (this.regionsInTransition) {
              this.regionsInTransition.remove(region.getEncodedName());
            }
            // Remove from the regionsMap
            synchronized (this.regions) {
              this.regions.remove(region);
            }
            deleteClosingOrClosedNode(region);
          }
        }
        // RS is already processing this region, only need to update the timestamp
        if (t instanceof RegionAlreadyInTransitionException) {
          LOG.debug("update " + state + " the timestamp.");
          state.update(state.getState());
        }
      }

In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean)
          synchronized (this.regions) {
            this.regions.put(plan.getRegionInfo(), plan.getDestination());
          }
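
The effect of the two excerpts above can be illustrated with a minimal, self-contained sketch. It is not the HBase code: the class StaleServersMapDemo and its methods are hypothetical, plain Strings stand in for HRegionInfo and the server name type, and the only assumption carried over is that "regions" maps region to server while "servers" maps server to the set of regions it hosts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the two AssignmentManager maps.
// Strings replace HRegionInfo and the server name type.
class StaleServersMapDemo {
  final Map<String, String> regions = new HashMap<>();       // region -> server
  final Map<String, Set<String>> servers = new HashMap<>();  // server -> hosted regions

  // Consistent bookkeeping: both maps are updated together.
  void addBoth(String region, String server) {
    regions.put(region, server);
    servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
  }

  // Mirrors the unassign() excerpt: only "regions" is touched.
  void removeFromRegionsOnly(String region) {
    regions.remove(region);
  }

  // What a balancer that reads "servers" would believe the server hosts.
  Set<String> regionsSeenByBalancer(String server) {
    return servers.getOrDefault(server, new HashSet<>());
  }

  public static void main(String[] args) {
    StaleServersMapDemo am = new StaleServersMapDemo();
    am.addBoth("region-a", "rs1");
    am.removeFromRegionsOnly("region-a");
    System.out.println("regions has region-a: " + am.regions.containsKey("region-a"));
    System.out.println("balancer view of rs1: " + am.regionsSeenByBalancer("rs1"));
  }
}
```

In this sketch, "regions" no longer contains region-a after the removal, but the balancer's view of rs1 still lists it. That stale entry is the kind that could lead to another offline attempt against a server that is not serving the region.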


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261353#comment-13261353 ] 

Hadoop QA commented on HBASE-5829:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12524120/HBASE-5829-trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.TestRegionRebalancing

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1643//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1643//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1643//console

This message is automatically generated.
                

        

[jira] [Updated] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "Maryann Xue (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maryann Xue updated HBASE-5829:
-------------------------------

    Attachment: HBASE-5829-trunk.patch
                HBASE-5829-0.90.patch

Add corresponding operations to this.servers
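
As a rough illustration of what "corresponding operations" could look like, here is a self-contained sketch that updates both maps under one lock. It is not the attached patch: the class PairedRegionMaps, its method names, and the use of Strings in place of HRegionInfo and the server name type are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper that keeps a region->server map and a
// server->regions map in step. Not the actual AssignmentManager code.
class PairedRegionMaps {
  private final Map<String, String> regions = new TreeMap<>();
  private final Map<String, Set<String>> servers = new TreeMap<>();

  // Record an assignment in both maps, dropping any previous owner's entry.
  void assign(String region, String destination) {
    synchronized (regions) {
      String previous = regions.put(region, destination);
      if (previous != null && !previous.equals(destination)) {
        removeFromServer(previous, region);
      }
      servers.computeIfAbsent(destination, s -> new HashSet<>()).add(region);
    }
  }

  // Remove the region from both maps.
  void unassign(String region) {
    synchronized (regions) {
      String server = regions.remove(region);
      if (server != null) {
        removeFromServer(server, region);
      }
    }
  }

  // The server keeps its (possibly empty) set so it stays visible as online.
  private void removeFromServer(String server, String region) {
    Set<String> hosted = servers.get(server);
    if (hosted != null) {
      hosted.remove(region);
    }
  }

  String serverOf(String region) {
    synchronized (regions) {
      return regions.get(region);
    }
  }

  // Returns a copy so callers cannot mutate the internal set.
  Set<String> regionsOn(String server) {
    synchronized (regions) {
      Set<String> hosted = servers.get(server);
      return hosted == null ? new HashSet<>() : new HashSet<>(hosted);
    }
  }
}
```

The design choice in this sketch is to expose no way of changing one map without the other, so callers cannot leave them out of step.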
                

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261655#comment-13261655 ] 

Zhihong Yu commented on HBASE-5829:
-----------------------------------

Patch makes sense.
w.r.t. this.servers, I found a useless statement (at least in trunk):
{code}
  void unassignCatalogRegions() {
    this.servers.entrySet();
{code}
that should be removed.
                

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "Maryann Xue (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258074#comment-13258074 ] 

Maryann Xue commented on HBASE-5829:
------------------------------------

In AssignmentManager.unassign(HRegionInfo, boolean)
            // Remove from the regionsMap
            synchronized (this.regions) {
              this.regions.remove(region);
            }

In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean)
          synchronized (this.regions) {
            this.regions.put(plan.getRegionInfo(), plan.getDestination());
          }

In both places, the region is not updated in or removed from this.servers, which might cause the balancer to generate incorrect region plans.
Since the fix for HBASE-5563, it seems this problem no longer causes an endless loop of wrong balance plans or leaves a region permanently in transition.
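
One way to make such a mismatch visible is a consistency check over the two maps. The sketch below is an illustration only: RegionMapConsistency and staleEntries are hypothetical names, and Strings again stand in for HRegionInfo and the server name type.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical checker: reports every (region, server) pair that the
// server->regions map lists but the region->server map does not agree with.
class RegionMapConsistency {
  static Set<String> staleEntries(Map<String, String> regions,
                                  Map<String, Set<String>> servers) {
    Set<String> stale = new TreeSet<>();
    for (Map.Entry<String, Set<String>> entry : servers.entrySet()) {
      String server = entry.getKey();
      for (String region : entry.getValue()) {
        // Stale if "regions" has no owner for the region, or a different one.
        if (!server.equals(regions.get(region))) {
          stale.add(region + "@" + server);
        }
      }
    }
    return stale;
  }
}
```

A plan built from any entry this check reports would target a server that, according to "regions", does not hold the region.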
 
                

        

[jira] [Updated] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "Maryann Xue (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maryann Xue updated HBASE-5829:
-------------------------------

    Status: Patch Available  (was: Open)
    

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257704#comment-13257704 ] 

stack commented on HBASE-5829:
------------------------------

Please explain where in the code the disparity between this.servers and this.regions is, Maryann.
                

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263372#comment-13263372 ] 

Hudson commented on HBASE-5829:
-------------------------------

Integrated in HBase-TRUNK-security #186 (See [https://builds.apache.org/job/HBase-TRUNK-security/186/])
    HBASE-5829 Inconsistency between the "regions" map and the "servers" map in AssignmentManager (Revision 1330993)

     Result = SUCCESS
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

                
> Inconsistency between the "regions" map and the "servers" map in AssignmentManager
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-5829
>                 URL: https://issues.apache.org/jira/browse/HBASE-5829
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.92.1
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>             Fix For: 0.96.0
>
>         Attachments: HBASE-5829-0.90.patch, HBASE-5829-trunk.patch
>
>
> There are occurrences in AM where this.servers is not kept consistent with this.regions. This might cause balancer to offline a region from the RS that already returned NotServingRegionException at a previous offline attempt.
> In AssignmentManager.unassign(HRegionInfo, boolean)
>     try {
>       // TODO: We should consider making this look more like it does for the
>       // region open where we catch all throwables and never abort
>       if (serverManager.sendRegionClose(server, state.getRegion(),
>         versionOfClosingNode)) {
>         LOG.debug("Sent CLOSE to " + server + " for region " +
>           region.getRegionNameAsString());
>         return;
>       }
>       // This never happens. Currently regionserver close always return true.
>       LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
>         region.getRegionNameAsString());
>     } catch (NotServingRegionException nsre) {
>       LOG.info("Server " + server + " returned " + nsre + " for " +
>         region.getRegionNameAsString());
>       // Presume that master has stale data.  Presume remote side just split.
>       // Presume that the split message when it comes in will fix up the master's
>       // in memory cluster state.
>     } catch (Throwable t) {
>       if (t instanceof RemoteException) {
>         t = ((RemoteException)t).unwrapRemoteException();
>         if (t instanceof NotServingRegionException) {
>           if (checkIfRegionBelongsToDisabling(region)) {
>             // Remove from the regionsinTransition map
>             LOG.info("While trying to recover the table "
>                 + region.getTableNameAsString()
>                 + " to DISABLED state the region " + region
>                 + " was offlined but the table was in DISABLING state");
>             synchronized (this.regionsInTransition) {
>               this.regionsInTransition.remove(region.getEncodedName());
>             }
>             // Remove from the regionsMap
>             synchronized (this.regions) {
>               this.regions.remove(region);
>             }
>             deleteClosingOrClosedNode(region);
>           }
>         }
>         // RS is already processing this region, only need to update the timestamp
>         if (t instanceof RegionAlreadyInTransitionException) {
>           LOG.debug("update " + state + " the timestamp.");
>           state.update(state.getState());
>         }
>       }
> In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean)
>           synchronized (this.regions) {
>             this.regions.put(plan.getRegionInfo(), plan.getDestination());
>           }
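
To make the reported inconsistency concrete, here is a minimal sketch. It is not the attached patch and not the real AssignmentManager code. The class RegionServerMaps and all of its methods are hypothetical, and regions and servers are plain Strings instead of HRegionInfo and ServerName. It shows how removing a region from the region-to-server map alone leaves a stale entry in the server-to-regions map, and one way to update both under a single lock.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the two AssignmentManager maps. Regions and
// servers are plain Strings here rather than HRegionInfo and ServerName.
class RegionServerMaps {
  // region -> hosting server, playing the role of this.regions
  private final Map<String, String> regions = new HashMap<>();
  // server -> regions it hosts, playing the role of this.servers
  private final Map<String, Set<String>> servers = new HashMap<>();

  // Simplified setup helper: assumes the region is not already assigned
  // to a different server.
  synchronized void put(String region, String server) {
    regions.put(region, server);
    servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
  }

  // Mirrors the quoted catch block: only the regions map is updated.
  synchronized void removeFromRegionsOnly(String region) {
    regions.remove(region);
  }

  // Removes the region from both maps under one lock so they cannot diverge.
  synchronized void removeFromBoth(String region) {
    String server = regions.remove(region);
    if (server == null) {
      return; // region was not assigned, nothing to clean up
    }
    Set<String> hosted = servers.get(server);
    if (hosted != null) {
      hosted.remove(region);
    }
  }

  // True if the server-side view still lists the region.
  synchronized boolean serverStillLists(String server, String region) {
    Set<String> hosted = servers.get(server);
    return hosted != null && hosted.contains(region);
  }
}
```

With removeFromRegionsOnly, a later pass over the server-side map would still see the region on its old server, which is the kind of stale view the description says can lead the balancer to retry an offline against that RS.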

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "Maryann Xue (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261327#comment-13261327 ] 

Maryann Xue commented on HBASE-5829:
------------------------------------

For the second, I think we should guarantee that the region is also added to the "this.servers" map.
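
A rough sketch of that idea for the assign path, under stated assumptions: AssignBookkeeping, recordPlan, serverOf and regionsOn are hypothetical names, Strings stand in for HRegionInfo and ServerName, and this is not taken from the attached patches. Whenever the region-to-server entry is written, the destination's server-side set is updated in the same critical section, and any previous host's set is cleaned up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for the assign path; not the real AssignmentManager.
class AssignBookkeeping {
  // region -> destination server (role of this.regions)
  private final Map<String, String> regions = new HashMap<>();
  // server -> regions it hosts (role of this.servers)
  private final Map<String, Set<String>> servers = new HashMap<>();

  // Records a plan's destination in both maps in one critical section.
  synchronized void recordPlan(String region, String destination) {
    String previous = regions.put(region, destination);
    if (previous != null && !previous.equals(destination)) {
      // The region moved: drop it from the old server's set.
      Set<String> old = servers.get(previous);
      if (old != null) {
        old.remove(region);
      }
    }
    servers.computeIfAbsent(destination, s -> new HashSet<>()).add(region);
  }

  synchronized String serverOf(String region) {
    return regions.get(region);
  }

  // Returns a copy so callers cannot mutate the internal set.
  synchronized Set<String> regionsOn(String server) {
    Set<String> hosted = servers.get(server);
    return hosted == null ? Collections.<String>emptySet() : new HashSet<>(hosted);
  }
}
```

Funnelling every write through one method like this is a design choice rather than a requirement. The point is only that no code path updates one map without the other.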
                

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260238#comment-13260238 ] 

stack commented on HBASE-5829:
------------------------------

Do you have a patch for us, Maryann? The first at least seems legit. (For the second, there is no associated server, right?)
                

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262793#comment-13262793 ] 

stack commented on HBASE-5829:
------------------------------

@Ted Make a new issue?
                

        

[jira] [Commented] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262798#comment-13262798 ] 

Zhihong Yu commented on HBASE-5829:
-----------------------------------

The latest patch is good to go.
The useless statement can be addressed elsewhere.
                

        

[jira] [Updated] (HBASE-5829) Inconsistency between the "regions" map and the "servers" map in AssignmentManager

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5829:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.96.0
         Assignee: Maryann Xue
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Applied to trunk.  Letting patch hang out in case someone wants to apply it to other branches.

I added you as a contributor, Maryann, and assigned you this issue (you can assign yourself issues going forward). Thanks for the patch.
                