Posted to issues@hbase.apache.org by "Maryann Xue (JIRA)" <ji...@apache.org> on 2012/04/25 08:10:08 UTC
[jira] [Updated] (HBASE-5829) Inconsistency between the "regions"
map and the "servers" map in AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maryann Xue updated HBASE-5829:
-------------------------------
Attachment: HBASE-5829-trunk.patch
HBASE-5829-0.90.patch
Add corresponding operations to this.servers
> Inconsistency between the "regions" map and the "servers" map in AssignmentManager
> ----------------------------------------------------------------------------------
>
> Key: HBASE-5829
> URL: https://issues.apache.org/jira/browse/HBASE-5829
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.6, 0.92.1
> Reporter: Maryann Xue
> Attachments: HBASE-5829-0.90.patch, HBASE-5829-trunk.patch
>
>
> There are places in AssignmentManager (AM) where this.servers is not kept consistent with this.regions. As a result, the balancer may try to offline a region from a region server (RS) that already returned NotServingRegionException on a previous offline attempt.
> In AssignmentManager.unassign(HRegionInfo, boolean):
>   try {
>     // TODO: We should consider making this look more like it does for the
>     // region open where we catch all throwables and never abort
>     if (serverManager.sendRegionClose(server, state.getRegion(),
>         versionOfClosingNode)) {
>       LOG.debug("Sent CLOSE to " + server + " for region " +
>           region.getRegionNameAsString());
>       return;
>     }
>     // This never happens. Currently regionserver close always return true.
>     LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
>         region.getRegionNameAsString());
>   } catch (NotServingRegionException nsre) {
>     LOG.info("Server " + server + " returned " + nsre + " for " +
>         region.getRegionNameAsString());
>     // Presume that master has stale data. Presume remote side just split.
>     // Presume that the split message when it comes in will fix up the master's
>     // in memory cluster state.
>   } catch (Throwable t) {
>     if (t instanceof RemoteException) {
>       t = ((RemoteException)t).unwrapRemoteException();
>       if (t instanceof NotServingRegionException) {
>         if (checkIfRegionBelongsToDisabling(region)) {
>           // Remove from the regionsinTransition map
>           LOG.info("While trying to recover the table "
>               + region.getTableNameAsString()
>               + " to DISABLED state the region " + region
>               + " was offlined but the table was in DISABLING state");
>           synchronized (this.regionsInTransition) {
>             this.regionsInTransition.remove(region.getEncodedName());
>           }
>           // Remove from the regionsMap
>           synchronized (this.regions) {
>             this.regions.remove(region);
>           }
>           deleteClosingOrClosedNode(region);
>         }
>       }
>       // RS is already processing this region, only need to update the timestamp
>       if (t instanceof RegionAlreadyInTransitionException) {
>         LOG.debug("update " + state + " the timestamp.");
>         state.update(state.getState());
>       }
>     }
> In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean):
>   synchronized (this.regions) {
>     this.regions.put(plan.getRegionInfo(), plan.getDestination());
>   }
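
To illustrate the kind of bookkeeping the description asks for, here is a minimal standalone sketch in which every write to a region-to-server map is mirrored in a server-to-regions map under the same lock. It is not the attached patch: the class RegionServerMaps and its methods are hypothetical, and plain Strings stand in for HRegionInfo and the server identifier so that the example compiles without HBase.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the two AssignmentManager maps (assumption:
// Strings replace HRegionInfo and the server type used by HBase).
class RegionServerMaps {
  // region -> hosting server (plays the role of this.regions)
  private final Map<String, String> regions = new HashMap<>();
  // server -> regions it hosts (plays the role of this.servers)
  private final Map<String, Set<String>> servers = new HashMap<>();

  // Record an assignment in both maps while holding one lock.
  void assign(String region, String server) {
    synchronized (this.regions) {
      String previous = this.regions.put(region, server);
      if (previous != null && !previous.equals(server)) {
        removeFromServer(previous, region); // drop the stale reverse entry
      }
      this.servers.computeIfAbsent(server, k -> new HashSet<>()).add(region);
    }
  }

  // Forget a region in both maps, e.g. after a NotServingRegionException.
  // Returns the server previously recorded for the region, or null.
  String remove(String region) {
    synchronized (this.regions) {
      String server = this.regions.remove(region);
      if (server != null) {
        removeFromServer(server, region);
      }
      return server;
    }
  }

  // Caller must hold the lock on this.regions.
  private void removeFromServer(String server, String region) {
    Set<String> hosted = this.servers.get(server);
    if (hosted != null) {
      hosted.remove(region);
    }
  }

  // Snapshot of the regions recorded for a server (empty if none).
  Set<String> regionsOn(String server) {
    synchronized (this.regions) {
      Set<String> hosted = this.servers.get(server);
      return hosted == null ? new HashSet<>() : new HashSet<>(hosted);
    }
  }

  // Server recorded for a region, or null if the region is unknown.
  String serverOf(String region) {
    synchronized (this.regions) {
      return this.regions.get(region);
    }
  }

  // True when every reverse entry matches the forward map and no region
  // in the forward map is missing from the reverse map.
  boolean isConsistent() {
    synchronized (this.regions) {
      int hostedCount = 0;
      for (Map.Entry<String, Set<String>> e : this.servers.entrySet()) {
        for (String region : e.getValue()) {
          if (!e.getKey().equals(this.regions.get(region))) {
            return false;
          }
          hostedCount++;
        }
      }
      return hostedCount == this.regions.size();
    }
  }
}
```

With this shape, a caller that today does only this.regions.remove(region) or this.regions.put(...) would go through remove or assign instead, so a balancer reading the server-to-regions view would no longer see a region on a server that has already reported it is not serving it.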