Posted to issues@hbase.apache.org by "Nihal Jain (JIRA)" <ji...@apache.org> on 2019/01/18 13:30:00 UTC
[jira] [Commented] (HBASE-21644) Modify table procedure runs infinitely for a table having region replication > 1

    [ https://issues.apache.org/jira/browse/HBASE-21644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746281#comment-16746281 ] 

Nihal Jain commented on HBASE-21644:
------------------------------------

bq. Maybe a possible way is that, we read the max sequence id for the primary region when opening a secondary region replica
I was checking how the next sequence number is calculated for the default replica versus a secondary replica.

From what I understood:
 * The default replica takes the max of:
 ## The highest sequence id found in a Store.
 ## The sequence id of the last edit added to this region from the recovered edits log, or {{minSeqId}} if nothing was added from the edit logs.
 ## The max sequence id stored in the region directory, or -1 if none.
 * A secondary replica takes the max of:
 ## The highest sequence id found in a Store.
 ## The max sequence id stored in the region directory, or -1 if none.
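
The two rules above can be sketched roughly as follows. This is only an illustration, not HBase code: the class {{SeqIdSketch}}, its method names and the {{NO_SEQ_ID}} constant are hypothetical, and the three inputs are assumed to have been read already.

```java
// Hypothetical sketch of the two "max of" rules described above.
final class SeqIdSketch {
    // Value used when the region directory holds no sequence id ("-1 if none").
    static final long NO_SEQ_ID = -1L;

    private SeqIdSketch() {}

    // Default replica: max of the store value, the recovered-edits value,
    // and the value stored in the region directory.
    static long maxForDefaultReplica(long maxStoreSeqId,
                                     long recoveredEditsSeqId,
                                     long regionDirSeqId) {
        return Math.max(maxStoreSeqId, Math.max(recoveredEditsSeqId, regionDirSeqId));
    }

    // Secondary replica: recovered edits are not considered, so only the
    // store value and the region directory value take part.
    static long maxForSecondaryReplica(long maxStoreSeqId, long regionDirSeqId) {
        return Math.max(maxStoreSeqId, regionDirSeqId);
    }
}
```

If the region directory lookup yields {{NO_SEQ_ID}}, the secondary replica's result is simply the store value.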

The point to note is that, for a secondary replica, we do try to read the max sequence id from the region directory, but the region directory we pass never exists. If we use the region directory of the primary instead, we can set the next sequence id of the secondary replica to {{max of seq id from (store, regiondir of primary replica)}} and hence break the loop, since the highest sequence id in the primary's region directory always increases.

So, what you suggested works if we change {{getWALRegionDir}} as follows for secondary replicas:
{code:java}

    if (regionDir == null) {
      RegionInfo regionInfo = getRegionInfo();
      // The region dir of a secondary replica never exists, so resolve the primary's instead.
      if (!RegionReplicaUtil.isDefaultReplica(regionInfo)) {
        regionInfo = RegionReplicaUtil.getRegionInfoForDefaultReplica(regionInfo);
      }
      regionDir = FSUtils.getWALRegionDir(conf, regionInfo.getTable(), regionInfo.getEncodedName());
    }

{code}
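
To illustrate why resolving the primary's directory helps, here is a toy model of the behaviour described above. It is not HBase code: {{ReplicaDirSketch}}, the in-memory map standing in for the file system, the {{"_replica"}} directory suffix and the {{usePrimaryDir}} flag are all hypothetical, and it assumes the store value stays fixed between the two opens.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: region directory name -> max sequence id stored there.
final class ReplicaDirSketch {
    private final Map<String, Long> seqIdByRegionDir = new HashMap<>();

    // Record a max sequence id in a region directory, as a primary would on reopen.
    void writeMaxSeqId(String regionDir, long seqId) {
        seqIdByRegionDir.merge(regionDir, seqId, Math::max);
    }

    // Returns -1 when the directory does not exist, mirroring "-1 if none".
    long readMaxSeqId(String regionDir) {
        return seqIdByRegionDir.getOrDefault(regionDir, -1L);
    }

    // Replica 0 is the default replica. Without the change, a secondary
    // replica looks in a directory of its own, which is never created.
    static String resolveDir(String primaryDir, int replicaId, boolean usePrimaryDir) {
        if (replicaId == 0 || usePrimaryDir) {
            return primaryDir;
        }
        return primaryDir + "_replica" + replicaId; // hypothetical name
    }

    // Sequence id a replica would open with under the "max of" rule.
    long openSeqId(long maxStoreSeqId, String primaryDir, int replicaId, boolean usePrimaryDir) {
        String dir = resolveDir(primaryDir, replicaId, usePrimaryDir);
        return Math.max(maxStoreSeqId, readMaxSeqId(dir));
    }
}
```

In this model, a secondary replica that reads its own directory opens with the same value every time, while one that reads the primary's directory picks up whatever the primary wrote on its last reopen.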

bq.  when we want to reopen a secondary replicas, we should always schedule a reopen region procedure for its primary region,
I think eventually (maybe after some retries) all secondary replicas will reopen once we are done with the primary replica (as it will update the sequence number in the region dir). If we want to make sure they reopen on the first try, we can take care of that in a separate JIRA and provide this as a temporary fix. What do you suggest?

> Modify table procedure runs infinitely for a table having region replication > 1
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-21644
>                 URL: https://issues.apache.org/jira/browse/HBASE-21644
>             Project: HBase
>          Issue Type: Bug
>          Components: Admin
>    Affects Versions: 3.0.0, 2.1.1, 2.1.2
>            Reporter: Nihal Jain
>            Assignee: Nihal Jain
>            Priority: Critical
>         Attachments: HBASE-21644.master.001.patch, HBASE-21644.master.UT.patch
>
>
> *Steps to reproduce*
>  # Create a table with region replication set to a value greater than 1
>  # Modify any of the table properties, say max file size
> *Expected Result*
>  The modify table should succeed and run to completion.
> *Actual Result*
>  The modify table keeps running infinitely
> *Analysis/Issue*
>  The problem occurs due to infinite looping between states {{REOPEN_TABLE_REGIONS_REOPEN_REGIONS}} and {{REOPEN_TABLE_REGIONS_CONFIRM_REOPENED}} of {{ReopenTableRegionsProcedure}}, which is called as part of {{ModifyTableProcedure}}.
> *Consequences*
>  For a table having region replicas:
>  - Any modify table operation fails to complete
>  - Also, enable table replication fails to complete as it is unable to change the replication scope of the table in source cluster


