Posted to issues@hbase.apache.org by "Tak-Lon (Stephen) Wu (Jira)" <ji...@apache.org> on 2020/08/07 18:11:00 UTC

[jira] [Updated] (HBASE-24833) Bootstrap should not delete the META table directory if it's not partial

     [ https://issues.apache.org/jira/browse/HBASE-24833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tak-Lon (Stephen) Wu updated HBASE-24833:
-----------------------------------------
    Description: 
This issue was discussed in [PR#2113|https://github.com/apache/hbase/pull/2113] as part of HBASE-24286, and it is a dependency that must be resolved before we can solve HBASE-24286.

The behavior comes from [HBASE-24471 |https://github.com/apache/hbase/commit/4d5efec76718032a1e55024fd5133409e4be3cb8#diff-21659161b1393e6632730dcbea205fd8R70-R89], which introduced the notion of a partial meta table. There, `partial` was defined as: InitMetaProcedure did not succeed and INIT_META_ASSIGN_META was not completed.
{code:java}
private static void writeFsLayout(Path rootDir, Configuration conf) throws IOException {
  LOG.info("BOOTSTRAP: creating hbase:meta region");
  FileSystem fs = rootDir.getFileSystem(conf);
  Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
  // Any existing meta table directory is treated as partial and deleted recursively.
  if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
    LOG.warn("Can not delete partial created meta table, continue...");
  }
  // ... (rest of the method elided)
}
{code}
However, in the cloud use case where HFiles are stored on S3, WALs are stored on HDFS, and ZK data is stored within the cluster, this handling of partial meta becomes a blocker when a cluster is recreated on existing HFiles. Here, ZK data and WALs cannot be retained (HDFS was associated with the cloud instance and was terminated together with it) when the cluster is recreated on the flushed HFiles, so the existing meta is always considered partial and is deleted in `INIT_META_WRITE_FS_LAYOUT` during bootstrap. As a result, the recreated cluster starts with an empty meta table: either the cluster hangs during master initialization (branch-2) because the table states of the namespace table cannot be assigned, or it starts as a fresh cluster without any regions assigned or tables opened (HBCK may be needed to rebuild the meta).

Potential solution suggested by Anoop:
{quote}In case of HM start and the bootstrap we create the ClusterID and write to FS and then to zk and then create the META table FS layout. So in a cluster recreate, we will see clusterID is there in FS and also the META FS layout but no clusterID in zk. Ya seems we can use this as indication for cluster recreate over existing data. In HM start, this is some thing we need to check at 1st itself and track. If this mode is true, later when (if) we do INIT_META_WRITE_FS_LAYOUT , we should not delete the META dir. As part of the Bootstrap when we write that proc to MasterProcWal, we can include this mode (boolean) info also. This is a protobuf message anyways. So even if this HM got killed and restarted (at a point where the clusterId was written to zk but the Meta FS layout part was not reached) we can use the info added as part of the bootstrap wal entry and make sure NOT to delete the meta dir.
{quote}
In this JIRA, we're going to fix the `partial` definition for the case where the cluster ID is found on the file system alongside the HFiles but the ZK data was deleted or is fresh because the cluster was just created.
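
To make the suggestion above concrete, here is a minimal sketch of the decision logic only. It is not HBase code: the class `MetaBootstrapSketch` and the methods `detectRecreateMode` and `shouldDeleteMetaDir` are hypothetical names, and the boolean inputs stand in for the real checks against the file system, ZK, and the procedure state. The point it illustrates is that the "recreate" flag is computed once at master start and then carried with the procedure, so a master restart after the cluster ID reaches ZK still does not delete the meta directory.

```java
/**
 * Hypothetical sketch of the proposed check. None of these names exist in HBase;
 * the booleans stand in for real lookups against the file system and ZooKeeper.
 */
final class MetaBootstrapSketch {

  /**
   * Assumption taken from the suggestion: a cluster ID and a meta table directory
   * on the file system, with no cluster ID in ZK, indicates a cluster being
   * recreated over existing data.
   */
  static boolean detectRecreateMode(boolean clusterIdInFs, boolean metaDirInFs,
      boolean clusterIdInZk) {
    return clusterIdInFs && metaDirInFs && !clusterIdInZk;
  }

  /**
   * Decides whether the meta directory may be deleted while writing the FS layout.
   * The recreateMode flag is expected to be the value persisted with the procedure,
   * not a value recomputed after a restart.
   */
  static boolean shouldDeleteMetaDir(boolean metaDirExists, boolean recreateMode) {
    return metaDirExists && !recreateMode;
  }

  public static void main(String[] args) {
    // Cluster recreated on existing HFiles: ZK is empty, FS has cluster ID and meta.
    boolean persistedFlag = detectRecreateMode(true, true, false);
    System.out.println("recreate mode detected: " + persistedFlag);
    System.out.println("delete meta dir: " + shouldDeleteMetaDir(true, persistedFlag));

    // Master killed after the cluster ID reached ZK: re-detection alone would miss
    // the recreate case, which is why the persisted flag is used instead.
    boolean redetected = detectRecreateMode(true, true, true);
    System.out.println("re-detected after restart: " + redetected);
    System.out.println("delete meta dir (persisted flag): "
        + shouldDeleteMetaDir(true, persistedFlag));
  }
}
```

In this sketch a meta directory left behind by a failed first bootstrap is still cleaned up, because in that case the cluster ID is already in ZK and the flag is false.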

  was:
this issues were discussed in [PR#2113|https://github.com/apache/hbase/pull/2113] as part of HBASE-24286, and it is a dependencies before we solve HBASE-24286.


The changes were introduced in [HBASE-24471 |https://github.com/apache/hbase/commit/4d5efec76718032a1e55024fd5133409e4be3cb8#diff-21659161b1393e6632730dcbea205fd8R70-R89] that partial meta was introduced and `partial` was defined as InitMetaProcedure did not succeed and INIT_META_ASSIGN_META was not completed. 

{{ private static void writeFsLayout(Path rootDir, Configuration conf) throws IOException { 
   LOG.info("BOOTSTRAP: creating hbase:meta region"); 
   FileSystem fs = rootDir.getFileSystem(conf); 
   Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME); 
   if (fs.exists(tableDir) && !fs.delete(tableDir, true)) { 
     LOG.warn("Can not delete partial created meta table, continue..."); 
   } }}

however, in the cloud use case where HFiles store on S3, WALs store on HDFS, ZK data are stored within the cluster. Here, Zk data and WALs cannot be retained (HDFS was associated with cloud instance and was terminated together) when cluster recreates on the flushed HFiles, and existing meta are always considered as partial and deleted in `INIT_META_WRITE_FS_LAYOUT` during bootstrap. as a result, the recreate cluster starts with a empty meta table, either the cluster hangs during the master initialization (branch-2) because table states of namespace table cannot be assigned, or starts as a fresh cluster without any region assigned and table opens (may need HBCK to rebuild the meta).  

Potential solution suggested by Anoop

bq. In case of HM start and the bootstrap we create the ClusterID and write to FS and then to zk and then create the META table FS layout. So in a cluster recreate, we will see clusterID is there in FS and also the META FS layout but no clusterID in zk. Ya seems we can use this as indication for cluster recreate over existing data. In HM start, this is some thing we need to check at 1st itself and track. If this mode is true, later when (if) we do INIT_META_WRITE_FS_LAYOUT , we should not delete the META dir. As part of the Bootstrap when we write that proc to MasterProcWal, we can include this mode (boolean) info also. This is a protobuf message anyways. So even if this HM got killed and restarted (at a point where the clusterId was written to zk but the Meta FS layout part was not reached) we can use the info added as part of the bootstrap wal entry and make sure NOT to delete the meta dir.



In this JIRA, we're going to fix the `partial` definition when we found cluster ID was stored in HFiles but ZK were deleted or fresh on cluster creates. 


> Bootstrap should not delete the META table directory if it's not partial
> ------------------------------------------------------------------------
>
>                 Key: HBASE-24833
>                 URL: https://issues.apache.org/jira/browse/HBASE-24833
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Tak-Lon (Stephen) Wu
>            Priority: Major
>


