Posted to dev@hbase.apache.org by "Tak-Lon (Stephen) Wu" <ta...@apache.org> on 2020/08/14 05:16:35 UTC

[DISCUSS] HBASE-24833 and HBASE-24471 if meta directory can skip deletion during bootstrap

Hi guys,

Sorry to bother everyone, but we need some help with a discussion
about a recent change in HBASE-24471 that adds a new state,
`INIT_META_WRITE_FS_LAYOUT`, to InitMetaProcedure. Within that state, it
introduces new logic that removes the meta directory if it exists.

  private static void writeFsLayout(Path rootDir, Configuration conf) throws IOException {
    LOG.info("BOOTSTRAP: creating hbase:meta region");
    FileSystem fs = rootDir.getFileSystem(conf);
    Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
    if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
      LOG.warn("Can not delete partial created meta table, continue...");
    }
    // ... (remainder of the method omitted)

HBASE-24471 is an incompatible change, as mentioned in its release note.
If an HMaster (HM) restarts and enters
InitMetaProcedure#INIT_META_WRITE_FS_LAYOUT, it treats the meta as
`partial` and deletes it, even if the meta may not be partial (however,
we cannot tell from the HFiles or the table data alone whether the table
is partial or inconsistent).

So, I’m wondering if we can keep the meta without deleting it, or
leave it to a repair action, e.g. using HBCK, if any inconsistency
shows up after the meta bootstrap.

Apologies in advance to Duo; I would like some ideas from a broader
audience on how we can move forward from the discussion on PR#2237.

P.S. To be honest about our use case: we’re restarting a cluster on
fresh ZK data (the cloud use case of restarting with no ZK data and no
WAL, only HFiles). That leads to InitMetaProcedure being resubmitted,
and its first state, INIT_META_WRITE_FS_LAYOUT, deletes the meta. So
we’re suffering from the other side of this change: even if the meta
directory has the right data content, we need to rebuild it.

Related JIRAs
* https://issues.apache.org/jira/browse/HBASE-24471
* https://issues.apache.org/jira/browse/HBASE-24833

Related PRs
* PR#1806, https://github.com/apache/hbase/commit/4d5efec76718032a1e55024fd5133409e4be3cb8#
* PR#2237 still in progress of discussion,
https://github.com/apache/hbase/pull/2237



Thanks,
Stephen

Re: [DISCUSS] HBASE-24833 and HBASE-24471 if meta directory can skip deletion during bootstrap

Posted by "Tak-Lon (Stephen) Wu" <ta...@gmail.com>.
I replied on PR#2237, but let me write it down here as well.

Thanks Duo, I agree with you about the inconsistency between the meta
table and the ZNode: because we cannot find the last server host on the
ZNode and the meta region is offline, an InitMetaProcedure is submitted
(this rewords your comments; thanks for pointing it out in the PR).
Although I was thinking of not throwing an exception and continuing the
meta bootstrap, your concern about inconsistency between the ZNode and
the HFiles makes sense.

Interestingly, after another night of thinking, I found that
HBASE-24388 moves the server location of the meta table to the master
region. That seems to resolve our conflict of interest: with the master
region, InitMetaProcedure should not be entered, and the meta will not
be deleted.

But before splittable meta in HBASE-11288 is complete (or even with
it), adding an exception should protect the cluster from deleting a big
meta (if there are any corner cases).
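
To make the idea concrete, here is a minimal sketch of such a guard. It
is not the actual patch: `MetaDirGuard`, `containsData` and
`prepareMetaDir` are hypothetical names, it uses java.nio.file on a local
filesystem instead of Hadoop's FileSystem, and "contains at least one
regular file" is only an assumed stand-in for a real empty-or-partial
check, which is still open in this discussion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names exist in HBase.
class MetaDirGuard {

  /** Returns true if the tree under dir holds at least one regular file. */
  static boolean containsData(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      return false;
    }
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths.anyMatch(Files::isRegularFile);
    }
  }

  /**
   * Removes a leftover directory that holds no files, but refuses to
   * delete one that already contains data (assumed heuristic).
   */
  static void prepareMetaDir(Path tableDir) throws IOException {
    if (!Files.exists(tableDir)) {
      return; // nothing to clean up
    }
    if (containsData(tableDir)) {
      // Fail instead of deleting; repair is left to the operator.
      throw new IOException(
          "Meta directory " + tableDir + " is not empty; refusing to delete it");
    }
    // Only empty directories remain: delete children before parents.
    List<Path> toDelete;
    try (Stream<Path> paths = Files.walk(tableDir)) {
      toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : toDelete) {
      Files.delete(p);
    }
  }
}
```

With a guard like this, a restart that lands in the first state would
fail loudly on a populated meta directory and leave the repair to HBCK,
while a directory with no files in it would still be cleaned up as
before.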

Thanks again, have a good weekend.


-Stephen


On Thu, Aug 13, 2020 at 10:48 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:
>
> I'm +1 on adding a check to see if the meta region is really empty or
> partial. If it is not, just leave the meta region there and let the users
> use HBCK to fix the inconsistency, as we should not schedule
> InitMetaProcedure if the meta has already been initialized.
>
> Thanks.

Re: [DISCUSS] HBASE-24833 and HBASE-24471 if meta directory can skip deletion during bootstrap

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
I'm +1 on adding a check to see if the meta region is really empty or
partial. If it is not, just leave the meta region there and let the users
use HBCK to fix the inconsistency, as we should not schedule
InitMetaProcedure if the meta has already been initialized.

Thanks.
