Posted to dev@hbase.apache.org by Sean Busbey <bu...@apache.org> on 2019/10/01 09:11:39 UTC

[DISCUSS] How do we test hbck2?

I was chatting with Sakthi about automating some testing of hbck2 commands.
Nothing too fancy, I just want some assurance that they ought to work.

This got us talking about how we might purposefully break a cluster to meet
a set of symptoms that hbck2 knows how to correct. We need something
different from the chaos monkeys. In this case we're not trying to perturb
the cluster in ways we think it should handle; we're setting up a state we
already know requires an outside tool.

Where should this kind of tooling live? Main repo next to the monkeys?
Alongside hbck2 in operator tools? Somewhere else entirely?

Re: [DISCUSS] How do we test hbck2?

Posted by Sakthi <sa...@apache.org>.
Sure Peter. Created HBASE-23180 for this.

On Wed, Oct 16, 2019 at 12:33 AM Peter Somogyi <ps...@apache.org> wrote:

> Sounds good!
>
> I'd prefer to run HBCK against the latest 2.2 version since that is
> planned to get the stable pointer soon. It is also fine to run the test
> against both the 2.1 and 2.2 HBase versions.
> Currently there is only a Yetus-based pre-commit job for
> hbase-operator-tools. As with the main HBase repository,
> the Jenkinsfile can be stored in hbase-operator-tools.
> If you need access to builds.a.o for this work, you can request it
> from the PMC.
>
> Peter

Re: [DISCUSS] How do we test hbck2?

Posted by Peter Somogyi <ps...@apache.org>.
Sounds good!

I'd prefer to run HBCK against the latest 2.2 version since that is planned
to get the stable pointer soon. It is also fine to run the test against both
the 2.1 and 2.2 HBase versions.
Currently there is only a Yetus-based pre-commit job for
hbase-operator-tools. As with the main HBase repository,
the Jenkinsfile can be stored in hbase-operator-tools.
If you need access to builds.a.o for this work, you can request it
from the PMC.

Peter

On Wed, Oct 16, 2019 at 12:02 AM Sakthi <sa...@apache.org> wrote:

> I'm planning to start working on a nightly build that spins up a
> mini-cluster, loads some data into it, performs some actions to bring the
> cluster into an undesirable state that hbck2 can fix, and then invokes
> hbck2 to check that things work.
>
> The plan is to start small with one of the hbck2 commands; the remaining
> ones can be added incrementally. For now I would like to start by making
> sure the job runs against one HBase version (probably 2.1.x). We can then
> discuss whether the job should run against all current HBase versions,
> take a list of HBase versions as input and run against each of them, or
> stick with a single version.
>
> The job script would be located in our operator-tools repo. Let me start
> digging into creating a nightly job for operator-tools (I don't think we
> have any as of now). I will create a tracking JIRA for this. Further
> discussion can move to the JIRA if that is more convenient.
>
> -Sakthi

Re: [DISCUSS] How do we test hbck2?

Posted by Sakthi <sa...@apache.org>.
I'm planning to start working on a nightly build that spins up a
mini-cluster, loads some data into it, performs some actions to bring the
cluster into an undesirable state that hbck2 can fix, and then invokes
hbck2 to check that things work.

The plan is to start small with one of the hbck2 commands; the remaining
ones can be added incrementally. For now I would like to start by making
sure the job runs against one HBase version (probably 2.1.x). We can then
discuss whether the job should run against all current HBase versions,
take a list of HBase versions as input and run against each of them, or
stick with a single version.
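
A rough sketch of how such a job could be organized, so that commands can be added one scenario at a time. Everything here is an assumption for illustration: the cluster is a toy in-memory dict rather than a mini-cluster, and `Scenario`, `stick_region`, and `reassign_stuck` are hypothetical names. Nothing in it calls HBase or hbck2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy stand-in for cluster state: region name -> assignment state.
# A real job would drive a mini-cluster here instead.
Cluster = Dict[str, str]


@dataclass
class Scenario:
    name: str
    break_cluster: Callable[[Cluster], None]  # puts the cluster into a known-bad state
    repair: Callable[[Cluster], None]         # stand-in for invoking one hbck2 command


def new_cluster() -> Cluster:
    """Fresh, healthy toy cluster for each scenario."""
    return {"region-a": "OPEN", "region-b": "OPEN"}


def is_healthy(cluster: Cluster) -> bool:
    return all(state == "OPEN" for state in cluster.values())


def run_scenario(scenario: Scenario, cluster: Cluster) -> bool:
    """Break the cluster, run the repair, and report whether it is healthy again."""
    if not is_healthy(cluster):
        raise RuntimeError("cluster must start healthy")
    scenario.break_cluster(cluster)
    if is_healthy(cluster):
        raise RuntimeError(f"{scenario.name}: break step had no effect")
    scenario.repair(cluster)
    return is_healthy(cluster)


def stick_region(cluster: Cluster) -> None:
    # Hypothetical breakage: leave one region stuck mid-transition.
    cluster["region-a"] = "OPENING"


def reassign_stuck(cluster: Cluster) -> None:
    # Hypothetical repair: force every non-OPEN region back to OPEN.
    for region, state in cluster.items():
        if state != "OPEN":
            cluster[region] = "OPEN"


# Start with a single scenario; more are appended as commands get covered.
SCENARIOS: List[Scenario] = [
    Scenario("stuck-region", stick_region, reassign_stuck),
]


def run_all(scenarios: List[Scenario]) -> Dict[str, bool]:
    return {s.name: run_scenario(s, new_cluster()) for s in scenarios}
```

Each scenario gets its own fresh cluster, so a repair that fails in one scenario cannot mask or cause a failure in another.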

The job script would be located in our operator-tools repo. Let me start
digging into creating a nightly job for operator-tools (I don't think we
have any as of now). I will create a tracking JIRA for this. Further
discussion can move to the JIRA if that is more convenient.

-Sakthi

On Mon, Oct 7, 2019 at 11:51 AM Andrew Purtell <ap...@apache.org> wrote:

> > We need something different from the chaos monkeys. In this case we're
> > not trying to perturb the cluster in ways we think it should handle;
> > we're setting up a state we already know requires an outside tool.
>
> Not sure this really falls outside the framework. Add an action that
> invokes hbck. Then, add a policy that schedules the hbck invocation as part
> of the schedule. That policy would also include the destructive actions
> breaking things in a way the tool needs to fix (or HBase can handle
> intrinsically...)

Re: [DISCUSS] How do we test hbck2?

Posted by Andrew Purtell <ap...@apache.org>.
> We need something different from the chaos monkeys. In this case we're
> not trying to perturb the cluster in ways we think it should handle; we're
> setting up a state we already know requires an outside tool.

Not sure this really falls outside the framework. Add an action that
invokes hbck. Then, add a policy that includes the hbck invocation in its
schedule. That policy would also include the destructive actions that
break things in a way the tool needs to fix (or that HBase can handle
intrinsically...)
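
To illustrate the shape of that idea, here is a minimal sketch, assuming a toy in-memory cluster. It is not the chaos monkey API: `Action`, `DropAssignment`, `RunRepairTool`, and `BreakThenRepairPolicy` are hypothetical names, and the repair action only stands in for something that would invoke hbck.

```python
from typing import Dict, List, Protocol

# Toy stand-in for cluster state: region name -> assignment state.
Cluster = Dict[str, str]


class Action(Protocol):
    def perform(self, cluster: Cluster) -> None: ...


class DropAssignment:
    """Hypothetical destructive action: mark one region as closed."""

    def __init__(self, region: str) -> None:
        self.region = region

    def perform(self, cluster: Cluster) -> None:
        cluster[self.region] = "CLOSED"


class RunRepairTool:
    """Stand-in for an action that would invoke hbck; here it reopens every region."""

    def perform(self, cluster: Cluster) -> None:
        for region in cluster:
            cluster[region] = "OPEN"


class BreakThenRepairPolicy:
    """Runs the destructive actions in order, then the repair action last."""

    def __init__(self, destructive: List[Action], repair: Action) -> None:
        self.destructive = destructive
        self.repair = repair

    def run(self, cluster: Cluster) -> List[str]:
        performed = []
        for action in self.destructive:
            action.perform(cluster)
            performed.append(type(action).__name__)
        self.repair.perform(cluster)
        performed.append(type(self.repair).__name__)
        return performed
```

A usage example: `BreakThenRepairPolicy([DropAssignment("r1")], RunRepairTool()).run(cluster)` breaks region `r1` and then runs the repair, returning the names of the actions in the order they ran.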

On Tue, Oct 1, 2019 at 2:11 AM Sean Busbey <bu...@apache.org> wrote:

> I was chatting with Sakthi about automating some testing of hbck2 commands.
> Nothing too fancy, I just want some assurance that they ought to work.
>
> This got us talking about how we might purposefully break a cluster to meet
> a set of symptoms that hbck2 knows how to correct. We need something
> different from the chaos monkeys. In this case we're not trying to perturb
> the cluster in ways we think it should handle; we're setting up a state we
> already know requires an outside tool.
>
> Where should this kind of tooling live? Main repo next to the monkeys?
> Alongside hbck2 in operator tools? Somewhere else entirely?
>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSS] How do we test hbck2?

Posted by Sakthi <sa...@apache.org>.
Thanks for starting the discussion, Sean! I would really like to know what
folks think about this. I think much of the magic of our hbck tool goes
unappreciated because we have no proof of correctness to go with it. That
proof could take the form of a constructive “destruction” tool, either
standalone or something that takes a cluster id/zk quorum and breaks that
cluster in the required way.

While trying to test our operator tools RC, this was one of the points of
friction I faced, and I think many other enthusiasts have probably faced it
too.

For starters, there could be a doc that lists, for each of our hbck
commands, the steps that bring the cluster into a state from which hbck can
take over. A follow-up tool would be a great addition.

-Sakthi

On Tue, Oct 1, 2019 at 2:11 AM Sean Busbey <bu...@apache.org> wrote:

> I was chatting with Sakthi about automating some testing of hbck2 commands.
> Nothing too fancy, I just want some assurance that they ought to work.
>
> This got us talking about how we might purposefully break a cluster to meet
> a set of symptoms that hbck2 knows how to correct. We need something
> different from the chaos monkeys. In this case we're not trying to perturb
> the cluster in ways we think it should handle; we're setting up a state we
> already know requires an outside tool.
>
> Where should this kind of tooling live? Main repo next to the monkeys?
> Alongside hbck2 in operator tools? Somewhere else entirely?
>