Posted to dev@hbase.apache.org by Josh Elser <el...@apache.org> on 2017/11/08 04:30:53 UTC

[DISCUSS] Plan to avoid backup/restore removal from 2.0

Folks,

I've been working with Vlad and Ted offline to make sure we have a plan
that addresses the implementation gaps Vlad sees and the previously
stated barriers to entry, so that the feature can stay in HBase 2.0.
My hope is that this can be an honest discussion given 2.0-beta
timelines, with a concrete action plan. I'm trying my best to not
re-hash the logic/reasoning/caveats behind previous concerns; if I
haven't covered something folks feel is a blocker, that is unintentional.

The list:

1. Documentation. It must be updated and committed, ensuring it covers 
the details operators/architects need to know to use it effectively 
(HBASE-16574). Vlad will help with content; Frank and/or I will get
it converted to asciidoc.

2. Distributed testing missing. Vlad has taken my previous document on 
goals and translated that into an implementation outline[1]. Ted and I 
have already weighed in -- I believe it hits the salient points for the 
quality of testing we're looking for. I'll get started on this while 
Vlad does #4 (after consensus on approach, of course). Needs JIRA issue 
(maybe?).

3. Operator utility to verify backups. In the abstract, this should just be
the same guts as a tool like VerifyReplication. In practice, this should
be the same code that #2 uses (if not _actually_ the same guts as
VerifyReplication). The hope is that this will be encapsulated
(time-wise) by #2. Needs JIRA issue (maybe?).

4. Polish DistCP for bulk-loaded files/fault-tolerance (HBASE-17852). I 
don't have specifics here -- will rely on Vlad to correct me if there's 
a better JIRA issue to track than the aforementioned. Will rely on
details showing up on the JIRA issue to track it.

Current due dates:

1. End of week (2017/11/10)
2. Before US Thanksgiving (2017/11/22)
3. Same as #2
4. Same as #1

My current thought is that this is reasonable for implementation times, 
and would not derail the rest of the beta-1 train. I appreciate the 
patience from all parties, and I hope that those trying to make this 
better can find a little more time to give some feedback. Thanks for the 
long read if nothing else.

- Josh

[1] https://docs.google.com/document/d/1xbPlLKjOcPq2LDqjbSkF6uNDAG0mzgOxek6P3POLeMc/edit?usp=sharing

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Josh Elser <el...@apache.org>.
On 11/8/17 1:26 PM, Andrew Purtell wrote:
> I won't speak to the timing aspects of this, that's up to the RM, but the
> testing details look reasonable to me.

Understood and agree. Thanks for your input!

> With respect to chaos testing, the
> following goals would be good:
> 
> - Some backups and restores succeed even with masters and RSes going up and
> down. The resiliency can always be improved later, but we can't rely on no
> failures for entire duration of backup or restore operation to get a good
> result, especially for restore.

Yup! The expectation (if not explicitly stated) is that we would
work our way up to the ServerKilling monkey. That should be
trivial to implement - IntegrationTestBase would wire it
up for us.

> - Backups are not corrupted by failures. Or, corrupted (partial?) backups
> are identified and ignored and there are still good backups remaining which
> can be used for restore.
> 
> - When the verification tool says a backup and restore are good, they
> really are.

/me nods. Agreed.

I think we'll learn a bit about failure situations (the doc intentionally
avoided defining problems/solutions), and the problems we see will help
shape the solutions we need to build.

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Andrew Purtell <ap...@apache.org>.
I won't speak to the timing aspects of this, that's up to the RM, but the
testing details look reasonable to me. With respect to chaos testing, the
following goals would be good:

- Some backups and restores succeed even with masters and RSes going up and
down. The resiliency can always be improved later, but we can't rely on no
failures for entire duration of backup or restore operation to get a good
result, especially for restore.

- Backups are not corrupted by failures. Or, corrupted (partial?) backups
are identified and ignored and there are still good backups remaining which
can be used for restore.

- When the verification tool says a backup and restore are good, they
really are.






-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Mike Drob <md...@apache.org>.
I think it's important to have some _external_ method to verify
correctness. There is a lot of space between "the system thinks it is
correct" and "the user is confident that it is correct."

We don't intentionally set out to lose data, and yet it still happens
sometimes in HBase. It's good that we have things like VerifyReplication
though, to reassure us that things are proper.
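
To make the idea of an external check concrete, here is a minimal sketch. It
is an illustration only: it is not HBase code and not how VerifyReplication is
implemented. It assumes the source table and a table restored from a backup
have each been scanned into an in-memory mapping of row key to cells, and the
names row_digest and compare_tables are hypothetical helpers invented for this
example.

```python
import hashlib


def row_digest(cells):
    """Hash one row's cells (a dict of column bytes -> value bytes) in a stable order."""
    h = hashlib.sha256()
    for column in sorted(cells):
        value = cells[column]
        # Length-prefix each field so (b"ab", b"c") and (b"a", b"bc") hash differently.
        for field in (column, value):
            h.update(len(field).to_bytes(4, "big"))
            h.update(field)
    return h.hexdigest()


def compare_tables(source, restored):
    """Report row keys that are missing, unexpected, or different in the restored copy."""
    missing = sorted(k for k in source if k not in restored)
    extra = sorted(k for k in restored if k not in source)
    mismatched = sorted(
        k for k in source
        if k in restored and row_digest(source[k]) != row_digest(restored[k])
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


if __name__ == "__main__":
    # Hypothetical data standing in for scans of the two tables.
    source = {b"r1": {b"cf:a": b"1"}, b"r2": {b"cf:a": b"2"}}
    restored = {b"r1": {b"cf:a": b"1"}, b"r2": {b"cf:a": b"x"}}
    print(compare_tables(source, restored))
```

The point of the sketch is that the comparison reads only the data itself, not
the backup system's own bookkeeping, which is what makes it an external check.
A real tool would presumably stream rows from the cluster rather than hold both
tables in memory.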

On Fri, Dec 1, 2017 at 1:04 PM, Vladimir Rodionov <vl...@gmail.com>
wrote:

> Thanks, Mike
>
> #1 is done
> #4 is done, but not committed yet
> #3 is questionable to say the least. All B&R tools provide a guarantee of
> operation correctness if the operation succeeds. Otherwise, what is the point
> of separate fault-tolerance work? FT includes a correctness guarantee as
> well: we track all the needed WAL files or bulk-loaded files during
> incremental backup and guarantee that every single file will be converted
> and moved to the backup destination. If you need an additional guarantee,
> restore the backup into a separate table and do the verification yourself.
>
> #2 is ongoing

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Stack <st...@duboce.net>.
On Fri, Dec 1, 2017 at 11:32 AM, Stack <st...@duboce.net> wrote:

> ...
> Thanks all who participated trying to land this feature,
>


Just in case I gave the wrong impression: the feature is still in the master
branch and under development there. Come hbase-2.1.0 (or later), if there is
demonstrable progress on the outstanding 3 points, let's revisit whether to
include it.
Thanks,
St.Ack





Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Stack <st...@duboce.net>.
On Fri, Dec 1, 2017 at 11:04 AM, Vladimir Rodionov <vl...@gmail.com>
wrote:

> Thanks, Mike
>
> #1 is done
> #4 is done, but not committed yet
> #3 is questionable to say the least. All B&R tools provide a guarantee of
> operation correctness if the operation succeeds. Otherwise, what is the point
> of separate fault-tolerance work? FT includes a correctness guarantee as
> well: we track all the needed WAL files or bulk-loaded files during
> incremental backup and guarantee that every single file will be converted
> and moved to the backup destination. If you need an additional guarantee,
> restore the backup into a separate table and do the verification yourself.
>
> #2 is ongoing
>
>
Ok. An exception was made for B/R and our Josh made a reasonable plan for
landing it (see head of this thread).

#1 is done.
#2 may be ongoing, but there is no evidence of such out here in public.
#3, if a problem, should have been objected to before the plan launched (there
were no objections to the provisions of the plan at the time of launch). It is
not done.
#4 doesn't seem done given discussion is ongoing.

We are more than a week over the agreed-to deadline. Let me punt backup
from hbase2.
Thanks to all who participated in trying to land this feature,
St.Ack




Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Vladimir Rodionov <vl...@gmail.com>.
Thanks, Mike

#1 is done
#4 is done, but not committed yet
#3 is questionable to say the least. All B&R tools provide a guarantee of
operation correctness if the operation succeeds. Otherwise, what is the point
of separate fault-tolerance work? FT includes a correctness guarantee as
well: we track all the needed WAL files or bulk-loaded files during
incremental backup and guarantee that every single file will be converted
and moved to the backup destination. If you need an additional guarantee,
restore the backup into a separate table and do the verification yourself.

#2 is ongoing

On Fri, Dec 1, 2017 at 10:30 AM, Mike Drob <md...@apache.org> wrote:

> The list is what Josh proposed in the original email to the list.
>
> What is the JIRA for #3?

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Mike Drob <md...@apache.org>.
The list is what Josh proposed in the original email to the list.

What is the JIRA for #3?

On Fri, Dec 1, 2017 at 12:20 PM, Vladimir Rodionov <vl...@gmail.com>
wrote:

> Where did you get this from, Stack?
>
> I am doing scale testing now and this is last task on *my* list for beta-1.

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Vladimir Rodionov <vl...@gmail.com>.
Where did you get this from, Stack?

I am doing scale testing now and this is the last task on *my* list for beta-1.

On Thu, Nov 30, 2017 at 10:27 PM, Stack <st...@duboce.net> wrote:

> On Tue, Nov 7, 2017 at 8:30 PM, Josh Elser <el...@apache.org> wrote:
>
> > Folks,
> >
> > I've been working with Vlad and Ted offline to make sure we have a plan
> > that addresses the implementation gaps Vlad sees and the
> barriers-for-entry
> > previously stated to keep the feature in HBase 2.0. My hope is that this
> > can be an honest discussion given 2.0-beta timelines, with a concrete
> > action plan. I'm trying my best to not re-hash the
> logic/reasoning/caveats
> > behind previous concerns; anything folks feel is a blocker that I haven't
> > covered below is unintentional.
> >
> > The list:
> >
> > 1. Documentation. It must be updated and committed, ensuring it covers
> the
> > details operators/architects need to know to use it effectively
> > (HBASE-16574). Vlad will help with content, myself and/or Frank will get
> it
> > updated to asciidoc.
> >
> > 2. Distributed testing missing. Vlad has taken my previous document on
> > goals and translated that into an implementation outline[1]. Ted and I
> have
> > already weighed in -- I believe it hits the salient points for the
> quality
> > of testing we're looking for. I'll get started on this while Vlad does #4
> > (after consensus on approach, of course). Needs JIRA issue (maybe?).
> >
> > 3. Operator utility to verify backups. In abstract, this should just be
> > the same guts of a tool like VerifyReplication. In practice, this should
> be
> > the same code that #3 uses (if not _actually_ the same guts as
> > VerifyReplication). The hope is that this will be encapsulated
> (time-wise)
> > by #3. Needs JIRA issue (maybe?).
> >
> > 4. Polish DistCP for bulk-loaded files/fault-tolerance (HBASE-17852). I
> > don't have specifics here -- will rely on Vlad to correct me if there's a
> > better JIRA issue to track than the aforementioned. Will rely on details
> to
> > show up the JIRA issue to track it.
> >
> > Current due dates:
> >
> >
> Checking in on the plan.
>
>
> > 1. End of week (2017/11/10)
> >
>
> I believe this is done.
>
>
> > 2. Before US Thanksgiving (2017/11/22)
> > 3. Same as #2
> > 4. Same as #1
> >
> >
> These were not done in time for thanksgiving? Correct me if I'm wrong.
>
> Thanks,
> St.Ack
>
>
>
> > My current thought is that this is reasonable for implementation times,
> > and would not derail the rest of the beta-1 train. I appreciate the
> > patience from all parties, and I hope that those trying to make this
> better
> > can find a little more time to give some feedback. Thanks for the long
> read
> > if nothing else.
> >
> > - Josh
> >
> > [1] https://docs.google.com/document/d/1xbPlLKjOcPq2LDqjbSkF6uND
> > AG0mzgOxek6P3POLeMc/edit?usp=sharing
> >
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Stack <st...@duboce.net>.
On Tue, Nov 7, 2017 at 8:30 PM, Josh Elser <el...@apache.org> wrote:

> Folks,
>
> I've been working with Vlad and Ted offline to make sure we have a plan
> that addresses the implementation gaps Vlad sees and the barriers-for-entry
> previously stated to keep the feature in HBase 2.0. My hope is that this
> can be an honest discussion given 2.0-beta timelines, with a concrete
> action plan. I'm trying my best to not re-hash the logic/reasoning/caveats
> behind previous concerns; anything folks feel is a blocker that I haven't
> covered below is unintentional.
>
> The list:
>
> 1. Documentation. It must be updated and committed, ensuring it covers the
> details operators/architects need to know to use it effectively
> (HBASE-16574). Vlad will help with content, myself and/or Frank will get it
> updated to asciidoc.
>
> 2. Distributed testing missing. Vlad has taken my previous document on
> goals and translated that into an implementation outline[1]. Ted and I have
> already weighed in -- I believe it hits the salient points for the quality
> of testing we're looking for. I'll get started on this while Vlad does #4
> (after consensus on approach, of course). Needs JIRA issue (maybe?).
>
> 3. Operator utility to verify backups. In abstract, this should just be
> the same guts of a tool like VerifyReplication. In practice, this should be
> the same code that #3 uses (if not _actually_ the same guts as
> VerifyReplication). The hope is that this will be encapsulated (time-wise)
> by #3. Needs JIRA issue (maybe?).
>
> 4. Polish DistCP for bulk-loaded files/fault-tolerance (HBASE-17852). I
> don't have specifics here -- will rely on Vlad to correct me if there's a
> better JIRA issue to track than the aforementioned. Will rely on details to
> show up the JIRA issue to track it.
>
> Current due dates:
>
>
Checking in on the plan.


> 1. End of week (2017/11/10)
>

I believe this is done.


> 2. Before US Thanksgiving (2017/11/22)
> 3. Same as #2
> 4. Same as #1
>
>
These were not done in time for Thanksgiving? Correct me if I'm wrong.

Thanks,
St.Ack



> My current thought is that this is reasonable for implementation times,
> and would not derail the rest of the beta-1 train. I appreciate the
> patience from all parties, and I hope that those trying to make this better
> can find a little more time to give some feedback. Thanks for the long read
> if nothing else.
>
> - Josh
>
> [1] https://docs.google.com/document/d/1xbPlLKjOcPq2LDqjbSkF6uND
> AG0mzgOxek6P3POLeMc/edit?usp=sharing
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Josh Elser <el...@apache.org>.
On 11/11/17 5:31 PM, Stack wrote:
> On Fri, Nov 10, 2017 at 3:17 PM, Josh Elser<el...@apache.org>  wrote:
> 
>> A few more areas of movement:
>>
>> * I've cleaned up HBASE-14414 to encapsulate what is still being proposed
>> for beta-1. Shouldn't be any surprises here. Things outstanding deferred to
>> HBASE-17362 (fixVersion=2.1.0).
>>
> 
> Is there a high-level overview of what the feature should be able to do in
> hbase-2? (The issue HBASE-14414  has a bunch of issues hanging off it. It
> is hard to get an overview).
> 
> I read over the doc patch (it's nice). It seems to give a high-level
> overview.  It has a limitations section at the end, which is good. Is there
> anything on what a user can expect in terms of space consumed, resources
> consumed effecting a backup, or how long a restore will take? I would think
> it useful, particularly the latter bit of info as a rough
> gauge. Is this where I (a user) get an overview of what would be in hbase2?
> 
> Doc says merge of incrementals works? Thought I saw somewhere that it was
> not done -- could be wrong.
> 
> Has anyone tried the example in the doc? (Backup to s3?).
> 
> 

I don't think this section was ever commented on.

I've tried to clean up the Phase 3 "uber" issue to start tracking things 
outstanding for 2.0.0. The only exceptions are things listed in 
HBASE-18892 which aren't currently slated to be done for 2.0. Stating 
for others who might not be watching the doc update issue (HBASE-16574), 
the new chapter was updated to include a section which acknowledges 
"known deficiencies/impl-choices". I'll be opening one more (hanging off 
Phase3) to track the new IntegrationTest.

Incremental backups do work as you later noted :)

I think Vlad had done S3 testing a while back (I'll let him confirm -- I 
could be confused). Not sure if it's been done recently.

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Stack <st...@duboce.net>.
On Sat, Nov 11, 2017 at 2:31 PM, Stack <st...@duboce.net> wrote:

> On Fri, Nov 10, 2017 at 3:17 PM, Josh Elser <el...@apache.org> wrote:
>
>> ..
>
> Doc says merge of incrementals works? Thought I saw somewhere that it was
> not done -- could be wrong.
>
>
Misread. Looks like the above is done after all:

"There is no merge for incremental images (HBASE-14135
<https://issues.apache.org/jira/browse/HBASE-14135>). This can increase
restore time. Users will need to periodically execute full backups to be
able to restore data faster. "

S*

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Vladimir Rodionov <vl...@gmail.com>.
Nope, Mike. Fortunately, 99% of the FT code will remain after introducing
concurrent session support.

Just two lines will be changed: TakeSnapshot -> BeginTX, RestoreSnapshot ->
RollbackTx
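
To make the shape of that concrete, here is a toy sketch only, not the
actual HBase code. MetaTable, backup_session and run_backup are made-up
names, and the in-memory dict merely stands in for the backup metadata
table. The session wrapper snapshots the metadata up front and restores it
on failure; swapping in real transactions would, under this assumption,
only touch the two marked lines.

```python
import copy
from contextlib import contextmanager


class MetaTable:
    """Toy in-memory stand-in for the backup metadata table (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def take_snapshot(self):
        # Copy the whole table state, as a table snapshot would.
        return copy.deepcopy(self.rows)

    def restore_snapshot(self, snap):
        # Replace the whole table state with the saved copy.
        self.rows = copy.deepcopy(snap)


@contextmanager
def backup_session(meta):
    snap = meta.take_snapshot()          # line 1: TakeSnapshot -> BeginTx
    try:
        yield meta
    except Exception:
        meta.restore_snapshot(snap)      # line 2: RestoreSnapshot -> RollbackTx
        raise


def run_backup(meta, backup_id, fail=False):
    """Record a backup session; roll the metadata back if it fails midway."""
    with backup_session(meta) as m:
        m.rows[backup_id] = "RUNNING"
        if fail:
            raise RuntimeError("simulated failure mid-backup")
        m.rows[backup_id] = "COMPLETE"
```

Note that restoring a whole-table snapshot would also wipe out changes made
by any other session running at the same time, which fits Mike's reading of
why concurrent sessions are not supported today.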

-Vlad

On Thu, Nov 30, 2017 at 7:20 PM, Mike Drob <md...@apache.org> wrote:

> Bringing this thread up again, because I don't really know where else to
> ask...
>
> The current backup&restore solution snapshots the backup metadata table and
> will restore-via-snapshot in case something goes wrong (or this is still in
> a patch? unclear if this has been committed or not, since there's a ton of
> code to dig through)
>
> AFAICT this is the major reason that we do not support concurrent backup or
> restore operations. (Are there others? Also couldn't find this.)
>
> The fault tolerance that we're working on now will need to be gutted and
> completely rewritten for the future improvements. I get that this is all
> internal and as long as we make it seamless for the operators then we have
> wide latitude to make our own changes. But an important question is just
> because we can, does it mean we should do this? I'm concerned that we're
> writing code that we know will get thrown away and replaced, except we will
> have to continue to support it for as long as 2.0 is an active branch.
>
> Mike
>
>
> On Wed, Nov 15, 2017 at 3:05 PM, Josh Elser <el...@apache.org> wrote:
>
> > On 11/14/17 4:54 PM, Mike Drob wrote:
> >
> >> I can see a small section on the documentation update I've already been
> >>> hacking on to include details on the issue "We can't help you secure
> >>> where
> >>> you put the data". Given how many instances of "globally readable S3
> >>> bucket" I've seen recently, this strikes me as prudent.
> >>>
> >>> I would prefer this to be a giant, hard to miss, red letters, all caps
> >> warning; not a small section. I do think it is our responsibility for
> >> telling users how to configure the backup/restore process for
> >> communicating
> >> with secure systems. Or, at a minimum, documenting how we pass arbitrary
> >> configuration options that can then be used to communicate with said
> >> systems.
> >>
> >
> > :D
> >
> > For example, if we support writing backups to S3, then we should have a
> way
> >> to specify an Auth string and maybe even some of the custom headers like
> >> x-amz-acl. We don't have to explicitly enumerate best practices, but if
> >> the
> >> only option is to write to a globally open bucket, then I don't think we
> >> should advertise writing to S3 as an available option.
> >>
> >> Similarly, if we tell people that they can send backups to HDFS, then we
> >> should give them the hooks to correctly interface with a kerberized
> HDFS.
> >>
> >> Maybe this is already in the proposed patch, I haven't gone looking yet.
> >>
> >
> > Nope. I actually meant to include this in the patch I re-rolled today but
> > forgot. Let me update once more.
> >
> > Thanks again, Mike. Good questions/feedback!
> >
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Mike Drob <md...@apache.org>.
Bringing this thread up again, because I don't really know where else to
ask...

The current backup&restore solution snapshots the backup metadata table and
will restore-via-snapshot in case something goes wrong (or this is still in
a patch? unclear if this has been committed or not, since there's a ton of
code to dig through)

AFAICT this is the major reason that we do not support concurrent backup or
restore operations. (Are there others? Also couldn't find this.)

The fault tolerance that we're working on now will need to be gutted and
completely rewritten for the future improvements. I get that this is all
internal and as long as we make it seamless for the operators then we have
wide latitude to make our own changes. But an important question is just
because we can, does it mean we should do this? I'm concerned that we're
writing code that we know will get thrown away and replaced, except we will
have to continue to support it for as long as 2.0 is an active branch.

Mike


On Wed, Nov 15, 2017 at 3:05 PM, Josh Elser <el...@apache.org> wrote:

> On 11/14/17 4:54 PM, Mike Drob wrote:
>
>> I can see a small section on the documentation update I've already been
>>> hacking on to include details on the issue "We can't help you secure
>>> where
>>> you put the data". Given how many instances of "globally readable S3
>>> bucket" I've seen recently, this strikes me as prudent.
>>>
>>> I would prefer this to be a giant, hard to miss, red letters, all caps
>> warning; not a small section. I do think it is our responsibility for
>> telling users how to configure the backup/restore process for
>> communicating
>> with secure systems. Or, at a minimum, documenting how we pass arbitrary
>> configuration options that can then be used to communicate with said
>> systems.
>>
>
> :D
>
> For example, if we support writing backups to S3, then we should have a way
>> to specify an Auth string and maybe even some of the custom headers like
>> x-amz-acl. We don't have to explicitly enumerate best practices, but if
>> the
>> only option is to write to a globally open bucket, then I don't think we
>> should advertise writing to S3 as an available option.
>>
>> Similarly, if we tell people that they can send backups to HDFS, then we
>> should give them the hooks to correctly interface with a kerberized HDFS.
>>
>> Maybe this is already in the proposed patch, I haven't gone looking yet.
>>
>
> Nope. I actually meant to include this in the patch I re-rolled today but
> forgot. Let me update once more.
>
> Thanks again, Mike. Good questions/feedback!
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Josh Elser <el...@apache.org>.
On 11/14/17 4:54 PM, Mike Drob wrote:
>> I can see a small section on the documentation update I've already been
>> hacking on to include details on the issue "We can't help you secure where
>> you put the data". Given how many instances of "globally readable S3
>> bucket" I've seen recently, this strikes me as prudent.
>>
> I would prefer this to be a giant, hard to miss, red letters, all caps
> warning; not a small section. I do think it is our responsibility for
> telling users how to configure the backup/restore process for communicating
> with secure systems. Or, at a minimum, documenting how we pass arbitrary
> configuration options that can then be used to communicate with said
> systems.

:D

> For example, if we support writing backups to S3, then we should have a way
> to specify an Auth string and maybe even some of the custom headers like
> x-amz-acl. We don't have to explicitly enumerate best practices, but if the
> only option is to write to a globally open bucket, then I don't think we
> should advertise writing to S3 as an available option.
> 
> Similarly, if we tell people that they can send backups to HDFS, then we
> should give them the hooks to correctly interface with a kerberized HDFS.
> 
> Maybe this is already in the proposed patch, I haven't gone looking yet.

Nope. I actually meant to include this in the patch I re-rolled today 
but forgot. Let me update once more.

Thanks again, Mike. Good questions/feedback!

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Mike Drob <md...@apache.org>.
On Tue, Nov 14, 2017 at 2:57 PM, Josh Elser <el...@apache.org> wrote:

> On 11/14/17 3:04 PM, Mike Drob wrote:
>
>> I don't think the second part of my email ever got addressed.
>>
>> I see  "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later"
>>>
>> and claims that it will be implemented in the client, both of which make
>> me
>> uncomfortable. Security Later is a general bad practice, and it is very
>> rarely correct to rely on client-side security for anything.
>>
>>> Is there another issue that covers security? Do we rely completely on
>>>
>> HDFS security here for more than just the DistCP? What kind of testing has
>> been done with security, do we have assurances that the backups aren't
>> accidentally exposing tables to the world?
>>
>
> "Security" as you phrase is pretty open ended, no? The current security
> model is based around the filesystem permissions and the enforcement of an
> HBase superuser to execute the necessary service operations behind the
> BackupAdmin "facade" (e.g. WAL roll procedure execution, snapshot creation,
> snapshot restore, update hbase:backup are the HBase client actions actually
> being performed). That's the state of what it is right now and, yes, it
> does rely on the filesystem backups are sent to (e.g. HDFS, S3, Isilon,
> WASB) are properly secured. We certainly don't want to be testing
> correctness of those systems in HBase.
>
>
Yea, it's somewhat open ended. Relying on filesystem enforcement is
probably sufficient for now, and I agree that it is not within our scope to
be testing correctness of their implementation.


> I can see a small section on the documentation update I've already been
> hacking on to include details on the issue "We can't help you secure where
> you put the data". Given how many instances of "globally readable S3
> bucket" I've seen recently, this strikes me as prudent.
>

I would prefer this to be a giant, hard to miss, red letters, all caps
warning; not a small section. I do think it is our responsibility to
tell users how to configure the backup/restore process for communicating
with secure systems. Or, at a minimum, documenting how we pass arbitrary
configuration options that can then be used to communicate with said
systems.

For example, if we support writing backups to S3, then we should have a way
to specify an Auth string and maybe even some of the custom headers like
x-amz-acl. We don't have to explicitly enumerate best practices, but if the
only option is to write to a globally open bucket, then I don't think we
should advertise writing to S3 as an available option.

Similarly, if we tell people that they can send backups to HDFS, then we
should give them the hooks to correctly interface with a kerberized HDFS.

Maybe this is already in the proposed patch, I haven't gone looking yet.


> The final issue then is about the backup containing other table's data --
> somehow a backup would reference data from another table than the one the
> admin intended to access. For full backups, this is out of scope (the full
> backup is relying on Snapshots -- we shouldn't be testing correctness of
> Snapshots via B&R). For incremental backups, specifically when we're
> filtering WALs, this is a concern. Thankfully, it's an analogous problem to
> "correctness". We have unit test coverage in this area already, and we
> should get good coverage in the up-coming integration test.
>

Again, agree on the general outline of scope you've suggested. Are we
testing the correctness on the backup itself or on a table built from the
restore of that backup? There may be a subtle difference between the two.

I've got some ideas for interesting sequences that would be good to verify,
but need a bit of time to check that I'd be asserting what I think I'm
asserting. Will need a few days to digest and then I should have something
I can concretely point at and ask "what about this?"


>
> Does that help paint a better picture, Mike? Have I missed or glossed over
> any points?
>

Yes, this was very helpful. Thanks, Josh.

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Josh Elser <el...@apache.org>.
On 11/14/17 3:04 PM, Mike Drob wrote:
> I don't think the second part of my email ever got addressed.
> 
>> I see  "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later"
> and claims that it will be implemented in the client, both of which make me
> uncomfortable. Security Later is a general bad practice, and it is very
> rarely correct to rely on client-side security for anything.
>> Is there another issue that covers security? Do we rely completely on
> HDFS security here for more than just the DistCP? What kind of testing has
> been done with security, do we have assurances that the backups aren't
> accidentally exposing tables to the world?

"Security" as you phrase it is pretty open ended, no? The current security 
model is based around the filesystem permissions and the enforcement of 
an HBase superuser to execute the necessary service operations behind 
the BackupAdmin "facade" (e.g. WAL roll procedure execution, snapshot 
creation, snapshot restore, update hbase:backup are the HBase client 
actions actually being performed). That's the state of what it is right 
now and, yes, it does rely on the filesystems that backups are sent to 
(e.g. HDFS, S3, Isilon, WASB) being properly secured. We certainly don't 
want to be testing correctness of those systems in HBase.

I can see a small section on the documentation update I've already been 
hacking on to include details on the issue "We can't help you secure 
where you put the data". Given how many instances of "globally readable 
S3 bucket" I've seen recently, this strikes me as prudent.

The final issue then is about the backup containing another table's data 
-- somehow a backup would reference data from a table other than the one 
the admin intended to access. For full backups, this is out of scope 
(the full backup is relying on Snapshots -- we shouldn't be testing 
correctness of Snapshots via B&R). For incremental backups, specifically 
when we're filtering WALs, this is a concern. Thankfully, it's an 
analogous problem to "correctness". We have unit test coverage in this 
area already, and we should get good coverage in the up-coming 
integration test.
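
As a rough picture of the property in question, here is a toy sketch with
made-up names (WalEntry, filter_wal, replay); real WAL entries and the real
incremental backup path are far more involved. The point is only that the
check can be made twice: once on the filtered entries (the "backup") and
once on what a replay of them rebuilds (the "restore").

```python
from collections import namedtuple

# Hypothetical, heavily simplified WAL entry.
WalEntry = namedtuple("WalEntry", ["table", "row", "value"])


def filter_wal(entries, tables):
    """Keep only the entries that belong to the tables being backed up."""
    wanted = set(tables)
    return [e for e in entries if e.table in wanted]


def replay(entries):
    """Rebuild table contents from WAL entries, later writes winning."""
    restored = {}
    for e in entries:
        restored.setdefault(e.table, {})[e.row] = e.value
    return restored


wal = [
    WalEntry("t1", "r1", "v1"),
    WalEntry("t2", "r1", "secret"),
    WalEntry("t1", "r1", "v2"),
    WalEntry("t1", "r2", "v3"),
]

backup = filter_wal(wal, ["t1"])
# Check on the backup itself: no entry from any other table leaked in.
assert all(e.table == "t1" for e in backup)
# Check on the restored data: only t1 exists, holding the latest values.
assert replay(backup) == {"t1": {"r1": "v2", "r2": "v3"}}
```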

Does that help paint a better picture, Mike? Have I missed or glossed 
over any points?

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Mike Drob <md...@apache.org>.
I don't think the second part of my email ever got addressed.

> I see  "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later"
and claims that it will be implemented in the client, both of which make me
uncomfortable. Security Later is a general bad practice, and it is very
rarely correct to rely on client-side security for anything.
> Is there another issue that covers security? Do we rely completely on
HDFS security here for more than just the DistCP? What kind of testing has
been done with security, do we have assurances that the backups aren't
accidentally exposing tables to the world?

Vlad? Ted? Josh?

On Mon, Nov 13, 2017 at 11:02 AM, Mike Drob <md...@apache.org> wrote:

> I know I'm late to the party here, but I've got another potential blocker
> to add.
>
> We just ran an HP fortify scan internally and the results did not look
> good, specifically on IncrementalTableBackupClient and
> MapReduceBackupCopyJob. I'm still sorting through whether these are
> actually exploitable, or whether it's a symptom of MapReduce being an
> arbitrary code execution framework anyway but this does make me wonder
> about the overall security posture.
>
> I see  "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later" and
> claims that it will be implemented in the client, both of which make me
> uncomfortable. Security Later is a general bad practice, and it is very
> rarely correct to rely on client-side security for anything.
>
> Is there another issue that covers security? Do we rely completely on HDFS
> security here for more than just the DistCP? What kind of testing has been
> done with security, do we have assurances that the backups aren't
> accidentally exposing tables to the world?
>
> Thanks,
> Mike
>
> [1]: https://issues.apache.org/jira/browse/HBASE-14138
>
> On Mon, Nov 13, 2017 at 10:38 AM, Josh Elser <el...@apache.org> wrote:
>
>> On 11/11/17 5:31 PM, Stack wrote:
>>
>>> Don't want to make any assumptions, but I hope the lack of hard objection
>>>> can be interpreted as (begrudging, perhaps) acceptance of the plan. Let
>>>> me/us know when possible, please!
>>>>
>>>>
>>>> Plan seems fine.
>>>
>>> Are you the owner of this feature now Josh or just shepherding it in?
>>>
>>
>> Thanks, Stack.
>>
>> Good question: should have included that out-right. Vlad, Ted, and myself
>> had a chat on this last week.
>>
>> While Vlad is polishing HBASE-17852 and HBASE-17825, I told him I'll help
>> out with the HBASE-18892 (testing) and the Book update. Was waiting for
>> some consensus on the testing gdoc before picking that up.
>>
>> I think Vlad is still the owner, but you could certainly call me a
>> shepherd. I also answer to "sherpa" ;)
>>
>
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Vladimir Rodionov <vl...@gmail.com>.
Thanks, Mike

We will take a look.

-Vlad

On Mon, Nov 13, 2017 at 11:45 AM, Mike Drob <ma...@cloudera.com> wrote:

> Sure, I don't think there are any issue with sharing this publicly, since
> the code has only gone out in alpha releases.
>
> The suspect lines in IncrementalTableBackupClient are 163 and 326. I'm
> still working on validating the call path that leads to those getting
> flagged.
>
> The issues in MapReduceBackupCopyJob are on lines 386, 405, and 407.
>
> All of them relate to un-sanitized inputs in one way or another.
>
> On Mon, Nov 13, 2017 at 12:50 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Mike:
> > Can you share your finding w.r.t. IncrementalTableBackupClient and
> > MapReduceBackupCopyJob
> > ?
> >
> > IncrementalTableBackupClient utilizes WALPlayer directly.
> >
> > I wonder what vulnerability there is.
> >
> > Thanks
> >
> > On Mon, Nov 13, 2017 at 9:02 AM, Mike Drob <md...@apache.org> wrote:
> >
> > > I know I'm late to the party here, but I've got another potential
> blocker
> > > to add.
> > >
> > > We just ran an HP fortify scan internally and the results did not look
> > > good, specifically on IncrementalTableBackupClient and
> > > MapReduceBackupCopyJob. I'm still sorting through whether these are
> > > actually exploitable, or whether it's a symptom of MapReduce being an
> > > arbitrary code execution framework anyway but this does make me wonder
> > > about the overall security posture.
> > >
> > > I see  "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later"
> > and
> > > claims that it will be implemented in the client, both of which make me
> > > uncomfortable. Security Later is a general bad practice, and it is very
> > > rarely correct to rely on client-side security for anything.
> > >
> > > Is there another issue that covers security? Do we rely completely on
> > HDFS
> > > security here for more than just the DistCP? What kind of testing has
> > been
> > > done with security, do we have assurances that the backups aren't
> > > accidentally exposing tables to the world?
> > >
> > > Thanks,
> > > Mike
> > >
> > > [1]: https://issues.apache.org/jira/browse/HBASE-14138
> > >
> > > On Mon, Nov 13, 2017 at 10:38 AM, Josh Elser <el...@apache.org>
> wrote:
> > >
> > > > On 11/11/17 5:31 PM, Stack wrote:
> > > >
> > > >> Don't want to make any assumptions, but I hope the lack of hard
> > > objection
> > > >>> can be interpreted as (begrudging, perhaps) acceptance of the plan.
> > Let
> > > >>> me/us know when possible, please!
> > > >>>
> > > >>>
> > > >>> Plan seems fine.
> > > >>
> > > >> Are you the owner of this feature now Josh or just shepherding it
> in?
> > > >>
> > > >
> > > > Thanks, Stack.
> > > >
> > > > Good question: should have included that out-right. Vlad, Ted, and
> > myself
> > > > had a chat on this last week.
> > > >
> > > > While Vlad is polishing HBASE-17852 and HBASE-17825, I told him I'll
> > help
> > > > out with the HBASE-18892 (testing) and the Book update. Was waiting
> for
> > > > some consensus on the testing gdoc before picking that up.
> > > >
> > > > I think Vlad is still the owner, but you could certainly call me a
> > > > shepherd. I also answer to "sherpa" ;)
> > > >
> > >
> >
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Ted Yu <yu...@gmail.com>.
From the refguide patch:

Only an HBase superuser (e.g. hbase) is allowed to perform backup/restore

Would the lines you mentioned pose potential concern even when run by
superuser ?

On Mon, Nov 13, 2017 at 11:45 AM, Mike Drob <ma...@cloudera.com> wrote:

> Sure, I don't think there are any issue with sharing this publicly, since
> the code has only gone out in alpha releases.
>
> The suspect lines in IncrementalTableBackupClient are 163 and 326. I'm
> still working on validating the call path that leads to those getting
> flagged.
>
> The issues in MapReduceBackupCopyJob are on lines 386, 405, and 407.
>
> All of them relate to un-sanitized inputs in one way or another.
>
> On Mon, Nov 13, 2017 at 12:50 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Mike:
> > Can you share your finding w.r.t. IncrementalTableBackupClient and
> > MapReduceBackupCopyJob
> > ?
> >
> > IncrementalTableBackupClient utilizes WALPlayer directly.
> >
> > I wonder what vulnerability there is.
> >
> > Thanks
> >
> > On Mon, Nov 13, 2017 at 9:02 AM, Mike Drob <md...@apache.org> wrote:
> >
> > > I know I'm late to the party here, but I've got another potential
> blocker
> > > to add.
> > >
> > > We just ran an HP fortify scan internally and the results did not look
> > > good, specifically on IncrementalTableBackupClient and
> > > MapReduceBackupCopyJob. I'm still sorting through whether these are
> > > actually exploitable, or whether it's a symptom of MapReduce being an
> > > arbitrary code execution framework anyway but this does make me wonder
> > > about the overall security posture.
> > >
> > > I see  "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later"
> > and
> > > claims that it will be implemented in the client, both of which make me
> > > uncomfortable. Security Later is a general bad practice, and it is very
> > > rarely correct to rely on client-side security for anything.
> > >
> > > Is there another issue that covers security? Do we rely completely on
> > HDFS
> > > security here for more than just the DistCP? What kind of testing has
> > been
> > > done with security, do we have assurances that the backups aren't
> > > accidentally exposing tables to the world?
> > >
> > > Thanks,
> > > Mike
> > >
> > > [1]: https://issues.apache.org/jira/browse/HBASE-14138
> > >
> > > On Mon, Nov 13, 2017 at 10:38 AM, Josh Elser <el...@apache.org>
> wrote:
> > >
> > > > On 11/11/17 5:31 PM, Stack wrote:
> > > >
> > > >> Don't want to make any assumptions, but I hope the lack of hard
> > > objection
> > > >>> can be interpreted as (begrudging, perhaps) acceptance of the plan.
> > Let
> > > >>> me/us know when possible, please!
> > > >>>
> > > >>>
> > > >>> Plan seems fine.
> > > >>
> > > >> Are you the owner of this feature now Josh or just shepherding it
> in?
> > > >>
> > > >
> > > > Thanks, Stack.
> > > >
> > > > Good question: should have included that out-right. Vlad, Ted, and
> > myself
> > > > had a chat on this last week.
> > > >
> > > > While Vlad is polishing HBASE-17852 and HBASE-17825, I told him I'll
> > help
> > > > out with the HBASE-18892 (testing) and the Book update. Was waiting
> for
> > > > some consensus on the testing gdoc before picking that up.
> > > >
> > > > I think Vlad is still the owner, but you could certainly call me a
> > > > shepherd. I also answer to "sherpa" ;)
> > > >
> > >
> >
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Mike Drob <ma...@cloudera.com>.
Sure, I don't think there are any issues with sharing this publicly, since
the code has only gone out in alpha releases.

The suspect lines in IncrementalTableBackupClient are 163 and 326. I'm
still working on validating the call path that leads to those getting
flagged.

The issues in MapReduceBackupCopyJob are on lines 386, 405, and 407.

All of them relate to un-sanitized inputs in one way or another.
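
For anyone unfamiliar with this class of finding, the generic pattern looks
something like the sketch below. This is an illustration only: it says
nothing about what is actually on the flagged lines, and the whitelist
pattern and the safe_backup_path helper are assumptions made up for the
example, not HBase code. The idea is to validate caller-supplied names
before they are used to build a filesystem path.

```python
import posixpath
import re

# Assumed whitelist for illustration; not taken from the HBase code.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]*")


def safe_backup_path(root, backup_id, table):
    """Build <root>/<backup_id>/<table>, rejecting names that could escape root."""
    for name in (backup_id, table):
        # fullmatch rejects separators, leading dots and trailing newlines.
        if not _SAFE_NAME.fullmatch(name) or ".." in name:
            raise ValueError("illegal name: %r" % (name,))
    path = posixpath.normpath(posixpath.join(root, backup_id, table))
    # Defence in depth: the normalized path must still live under root.
    if not path.startswith(posixpath.normpath(root) + "/"):
        raise ValueError("path escapes backup root: %r" % (path,))
    return path
```

With this in place, a call such as safe_backup_path("/backups", "b1",
"../../etc") raises ValueError instead of producing a path outside the
backup root.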

On Mon, Nov 13, 2017 at 12:50 PM, Ted Yu <yu...@gmail.com> wrote:

> Mike:
> Can you share your finding w.r.t. IncrementalTableBackupClient and
> MapReduceBackupCopyJob
> ?
>
> IncrementalTableBackupClient utilizes WALPlayer directly.
>
> I wonder what vulnerability there is.
>
> Thanks
>
> On Mon, Nov 13, 2017 at 9:02 AM, Mike Drob <md...@apache.org> wrote:
>
> > I know I'm late to the party here, but I've got another potential blocker
> > to add.
> >
> > We just ran an HP fortify scan internally and the results did not look
> > good, specifically on IncrementalTableBackupClient and
> > MapReduceBackupCopyJob. I'm still sorting through whether these are
> > actually exploitable, or whether it's a symptom of MapReduce being an
> > arbitrary code execution framework anyway but this does make me wonder
> > about the overall security posture.
> >
> > I see  "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later"
> and
> > claims that it will be implemented in the client, both of which make me
> > uncomfortable. Security Later is a general bad practice, and it is very
> > rarely correct to rely on client-side security for anything.
> >
> > Is there another issue that covers security? Do we rely completely on
> HDFS
> > security here for more than just the DistCP? What kind of testing has
> been
> > done with security, do we have assurances that the backups aren't
> > accidentally exposing tables to the world?
> >
> > Thanks,
> > Mike
> >
> > [1]: https://issues.apache.org/jira/browse/HBASE-14138
> >
> > On Mon, Nov 13, 2017 at 10:38 AM, Josh Elser <el...@apache.org> wrote:
> >
> > > On 11/11/17 5:31 PM, Stack wrote:
> > >
> > >> Don't want to make any assumptions, but I hope the lack of hard
> > objection
> > >>> can be interpreted as (begrudging, perhaps) acceptance of the plan.
> Let
> > >>> me/us know when possible, please!
> > >>>
> > >>>
> > >>> Plan seems fine.
> > >>
> > >> Are you the owner of this feature now Josh or just shepherding it in?
> > >>
> > >
> > > Thanks, Stack.
> > >
> > > Good question: should have included that out-right. Vlad, Ted, and
> myself
> > > had a chat on this last week.
> > >
> > > While Vlad is polishing HBASE-17852 and HBASE-17825, I told him I'll
> help
> > > out with the HBASE-18892 (testing) and the Book update. Was waiting for
> > > some consensus on the testing gdoc before picking that up.
> > >
> > > I think Vlad is still the owner, but you could certainly call me a
> > > shepherd. I also answer to "sherpa" ;)
> > >
> >
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Ted Yu <yu...@gmail.com>.
Mike:
Can you share your finding w.r.t. IncrementalTableBackupClient and
MapReduceBackupCopyJob
?

IncrementalTableBackupClient utilizes WALPlayer directly.

I wonder what vulnerability there is.

Thanks


Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Vladimir Rodionov <vl...@gmail.com>.
Yes, you are correct, Sean :)

On Mon, Nov 13, 2017 at 10:16 AM, Sean Busbey <bu...@apache.org> wrote:

> On Mon, Nov 13, 2017 at 11:46 AM, Vladimir Rodionov
> <vl...@gmail.com> wrote:
> >>>Is there a high-level overview of what the feature should be able to do in
> >>>hbase-2? (The issue HBASE-14414  has a bunch of issues hanging off it. It
> >>>is hard to get an overview).
> >
> > Yes, it is in hbase book, Michael. HBASE-16754
> >
>
>
> HBASE-16754 Regions failing compaction due to referencing non-existent
> store file
>
>
> Probably HBASE-16574?
>

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Sean Busbey <bu...@apache.org>.
On Mon, Nov 13, 2017 at 11:46 AM, Vladimir Rodionov
<vl...@gmail.com> wrote:
>>>Is there a high-level overview of what the feature should be able to do in
>>>hbase-2? (The issue HBASE-14414  has a bunch of issues hanging off it. It
>>>is hard to get an overview).
>
> Yes, it is in hbase book, Michael. HBASE-16754
>


HBASE-16754 Regions failing compaction due to referencing non-existent
store file


Probably HBASE-16574?

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Vladimir Rodionov <vl...@gmail.com>.
>>Is there a high-level overview of what the feature should be able to do in
>>hbase-2? (The issue HBASE-14414  has a bunch of issues hanging off it. It
>>is hard to get an overview).

Yes, it is in hbase book, Michael. HBASE-16754

>>Is there
>>anything on what user can expect in terms of size consumptions, resources
>>consumed effecting a backup, or how long a restore will take? I would think
>>it useful I'd imagine, particularly the latter bit of info as a rough
>>gauge.

Resource consumption for backup and restore is determined by the YARN resource
allocation of the queue in which both run. That should probably be
mentioned explicitly in the doc.

Restore is entirely a sequence of M/R jobs; backup has some non-M/R stages:
snapshot (full backup) and a distributed log roll stage.
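
For anyone unfamiliar with how an M/R job ends up in a particular YARN
queue, here is a minimal sketch; it is not taken from the backup
documentation. It uses the standard Hadoop property
mapreduce.job.queuename. The queue name "backup" is hypothetical, and
whether the backup and restore jobs pick the value up from the client-side
configuration this way is an assumption to verify.

```xml
<!-- mapred-site.xml fragment (sketch): route submitted M/R jobs to one queue. -->
<!-- "backup" is a hypothetical queue name; it must already exist in the scheduler config. -->
<property>
  <name>mapreduce.job.queuename</name>
  <value>backup</value>
</property>
```

The scheduler limits configured for that queue would then bound what the
M/R stages can consume; the non-M/R stages (snapshot, log roll) are not
governed by it.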


>> Has anyone tried the example in the doc? (Backup to s3?).

Yes, as far as I remember, some time ago. We will include S3 testing in the
beta2 testing cycle.

-Vlad


Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Mike Drob <md...@apache.org>.
I know I'm late to the party here, but I've got another potential blocker
to add.

We just ran an HP Fortify scan internally and the results did not look
good, specifically on IncrementalTableBackupClient and
MapReduceBackupCopyJob. I'm still sorting through whether these are
actually exploitable, or whether it's a symptom of MapReduce being an
arbitrary code execution framework anyway, but this does make me wonder
about the overall security posture.

I see "HBase Backup/Restore Phase 3: Security"[1] resolved as "Later", with
a claim that it will be implemented in the client, both of which make me
uncomfortable. Security Later is a general bad practice, and it is very
rarely correct to rely on client-side security for anything.

Is there another issue that covers security? Do we rely completely on HDFS
security here for more than just the DistCP? What kind of testing has been
done with security? Do we have assurances that the backups aren't
accidentally exposing tables to the world?

Thanks,
Mike

[1]: https://issues.apache.org/jira/browse/HBASE-14138


Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Josh Elser <el...@apache.org>.
On 11/11/17 5:31 PM, Stack wrote:
>> Don't want to make any assumptions, but I hope the lack of hard objection
>> can be interpreted as (begrudging, perhaps) acceptance of the plan. Let
>> me/us know when possible, please!
>>
>>
> Plan seems fine.
> 
> Are you the owner of this feature now Josh or just shepherding it in?

Thanks, Stack.

Good question: should have included that outright. Vlad, Ted, and I 
had a chat on this last week.

While Vlad is polishing HBASE-17852 and HBASE-17825, I told him I'll 
help out with HBASE-18892 (testing) and the Book update. Was waiting 
for some consensus on the testing gdoc before picking that up.

I think Vlad is still the owner, but you could certainly call me a 
shepherd. I also answer to "sherpa" ;)

Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Stack <st...@duboce.net>.
On Fri, Nov 10, 2017 at 3:17 PM, Josh Elser <el...@apache.org> wrote:

> A few more areas of movement:
>
> * I've cleaned up HBASE-14414 to encapsulate what is still being proposed
> for beta-1. Shouldn't be any surprises here. Things outstanding deferred to
> HBASE-17362 (fixVersion=2.1.0).
>


Is there a high-level overview of what the feature should be able to do in
hbase-2? (The issue HBASE-14414  has a bunch of issues hanging off it. It
is hard to get an overview).

I read over the doc patch (it's nice). It seems to give a high-level
overview. It has a limitations section at the end, which is good. Is there
anything on what a user can expect in terms of size consumption, resources
consumed effecting a backup, or how long a restore will take? I would think
it useful, particularly the latter bit of info as a rough
gauge. Is this where I (a user) get an overview of what would be in hbase2?

Doc says merge of incrementals works? Thought I saw somewhere that it was
not done -- could be wrong.

Has anyone tried the example in the doc? (Backup to s3?).


> Don't want to make any assumptions, but I hope the lack of hard objection
> can be interpreted as (begrudging, perhaps) acceptance of the plan. Let
> me/us know when possible, please!
>
>
Plan seems fine.

Are you the owner of this feature now Josh or just shepherding it in?

Thanks,
St.Ack






Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

Posted by Josh Elser <el...@apache.org>.
A few more areas of movement:

* I've cleaned up HBASE-14414 to encapsulate what is still being 
proposed for beta-1. Shouldn't be any surprises here. Outstanding items 
are deferred to HBASE-17362 (fixVersion=2.1.0).
* I've just updated the doc patch on HBASE-16574. Thanks to Vlad for 
the content changes. I tried to update some more "important" 
considerations that users would run into.
* There's been some more discussion on the gdoc (see prev message), in 
case folks haven't noticed otherwise.

Don't want to make any assumptions, but I hope the lack of hard 
objection can be interpreted as (begrudging, perhaps) acceptance of the 
plan. Let me/us know when possible, please!

Thanks all.

On 11/7/17 11:30 PM, Josh Elser wrote:
> Folks,
> 
> I've been working with Vlad and Ted offline to make sure we have a plan 
> that addresses the implementation gaps Vlad sees and the 
> barriers-for-entry previously stated to keep the feature in HBase 2.0. 
> My hope is that this can be an honest discussion given 2.0-beta 
> timelines, with a concrete action plan. I'm trying my best to not 
> re-hash the logic/reasoning/caveats behind previous concerns; anything 
> folks feel is a blocker that I haven't covered below is unintentional.
> 
> The list:
> 
> 1. Documentation. It must be updated and committed, ensuring it covers 
> the details operators/architects need to know to use it effectively 
> (HBASE-16574). Vlad will help with content, myself and/or Frank will get 
> it updated to asciidoc.
> 
> 2. Distributed testing missing. Vlad has taken my previous document on 
> goals and translated that into an implementation outline[1]. Ted and I 
> have already weighed in -- I believe it hits the salient points for the 
> quality of testing we're looking for. I'll get started on this while 
> Vlad does #4 (after consensus on approach, of course). Needs JIRA issue 
> (maybe?).
> 
> 3. Operator utility to verify backups. In abstract, this should just be 
> the same guts of a tool like VerifyReplication. In practice, this should 
> be the same code that #2 uses (if not _actually_ the same guts as 
> VerifyReplication). The hope is that this will be encapsulated 
> (time-wise) by #2. Needs JIRA issue (maybe?).
> 
> 4. Polish DistCP for bulk-loaded files/fault-tolerance (HBASE-17852). I 
> don't have specifics here -- will rely on Vlad to correct me if there's 
> a better JIRA issue to track than the aforementioned. Will rely on 
> details to show up on the JIRA issue to track it.
> 
> Current due dates:
> 
> 1. End of week (2017/11/10)
> 2. Before US Thanksgiving (2017/11/22)
> 3. Same as #2
> 4. Same as #1
> 
> My current thought is that this is reasonable for implementation times, 
> and would not derail the rest of the beta-1 train. I appreciate the 
> patience from all parties, and I hope that those trying to make this 
> better can find a little more time to give some feedback. Thanks for the 
> long read if nothing else.
> 
> - Josh
> 
> [1] 
> https://docs.google.com/document/d/1xbPlLKjOcPq2LDqjbSkF6uNDAG0mzgOxek6P3POLeMc/edit?usp=sharing 
>