Posted to common-dev@hadoop.apache.org by Aaron Fabbri <fa...@cloudera.com> on 2017/02/02 08:52:23 UTC

Proposal to merge S3Guard feature branch

Hello,

I'd like to propose merging the HADOOP-13345 feature branch to trunk.

I just wrote up a status summary on HADOOP-13998 "initial s3guard preview"
that goes into more detail.

Cheers,
Aaron

Re: Proposal to merge S3Guard feature branch

Posted by Aaron Fabbri <fa...@cloudera.com>.
On Thu, Feb 2, 2017 at 2:56 PM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> > On 2 Feb 2017, at 08:52, Aaron Fabbri <fa...@cloudera.com> wrote:
> >
> > Hello,
> >
> > I'd like to propose merging the HADOOP-13345 feature branch to trunk.
> >
> > I just wrote up a status summary on HADOOP-13998 "initial s3guard
> preview"
> > that goes into more detail.
> >
> > Cheers,
> > Aaron
>
> Even though I've been working on it, I'm not convinced it's ready
>
>
Ok.   Would love if we could track outstanding items in HADOOP-13998 so I
can have some indication of how this branch will terminate.  I worked
really hard this week to knock out the remaining items there in hopes of a
merge.


> 1. there's still "TODO s3guard" in bits of the code
> 2. there's not been that much in terms of active play —that is, beyond
> integration tests and benchmarks
> 3. the db format is still stabilising and once that's out, life gets more
> complex. Example: the version marker last week, HADOOP-13876 this week,
> which I still need to review.
>
> I just don't think it's stable enough.
>

Thanks for your response here.  I hope we can weigh the cost of maintaining
a separate S3AFileSystem version against the risk of earlier integration
with trunk.  I'm pretty biased against long-lived feature branches,
personally.

As I mentioned in the JIRA I plan to work on the way we handle empty
directories in S3A.  This could get painful if we continue to change
S3AFileSystem in trunk.  The coming metrics changes I want to do also may
be a source of merge conflicts.


>
> Once it's merged in:
>
> -development time slows, cost increases: you need review and a +1 from a
> full committer, not a branch committer
>

Functionally this is what I'm doing today.  Will try to get another branch
committer to help you with the workload though.  Really appreciate the
reviews so far!


> -if any change causes a regression in the functionality of trunk, it's
> more of an issue. A regression before the merge is a detail, one on trunk,
> even if short lived, isn't welcome.
>
>
For sure.  I'd hope that the default setting (S3Guard disabled) should be
very solid by now though.  The documentation has scary "this is
experimental" warnings still if folks try to turn it on.

My work on failure injection and DynamoDB load testing should be some
indication I care about stability very much as well.

Thanks!
Aaron

> I'm happy for someone to do their own preview of a 3.0.x + s3guard, say
> "play with this and see how much performance you get", but right now I
> think it needs a few more weeks before it gets the broader review it is
> going to need, and before everyone working on it is confident that it's
> going to be stable.

Re: Proposal to merge S3Guard feature branch

Posted by Steve Loughran <st...@hortonworks.com>.
> On 2 Feb 2017, at 08:52, Aaron Fabbri <fa...@cloudera.com> wrote:
> 
> Hello,
> 
> I'd like to propose merging the HADOOP-13345 feature branch to trunk.
> 
> I just wrote up a status summary on HADOOP-13998 "initial s3guard preview"
> that goes into more detail.
> 
> Cheers,
> Aaron

Even though I've been working on it, I'm not convinced it's ready

1. there's still "TODO s3guard" in bits of the code
2. there's not been that much in terms of active play —that is, beyond integration tests and benchmarks
3. the db format is still stabilising and once that's out, life gets more complex. Example: the version marker last week, HADOOP-13876 this week, which I still need to review.

I just don't think it's stable enough.

Once it's merged in:

-development time slows, cost increases: you need review and a +1 from a full committer, not a branch committer
-if any change causes a regression in the functionality of trunk, it's more of an issue. A regression before the merge is a detail, one on trunk, even if short lived, isn't welcome.

I'm happy for someone to do their own preview of a 3.0.x + s3guard, say "play with this and see how much performance you get", but right now I think it needs a few more weeks before it gets the broader review it is going to need, and before everyone working on it is confident that it's going to be stable.