You are viewing a plain text version of this content. The canonical link for it is here.

Posted to hdfs-dev@hadoop.apache.org by Todd Lipcon <to...@cloudera.com> on 2011/03/28 22:14:06 UTC

Branch for HDFS-1073 and related work

Hi all,

I discussed this with a couple folks over the weekend who are involved in
the project, but wanted to let the dev community at large know:

I'm planning on creating a new SVN branch for HDFS-1073 and its subtasks.
For those not aware, HDFS-1073 is a rethinking of how the NN, 2NN, and
BackupNode store images and edit logs on disk. This will help make HA more
manageable down the line and has a lot of operational benefits as described
in the JIRA. The "related work" is the addition of transaction IDs to the
persistent storage of the NN, and some refactoring in the edit log
subsystem.

The reasoning behind creating a branch is that, since this is a fairly large
change, it is easier to develop through a number of subtasks. But at some of
the intermediate points, various components will be temporarily broken.
Developing on a branch will allow us to make incremental progress without
worrying about keeping all tests green after every change. We will of course
make sure all tests pass before merging back into trunk. There will also be
another opportunity to review before the merge into trunk. This is the same
development methodology as was done for the 0.21 Append work and is now
being used for Federation.

Given that there will be another opportunity to review these changes before
merging into trunk, I would also like to propose that Ivan Kelly be granted
the ability to "+1" patches on this branch despite not being an HDFS
committer. Ivan is actively contributing on this project and understands the
code well.

Unless there are any objections, I will create this branch and an associated
Fix Version on JIRA tonight.

Thanks!
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Branch for HDFS-1073 and related work

Posted by Todd Lipcon <to...@cloudera.com>.

Hey folks,

There's been a bit of off-list discussion about this, so I wanted to
clarify a few points about the development style for this branch,
since I think there was some confusion.

First, a little bit about the motivation for following something
closer to CTR rather than RTC on the branch:

As mentioned by Konstantin and Jakob both, the HDFS-1073 project is
very complicated. To me, this means the following things:

- The changes should be reviewed by the experts in this area of the
code (in particular Konstantin, Sanjay, and Dhruba have done a lot of
work in this area, though others of course know it well too)
- The changes should be split into small patches to make the reviews
manageable. Last October I had a working prototype of the project
(albeit with a slightly different design) and it was a 200+KB patch.
It's unreasonable to expect someone to review that (in fact no one
did).

Trying to make progress on this project on the main branch over the
last several months has shown that it's very hard to achieve both of
the above, for the following reasons:

- The experts on this area of the code are very busy, so review
turnaround can be 3-4 days.
- Splitting into small patches limits how far "ahead of reviews" I can
get on development. I've been working on this on a git branch to try
to "stay ahead", but found that responding to even small review
feedback on patch #1 in a series of 8 causes patch conflicts on the
remaining 7. Rather than spending most of my time actually making
progress, the cycle has been:

1. post patch
2. wait 3 days for review
3. page back in the project, respond with simple changes (5 minutes of
actual work),
4. spend several hours rebasing remaining patches on top of changed base patch
5. rinse and repeat

Proposing a modified CTR process is an effort to cut step 2 down, and
hopefully move the "development wave-front" to be just ahead of the
reviewer, so that step 4 is completely removed.

Of course I understand that people are busy and may not get to review
every patch within a day of being posted (I myself am probably
guiltier than most on this). So, people are encouraged to review
patches even after they've been put on the branch, and the review
feedback will be addressed prior to the merge. Addressing review
feedback in a separate followup commit means that it won't invalidate
the patch series in between the two, and I think will speed up
development a lot.


To put this another way, I'm neither proposing Commit-Then-Review or
Review-Then-Commit, but rather Stage-Then-Review-Then-Merge. The
branch can be seen as a "staging ground" for a patch series eventually
targeted at trunk, just like many people use local git branches today.
The difference here is that it's making use of ASF infrastructure like
SVN and JIRA so that review and history is easier. Additionally, prior
to merge, it will be easier to do some QA since there will be an SVN
branch to point folks to.

I understand this breaks with "tradition" a bit, but we need to get
this project done, and the way it was going before just wasn't
progressing fast enough. Let's try this as an experiment, and if it
fails miserably, or the other committers don't trust the code quality,
we can always vote not to merge it into trunk.

Thanks
-Todd

On Mon, Mar 28, 2011 at 11:18 PM, Todd Lipcon <to...@cloudera.com> wrote:
> I've created this branch "HDFS-1073" as discussed. In this branch there's a
> new CHANGES.HDFS-1073.txt file -- please add entries into this file rather
> than CHANGES.txt if you commit stuff to the branch. This will simplify our
> merges later on, I think. We'll transfer the changes back into CHANGES.txt
> and delete this file when we do the final merge into trunk.
> I've also created an associated Fix Version on JIRA.
>
> Regarding review process, turnaround time in reviews has made this work
> progress slower than I'd like. For example, HDFS-1538 has had patches
> available with no comments for several months. Of course, I've had a few
> bouts where I've been quite slow to respond to feedback as well -- sorry for
> that. This is now my top priority so I should be making rapid progress on
> development from here on out.
> I agree with Jakob that this is tricky code and deserves thorough review,
> and also that it's easier to review small patches than a giant merge. So, I
> will make sure that the work going into the branch is associated with
> coherent changes, each associated with a JIRA, to aid reviewers in
> understanding the changes so much as is practical. But, I think something
> closer to commit-then-review will allow us to make progress much faster. As
> a compromise, I'll be following the following policy:
> - A patch will be uploaded to the JIRA for review like usual
> - If another committer provides a +1, it may be committed at that point,
> just like usual.
> - If no committer provides +1 (or a review asking for changes) within 24
> business hours, it will be committed to the branch under "commit then
> review" policy.
> Of course if any committer feels that code needs to be amended, he or she
> should feel free to open a new JIRA against the branch including the review
> comments, and they will be addressed before the merge into trunk. And just
> like with any branch merge, ample time will be given for the community to
> review both the large merge commit as well as the individual historical
> commits of the branch, before it goes into trunk.
> Thanks!
> -Todd
> On Mon, Mar 28, 2011 at 9:24 PM, Konstantin Shvachko <sh...@gmail.com>
> wrote:
>>
>> +1 for creating a branch.
>> Agree with Jakob this should not mean less intensive reviewing.
>> --Konstantin
>>
>> On Mon, Mar 28, 2011 at 4:39 PM, Jakob Homan <jg...@gmail.com> wrote:
>>
>> > Doing this work on a branch makes sense.  +1.
>> >
>> > However, the patches that have been committed so far required
>> > extensive review and revision, and in one case, an addendum patch.
>> > Additionally, the patches in related JIRAs such as HDFS-1557 and
>> > HDFS-1572 have required multiple revisions and corrections.  While
>> > it's up to each committer which +1 they're willing to accept, I don't
>> > see any harm in keeping our current standards for review, considering
>> > the importance and difficulty of this work.  In fact, since moving the
>> > work to a branch will also move it off of many reviewers' radar, it
>> > may even be reasonable to increase the scrutiny.  Reviewing giant
>> > branch merges is difficult and, I believe, more error-prone than
>> > reviewing smaller packets of work.  So far these patches have received
>> > timely reviews, if this does not turn out to be the case, let other
>> > committers know so we can do our part.
>> > -jg
>> >
>> >
>> > On Mon, Mar 28, 2011 at 1:40 PM, Dhruba Borthakur <dh...@gmail.com>
>> > wrote:
>> > > +1. I think this will be very helpful in moving the design forward
>> > quickly.
>> > >
>> > > -dhruba
>> > >
>> > >
>> > > On Mon, Mar 28, 2011 at 1:14 PM, Todd Lipcon <to...@cloudera.com>
>> > > wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> I discussed this with a couple folks over the weekend who are
>> > >> involved
>> > in
>> > >> the project, but wanted to let the dev community at large know:
>> > >>
>> > >> I'm planning on creating a new SVN branch for HDFS-1073 and its
>> > subtasks.
>> > >> For those not aware, HDFS-1073 is a rethinking of how the NN, 2NN,
>> > >> and
>> > >> BackupNode store images and edit logs on disk. This will help make HA
>> > more
>> > >> manageable down the line and has a lot of operational benefits as
>> > described
>> > >> in the JIRA. The "related work" is the addition of transaction IDs to
>> > the
>> > >> persistent storage of the NN, and some refactoring in the edit log
>> > >> subsystem.
>> > >>
>> > >> The reasoning behind creating a branch is that, since this is a
>> > >> fairly
>> > >> large
>> > >> change, it is easier to develop through a number of subtasks. But at
>> > some
>> > >> of
>> > >> the intermediate points, various components will be temporarily
>> > >> broken.
>> > >> Developing on a branch will allow us to make incremental progress
>> > without
>> > >> worrying about keeping all tests green after every change. We will of
>> > >> course
>> > >> make sure all tests pass before merging back into trunk. There will
>> > >> also
>> > be
>> > >> another opportunity to review before the merge into trunk. This is
>> > >> the
>> > same
>> > >> development methodology as was done for the 0.21 Append work and is
>> > >> now
>> > >> being used for Federation.
>> > >>
>> > >> Given that there will be another opportunity to review these changes
>> > before
>> > >> merging into trunk, I would also like to propose that Ivan Kelly be
>> > granted
>> > >> the ability to "+1" patches on this branch despite not being an HDFS
>> > >> committer. Ivan is actively contributing on this project and
>> > >> understands
>> > >> the
>> > >> code well.
>> > >>
>> > >> Unless there are any objections, I will create this branch and an
>> > >> associated
>> > >> Fix Version on JIRA tonight.
>> > >>
>> > >> Thanks!
>> > >> -Todd
>> > >> --
>> > >> Todd Lipcon
>> > >> Software Engineer, Cloudera
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Connect to me at http://www.facebook.com/dhruba
>> > >
>> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Branch for HDFS-1073 and related work

Posted by Todd Lipcon <to...@cloudera.com>.

I've created this branch "HDFS-1073" as discussed. In this branch there's a
new CHANGES.HDFS-1073.txt file -- please add entries into this file rather
than CHANGES.txt if you commit stuff to the branch. This will simplify our
merges later on, I think. We'll transfer the changes back into CHANGES.txt
and delete this file when we do the final merge into trunk.

I've also created an associated Fix Version on JIRA.

Regarding review process, turnaround time in reviews has made this work
progress slower than I'd like. For example, HDFS-1538 has had patches
available with no comments for several months. Of course, I've had a few
bouts where I've been quite slow to respond to feedback as well -- sorry for
that. This is now my top priority so I should be making rapid progress on
development from here on out.

I agree with Jakob that this is tricky code and deserves thorough review,
and also that it's easier to review small patches than a giant merge. So, I
will make sure that the work going into the branch is associated with
coherent changes, each associated with a JIRA, to aid reviewers in
understanding the changes so much as is practical. But, I think something
closer to commit-then-review will allow us to make progress much faster. As
a compromise, I'll be following the following policy:

- A patch will be uploaded to the JIRA for review like usual
- If another committer provides a +1, it may be committed at that point,
just like usual.
- If no committer provides +1 (or a review asking for changes) within 24
business hours, it will be committed to the branch under "commit then
review" policy.

Of course if any committer feels that code needs to be amended, he or she
should feel free to open a new JIRA against the branch including the review
comments, and they will be addressed before the merge into trunk. And just
like with any branch merge, ample time will be given for the community to
review both the large merge commit as well as the individual historical
commits of the branch, before it goes into trunk.

Thanks!

-Todd

On Mon, Mar 28, 2011 at 9:24 PM, Konstantin Shvachko
<sh...@gmail.com>wrote:

> +1 for creating a branch.
> Agree with Jakob this should not mean less intensive reviewing.
> --Konstantin
>
> On Mon, Mar 28, 2011 at 4:39 PM, Jakob Homan <jg...@gmail.com> wrote:
>
> > Doing this work on a branch makes sense.  +1.
> >
> > However, the patches that have been committed so far required
> > extensive review and revision, and in one case, an addendum patch.
> > Additionally, the patches in related JIRAs such as HDFS-1557 and
> > HDFS-1572 have required multiple revisions and corrections.  While
> > it's up to each committer which +1 they're willing to accept, I don't
> > see any harm in keeping our current standards for review, considering
> > the importance and difficulty of this work.  In fact, since moving the
> > work to a branch will also move it off of many reviewers' radar, it
> > may even be reasonable to increase the scrutiny.  Reviewing giant
> > branch merges is difficult and, I believe, more error-prone than
> > reviewing smaller packets of work.  So far these patches have received
> > timely reviews, if this does not turn out to be the case, let other
> > committers know so we can do our part.
> > -jg
> >
> >
> > On Mon, Mar 28, 2011 at 1:40 PM, Dhruba Borthakur <dh...@gmail.com>
> > wrote:
> > > +1. I think this will be very helpful in moving the design forward
> > quickly.
> > >
> > > -dhruba
> > >
> > >
> > > On Mon, Mar 28, 2011 at 1:14 PM, Todd Lipcon <to...@cloudera.com>
> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I discussed this with a couple folks over the weekend who are involved
> > in
> > >> the project, but wanted to let the dev community at large know:
> > >>
> > >> I'm planning on creating a new SVN branch for HDFS-1073 and its
> > subtasks.
> > >> For those not aware, HDFS-1073 is a rethinking of how the NN, 2NN, and
> > >> BackupNode store images and edit logs on disk. This will help make HA
> > more
> > >> manageable down the line and has a lot of operational benefits as
> > described
> > >> in the JIRA. The "related work" is the addition of transaction IDs to
> > the
> > >> persistent storage of the NN, and some refactoring in the edit log
> > >> subsystem.
> > >>
> > >> The reasoning behind creating a branch is that, since this is a fairly
> > >> large
> > >> change, it is easier to develop through a number of subtasks. But at
> > some
> > >> of
> > >> the intermediate points, various components will be temporarily
> broken.
> > >> Developing on a branch will allow us to make incremental progress
> > without
> > >> worrying about keeping all tests green after every change. We will of
> > >> course
> > >> make sure all tests pass before merging back into trunk. There will
> also
> > be
> > >> another opportunity to review before the merge into trunk. This is the
> > same
> > >> development methodology as was done for the 0.21 Append work and is
> now
> > >> being used for Federation.
> > >>
> > >> Given that there will be another opportunity to review these changes
> > before
> > >> merging into trunk, I would also like to propose that Ivan Kelly be
> > granted
> > >> the ability to "+1" patches on this branch despite not being an HDFS
> > >> committer. Ivan is actively contributing on this project and
> understands
> > >> the
> > >> code well.
> > >>
> > >> Unless there are any objections, I will create this branch and an
> > >> associated
> > >> Fix Version on JIRA tonight.
> > >>
> > >> Thanks!
> > >> -Todd
> > >> --
> > >> Todd Lipcon
> > >> Software Engineer, Cloudera
> > >>
> > >
> > >
> > >
> > > --
> > > Connect to me at http://www.facebook.com/dhruba
> > >
> >
>

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Branch for HDFS-1073 and related work

Posted by Konstantin Shvachko <sh...@gmail.com>.

+1 for creating a branch.
Agree with Jakob this should not mean less intensive reviewing.
--Konstantin

On Mon, Mar 28, 2011 at 4:39 PM, Jakob Homan <jg...@gmail.com> wrote:

> Doing this work on a branch makes sense.  +1.
>
> However, the patches that have been committed so far required
> extensive review and revision, and in one case, an addendum patch.
> Additionally, the patches in related JIRAs such as HDFS-1557 and
> HDFS-1572 have required multiple revisions and corrections.  While
> it's up to each committer which +1 they're willing to accept, I don't
> see any harm in keeping our current standards for review, considering
> the importance and difficulty of this work.  In fact, since moving the
> work to a branch will also move it off of many reviewers' radar, it
> may even be reasonable to increase the scrutiny.  Reviewing giant
> branch merges is difficult and, I believe, more error-prone than
> reviewing smaller packets of work.  So far these patches have received
> timely reviews, if this does not turn out to be the case, let other
> committers know so we can do our part.
> -jg
>
>
> On Mon, Mar 28, 2011 at 1:40 PM, Dhruba Borthakur <dh...@gmail.com>
> wrote:
> > +1. I think this will be very helpful in moving the design forward
> quickly.
> >
> > -dhruba
> >
> >
> > On Mon, Mar 28, 2011 at 1:14 PM, Todd Lipcon <to...@cloudera.com> wrote:
> >
> >> Hi all,
> >>
> >> I discussed this with a couple folks over the weekend who are involved
> in
> >> the project, but wanted to let the dev community at large know:
> >>
> >> I'm planning on creating a new SVN branch for HDFS-1073 and its
> subtasks.
> >> For those not aware, HDFS-1073 is a rethinking of how the NN, 2NN, and
> >> BackupNode store images and edit logs on disk. This will help make HA
> more
> >> manageable down the line and has a lot of operational benefits as
> described
> >> in the JIRA. The "related work" is the addition of transaction IDs to
> the
> >> persistent storage of the NN, and some refactoring in the edit log
> >> subsystem.
> >>
> >> The reasoning behind creating a branch is that, since this is a fairly
> >> large
> >> change, it is easier to develop through a number of subtasks. But at
> some
> >> of
> >> the intermediate points, various components will be temporarily broken.
> >> Developing on a branch will allow us to make incremental progress
> without
> >> worrying about keeping all tests green after every change. We will of
> >> course
> >> make sure all tests pass before merging back into trunk. There will also
> be
> >> another opportunity to review before the merge into trunk. This is the
> same
> >> development methodology as was done for the 0.21 Append work and is now
> >> being used for Federation.
> >>
> >> Given that there will be another opportunity to review these changes
> before
> >> merging into trunk, I would also like to propose that Ivan Kelly be
> granted
> >> the ability to "+1" patches on this branch despite not being an HDFS
> >> committer. Ivan is actively contributing on this project and understands
> >> the
> >> code well.
> >>
> >> Unless there are any objections, I will create this branch and an
> >> associated
> >> Fix Version on JIRA tonight.
> >>
> >> Thanks!
> >> -Todd
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
> >
> >
> > --
> > Connect to me at http://www.facebook.com/dhruba
> >
>

Re: Branch for HDFS-1073 and related work

Posted by Jakob Homan <jg...@gmail.com>.

Doing this work on a branch makes sense.  +1.

However, the patches that have been committed so far required
extensive review and revision, and in one case, an addendum patch.
Additionally, the patches in related JIRAs such as HDFS-1557 and
HDFS-1572 have required multiple revisions and corrections.  While
it's up to each committer which +1 they're willing to accept, I don't
see any harm in keeping our current standards for review, considering
the importance and difficulty of this work.  In fact, since moving the
work to a branch will also move it off of many reviewers' radar, it
may even be reasonable to increase the scrutiny.  Reviewing giant
branch merges is difficult and, I believe, more error-prone than
reviewing smaller packets of work.  So far these patches have received
timely reviews, if this does not turn out to be the case, let other
committers know so we can do our part.
-jg


On Mon, Mar 28, 2011 at 1:40 PM, Dhruba Borthakur <dh...@gmail.com> wrote:
> +1. I think this will be very helpful in moving the design forward quickly.
>
> -dhruba
>
>
> On Mon, Mar 28, 2011 at 1:14 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Hi all,
>>
>> I discussed this with a couple folks over the weekend who are involved in
>> the project, but wanted to let the dev community at large know:
>>
>> I'm planning on creating a new SVN branch for HDFS-1073 and its subtasks.
>> For those not aware, HDFS-1073 is a rethinking of how the NN, 2NN, and
>> BackupNode store images and edit logs on disk. This will help make HA more
>> manageable down the line and has a lot of operational benefits as described
>> in the JIRA. The "related work" is the addition of transaction IDs to the
>> persistent storage of the NN, and some refactoring in the edit log
>> subsystem.
>>
>> The reasoning behind creating a branch is that, since this is a fairly
>> large
>> change, it is easier to develop through a number of subtasks. But at some
>> of
>> the intermediate points, various components will be temporarily broken.
>> Developing on a branch will allow us to make incremental progress without
>> worrying about keeping all tests green after every change. We will of
>> course
>> make sure all tests pass before merging back into trunk. There will also be
>> another opportunity to review before the merge into trunk. This is the same
>> development methodology as was done for the 0.21 Append work and is now
>> being used for Federation.
>>
>> Given that there will be another opportunity to review these changes before
>> merging into trunk, I would also like to propose that Ivan Kelly be granted
>> the ability to "+1" patches on this branch despite not being an HDFS
>> committer. Ivan is actively contributing on this project and understands
>> the
>> code well.
>>
>> Unless there are any objections, I will create this branch and an
>> associated
>> Fix Version on JIRA tonight.
>>
>> Thanks!
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>

Re: Branch for HDFS-1073 and related work

Posted by Dhruba Borthakur <dh...@gmail.com>.

+1. I think this will be very helpful in moving the design forward quickly.

-dhruba


On Mon, Mar 28, 2011 at 1:14 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi all,
>
> I discussed this with a couple folks over the weekend who are involved in
> the project, but wanted to let the dev community at large know:
>
> I'm planning on creating a new SVN branch for HDFS-1073 and its subtasks.
> For those not aware, HDFS-1073 is a rethinking of how the NN, 2NN, and
> BackupNode store images and edit logs on disk. This will help make HA more
> manageable down the line and has a lot of operational benefits as described
> in the JIRA. The "related work" is the addition of transaction IDs to the
> persistent storage of the NN, and some refactoring in the edit log
> subsystem.
>
> The reasoning behind creating a branch is that, since this is a fairly
> large
> change, it is easier to develop through a number of subtasks. But at some
> of
> the intermediate points, various components will be temporarily broken.
> Developing on a branch will allow us to make incremental progress without
> worrying about keeping all tests green after every change. We will of
> course
> make sure all tests pass before merging back into trunk. There will also be
> another opportunity to review before the merge into trunk. This is the same
> development methodology as was done for the 0.21 Append work and is now
> being used for Federation.
>
> Given that there will be another opportunity to review these changes before
> merging into trunk, I would also like to propose that Ivan Kelly be granted
> the ability to "+1" patches on this branch despite not being an HDFS
> committer. Ivan is actively contributing on this project and understands
> the
> code well.
>
> Unless there are any objections, I will create this branch and an
> associated
> Fix Version on JIRA tonight.
>
> Thanks!
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Connect to me at http://www.facebook.com/dhruba