Posted to dev@cassandra.apache.org by Eric Evans <ee...@acunu.com> on 2011/12/28 20:55:35 UTC

Cassandra has moved to Git

While this is something we had talked about for ages, the actual
switch-over happened rather abruptly, and Cassandra's canonical
repository is now hosted in Git.

For instructions on getting started, see
https://git-wip-us.apache.org.  We've also started putting random
administrivia in the wiki at
http://wiki.apache.org/cassandra/GitTransition.

The Github mirror (http://github.com/apache/cassandra) hasn't been
seeing updates since the move, but that will be fixed at some point.
The important thing is that they share identical histories, so new (or
existing) forks are forward-compatible.

There are a few outstanding items being worked on (CI systems, for
example), but if you notice something that's been missed, don't
hesitate to speak up.  The website will be updated as soon as SVN is
unlocked.

There are also some matters of workflow or process that we need to
hash out.  For example, how do we handle reviews now?  Do we
continue to mandate/recommend/allow rebasing?

Thoughts?

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Thu, Dec 29, 2011 at 11:56 AM, Radim Kolar <hs...@sendmail.cz> wrote:
> git://git.apache.org/cassandra.git
>
> Does this still work?

I'm not sure what the status of this is, or what the future holds for
it.  I would stick with http://git-wip-us.apache.org to be on the
safe side.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Cassandra has moved to Git

Posted by Radim Kolar <hs...@sendmail.cz>.

git://git.apache.org/cassandra.git

Does this still work?

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Thu, Dec 29, 2011 at 2:30 PM, Eric Evans <ee...@acunu.com> wrote:
> On Thu, Dec 29, 2011 at 9:18 AM, Eric Evans <ee...@acunu.com> wrote:
>> On Thu, Dec 29, 2011 at 12:08 AM, Dave Brosius <db...@apache.org> wrote:
>>> Running
>>>
>>> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git cassandra
>>>
>>> worked like a normal clone until the end, when I received
>>>
>>> warning: remote HEAD refers to nonexistent ref, unable to checkout.
>>>
>>> Any ideas what I'm doing wrong?
>>
>> That's because HEAD points to master by default.  Just check out trunk
>> and you should be OK.  See:
>> http://wiki.apache.org/cassandra/GitTransition
>>
>> We'll get this fixed.
>
> FYI: https://issues.apache.org/jira/browse/INFRA-4258

Update: This is fixed (last week actually :))

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Thu, Dec 29, 2011 at 9:18 AM, Eric Evans <ee...@acunu.com> wrote:
> On Thu, Dec 29, 2011 at 12:08 AM, Dave Brosius <db...@apache.org> wrote:
>> Running
>>
>> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git cassandra
>>
>> worked like a normal clone until the end, when I received
>>
>> warning: remote HEAD refers to nonexistent ref, unable to checkout.
>>
>> Any ideas what I'm doing wrong?
>
> That's because HEAD points to master by default.  Just check out trunk
> and you should be OK.  See:
> http://wiki.apache.org/cassandra/GitTransition
>
> We'll get this fixed.

FYI: https://issues.apache.org/jira/browse/INFRA-4258

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Thu, Dec 29, 2011 at 12:08 AM, Dave Brosius <db...@apache.org> wrote:
> On 12/28/2011 02:55 PM, Eric Evans wrote:
>> [...]
> Running
>
> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git cassandra
>
> worked like a normal clone until the end, when I received
>
> warning: remote HEAD refers to nonexistent ref, unable to checkout.
>
> Any ideas what I'm doing wrong?

That's because HEAD points to master by default.  Just check out trunk
and you should be OK.  See:
http://wiki.apache.org/cassandra/GitTransition

We'll get this fixed.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu
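For readers hitting the same warning, here is a minimal local reproduction of the problem Dave describes and of Eric's fix. It is a sketch under stated assumptions, not an official procedure: a throwaway bare repository (remote.git, a hypothetical name) stands in for the Apache one, with its HEAD pointing at a master branch that does not exist while the only branch is trunk.

```shell
#!/bin/sh
# Sketch: reproduce "remote HEAD refers to nonexistent ref" locally, then
# recover by checking out trunk. Paths and identities are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Stand-in for the canonical repository: HEAD names master, which never exists.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# Give the stand-in one commit, on trunk only.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/trunk
git -C seed commit -q --allow-empty -m "initial commit on trunk"
git -C seed push -q ../remote.git trunk

# The clone succeeds but prints the warning and leaves nothing checked out.
git clone -q remote.git cassandra
cd cassandra

# Eric's fix: check out trunk, which creates a local trunk tracking origin/trunk.
git checkout -q trunk
git rev-parse --abbrev-ref HEAD    # prints: trunk
```

Passing the branch at clone time, as in "git clone -b trunk <url>", should sidestep the warning, since -b makes the new clone check out that branch instead of whatever the remote HEAD names.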

Re: Cassandra has moved to Git

Posted by Dave Brosius <db...@apache.org>.

On 12/28/2011 02:55 PM, Eric Evans wrote:
> [...]
Running

git clone http://git-wip-us.apache.org/repos/asf/cassandra.git cassandra

worked like a normal clone until the end, when I received

warning: remote HEAD refers to nonexistent ref, unable to checkout.

Any ideas what I'm doing wrong?

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Wed, Dec 28, 2011 at 9:32 PM, Dave Brosius <db...@apache.org> wrote:
> On 12/28/2011 08:54 PM, Eric Evans wrote:
>>
>> On Wed, Dec 28, 2011 at 6:25 PM, Stephen Connolly
>> <st...@gmail.com>  wrote:
>>>
>>> Just a question: where do we contributors who are not committers, but
>>> have CLAs on file (i.e. are already ASF committers), push our changes?
>>
>> To the best of my knowledge, that distinction doesn't matter.  It's up
>> to the committer to make sure the changes they're pushing are intended
>> for the ASF.
>>
>> As to where people (with or without CLAs) can/should push their
>> changes, that's part of what needs to be discussed, I guess.
>
> I don't suppose there's any way a pull request on the github mirror could
> make it back to the apache repo?

I'm sure there is nothing stopping any two Github users from
exchanging changesets via pull requests, and then later pushing them
to the ASF repo, but integrating Github pull requests with Jira, or
any other part of ASF's infrastructure, seems like a stretch (I guess
it couldn't hurt to ask though).

I was thinking there might be something we could do with the
service-hooks integration (Admin->Service Hooks).  Either the
Jira-specific one, or the Post-Receive URLs.  Have it match an issue
number in the commit message, and update the ticket with a link to the
topic branch.

I wonder what, if anything, the already migrated ASF projects are doing here?

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu
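Eric's service-hook idea depends on picking issue numbers out of commit messages. A rough sketch of just that step, assuming Jira keys of the form CASSANDRA-NNNN: the hypothetical helper tickets_in_range lists the distinct keys in the commits between two revisions, which is the "<old> <new> <ref>" information a post-receive hook reads on stdin. Posting anything to Jira or Github is not shown.

```shell
#!/bin/sh
# Sketch: list the Jira keys mentioned in a range of commits.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo
cd repo

git commit -q --allow-empty -m "base"
old=$(git rev-parse HEAD)
git commit -q --allow-empty -m "Fix an annoying bug (CASSANDRA-5555)"
git commit -q --allow-empty -m "Address review of CASSANDRA-5555; see also CASSANDRA-5556"
new=$(git rev-parse HEAD)

# Hypothetical helper: distinct ticket keys in the commits reachable from $2
# but not from $1 (a post-receive hook would take $1 and $2 from its stdin).
tickets_in_range() {
    git log --format=%B "$1..$2" | grep -oE 'CASSANDRA-[0-9]+' | sort -u
}

tickets_in_range "$old" "$new"
```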

Re: Cassandra has moved to Git

Posted by Dave Brosius <db...@apache.org>.

On 12/28/2011 08:54 PM, Eric Evans wrote:
> On Wed, Dec 28, 2011 at 6:25 PM, Stephen Connolly
> <st...@gmail.com>  wrote:
> Just a question: where do we contributors who are not committers, but have
> CLAs on file (i.e. are already ASF committers), push our changes?
> To the best of my knowledge, that distinction doesn't matter.  It's up
> to the committer to make sure the changes they're pushing are intended
> for the ASF.
>
> As to where people (with or without CLAs) can/should push their
> changes, that's part of what needs to be discussed, I guess.
>
>

I don't suppose there's any way a pull request on the github mirror could
make it back to the apache repo?

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Wed, Dec 28, 2011 at 6:25 PM, Stephen Connolly
<st...@gmail.com> wrote:
> Just a question: where do we contributors who are not committers, but have
> CLAs on file (i.e. are already ASF committers), push our changes?

To the best of my knowledge, that distinction doesn't matter.  It's up
to the committer to make sure the changes they're pushing are intended
for the ASF.

As to where people (with or without CLAs) can/should push their
changes, that's part of what needs to be discussed, I guess.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Cassandra has moved to Git

Posted by Jake Luciani <ja...@gmail.com>.

Hi Stephen,

See
http://mail-archives.apache.org/mod_mbox/www-infrastructure-dev/201112.mbox/%3CA603FFCE-623B-43E9-87F8-39BAA51C72D1@gbiv.com%3E

On Wed, Dec 28, 2011 at 7:25 PM, Stephen Connolly <stephen.alan.connolly@gmail.com> wrote:

> Just a question: where do we contributors who are not committers, but have
> CLAs on file (i.e. are already ASF committers), push our changes?
>
> Hoping this change will make contributing easier.
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on the
> screen
> On 28 Dec 2011 19:56, "Eric Evans" <ee...@acunu.com> wrote:
>
> > [...]
>



-- 
http://twitter.com/tjake

Re: Cassandra has moved to Git

Posted by Stephen Connolly <st...@gmail.com>.

Just a question: where do we contributors who are not committers, but have
CLAs on file (i.e. are already ASF committers), push our changes?

Hoping this change will make contributing easier.

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 28 Dec 2011 19:56, "Eric Evans" <ee...@acunu.com> wrote:

> [...]

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Tue, Jan 3, 2012 at 1:21 PM, paul cannon <pa...@datastax.com> wrote:
> On Wed, Dec 28, 2011 at 1:55 PM, Eric Evans <ee...@acunu.com> wrote:
>
>> There are also some matters of workflow or process that we need to
>> hash out.  For example, how do we handle reviews now?  Do we
>> continue to mandate/recommend/allow rebasing?
>>
>
> Surely we'd want to follow normal git practices here: rebasing is almost
> never appropriate once a branch is pushed to a public repo, where other
> people might have gotten it.
>
> Where you might rebase a plain patch series on top of new developments in a
> target SVN branch, you probably just ought to merge the target git branch
> into your topic branch instead.  Same effect, but retains history.

This would be my preference as well.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu
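A small sketch of Paul's suggestion, which Eric prefers: bring a published topic branch up to date by merging the target branch into it instead of rebasing. Branch names (666, trunk) and file contents are illustrative. The point to notice is that the topic branch's existing commit keeps its id, so anyone who already fetched it is unaffected.

```shell
#!/bin/sh
# Sketch: update a published topic branch by merging, not rebasing.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/trunk
echo base > base.txt
git add base.txt
git commit -q -m "trunk: base"

# Published topic branch with one commit of work.
git checkout -q -b 666
echo fix > fix.txt
git add fix.txt
git commit -q -m "patch for #666"
topic_commit=$(git rev-parse HEAD)

# Meanwhile trunk moves on.
git checkout -q trunk
echo other > other.txt
git add other.txt
git commit -q -m "trunk: unrelated change"

# Merge trunk into the topic branch; no existing commit is rewritten.
git checkout -q 666
git merge -q --no-edit trunk
git log --oneline --graph
```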

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Mon, Jan 9, 2012 at 9:41 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>> The other side of the equation is that there is tremendous power to be had
>> from distributed versioning, and this proposed workflow discourages taking
>> advantage of that
>
> Would you mind elaborating, maybe being a little more concrete on that one?
> Because, just to be sure we agree, I did not propose to rebase public
> branches or anything like that.

So imagine you have 3 actors, Alice, Bob, and Carlos.

Alice grabs issue CASSANDRA-5555 and sets out to fix a very annoying
bug; she creates a branch locally called 5555 and proceeds to hack.
When done, she pushes to her Github account (as branch "5555").  At
this point, the version history is sexy because Alice has rebased to
sync with master, and squashed any changesets that might be considered
noise.  We are now ready for review.

Bob has been hard at work on a new feature that will change
Everything, but he's struggled due to the buggy behavior caused by
#5555, so he's relieved when he notices Alice's push.  Bob adds
Alice's remote and merges her 5555 branch into his topic branch.

Carlos picks up CASSANDRA-5555 for review, finds some nits and
provides feedback.  Alice responds by addressing the feedback in a new
changeset that she pushes to her branch 5555.

Bob discovers a corner case not addressed by Alice's branch; he codes
a fix and opens a pull request with Alice.

Alice merges Bob's fix and pushes again.  Carlos reviews, and gives a +1.
Alice rebases 5555 onto master so as to keep the change history sexy.

Bob attempts to sync up with master, and profanity ensues.

The end result is that Bob (and everyone else), learns to ignore other
people's topic branches because it just isn't worth it, and so we've
effectively lost the distributed benefits of our distributed VCS.
We're back to the same workflow we had when we attached patches, with
the exception that using Git makes it a little easier to download and
apply them.

> Last thing: I did check the kernel source git history (which, as far as I
> know, is considered a model of git usage and distribution), and there is
> not a single commit that looks like a fix for nits raised in the review of
> a preceding commit.

It seems counter-intuitive considering where Git came from, but the
Linux kernel is not a good model for anything other than the Linux
kernel.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu
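Eric's Alice/Bob scenario can be replayed in one throwaway repository, as an illustrative sketch in which local branches stand in for the separate Github forks and the file contents are made up. It shows the concrete cause of Bob's trouble: after Alice rebases 5555, the commit Bob merged is no longer part of 5555, so the same fix now exists under two different commit ids.

```shell
#!/bin/sh
# Sketch: why rebasing a published branch hurts people who merged it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

# Helper: write a file and commit it with the given message.
commit_file() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit_file base.txt base "master: base"

# Alice publishes her fix on branch 5555.
git checkout -q -b 5555
commit_file fix.txt fix "Alice: fix for CASSANDRA-5555"
alice_before=$(git rev-parse HEAD)

# Bob merges Alice's published branch into his own topic branch.
git checkout -q -b bob master
commit_file feature.txt feature "Bob: new feature"
git merge -q --no-edit 5555

# master moves on; after review Alice rebases 5555 onto it.
git checkout -q master
commit_file other.txt other "master: someone else's change"
git checkout -q 5555
git rebase -q master
alice_after=$(git rev-parse HEAD)

# Bob still carries the pre-rebase commit; 5555 now carries a copy of it.
git merge-base --is-ancestor "$alice_before" bob && echo "bob contains $alice_before"
git merge-base --is-ancestor "$alice_before" 5555 || echo "5555 no longer contains it"
```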

Re: Cassandra has moved to Git

Posted by paul cannon <pa...@datastax.com>.

On Mon, Jan 9, 2012 at 10:02 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> [speaking obviously as non-committer, just offering a perspective]
>
> A potential factor to consider: If one knows that all work in topic
> branches ends up merged without anyone first rebasing to clean up, you
> now have a constant trade-off between history readability and
> committing often. I personally dislike anything that causes there to
> be a reason *not* to commit. I'd much rather commit like crazy and
> clean it up before merge than maintain additional branches
> privately just for the purpose, or play around with stashes.
>

For sure; I think everybody agrees that it's best to rebase private
branches for maximum readability/useful history.

> If the issue is not rebasing public branches, one can presumably
> always introduce a convention whereby work happens in branch X; and if
> rebasing is needed, you do that in X/rebased/1. If a further iteration
> ends up happening, X/rebased/2. Or something along those lines. This
> would get you:
>

Hmm. I haven't seen this done before, but I think it actually might work
really well.  As a tweak, we could require that discontinued branches (like
X and X/rebased/1 in your example) be relegated to tags instead of
branches, and under some standard hierarchy, i.e.
"refs/tags/discontinued/X", "refs/tags/discontinued/X/rebased/1". The
original branches themselves could then be removed.

> * History looks the way you want it to look.
> * All original history is maintained if you really want to look at it
>   (I *do* think it could be useful when diving into a JIRA ticket after
>   the fact to figure out what the reasoning was).
>

+1 on this!

> * You're not rebasing published branches.
>
> The downside I suppose is that the branch count increases.
>

This downside largely goes away if tags are used instead.

A possibly larger downside is that if anyone has been basing some work of
their own off of branch X, then they would need to rebase their own work on
top of the new version, which might not be trivial if they forked off an
earlier commit than the last.  But people basing important+complicated work
on pending ticket work probably should expect some difficulties of this
nature.

p
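A sketch of Peter's convention combined with Paul's tag tweak. One practical constraint: git stores refs hierarchically, so a ref named X and another named X/rebased/1 cannot exist at the same time (the same holds for tags discontinued/X and discontinued/X/rebased/1). The sketch therefore uses the hypothetical names 666-rebased-1 and discontinued/666; any real naming scheme would be the project's call.

```shell
#!/bin/sh
# Sketch: rebase a copy of a published branch, retire the original as a tag.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/trunk
echo base > base.txt
git add base.txt
git commit -q -m "trunk: base"

# Published topic branch 666, including a review follow-up.
git checkout -q -b 666
echo v1 > feature.txt
git add feature.txt
git commit -q -m "Do something awesome - patch for #666"
echo v2 > feature.txt
git commit -q -am "last nits from reviewer"

# trunk moves on.
git checkout -q trunk
echo more > more.txt
git add more.txt
git commit -q -m "trunk: unrelated change"

# A nested name is refused while branch 666 exists.
git branch 666/rebased/1 666 2>/dev/null || echo "666/rebased/1 cannot coexist with 666"

# Iteration 1: rebase a copy; the published 666 is left alone.
git branch 666-rebased-1 666
git rebase -q trunk 666-rebased-1

# Retire the original branch as a tag so its history stays reachable.
git tag discontinued/666 666
git branch -q -D 666
```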

Re: Cassandra has moved to Git

Posted by Peter Schuller <pe...@infidyne.com>.

[speaking obviously as non-committer, just offering a perspective]

A potential factor to consider: If one knows that all work in topic
branches ends up merged without anyone first rebasing to clean up, you
now have a constant trade-off between history readability and
committing often. I personally dislike anything that causes there to
be a reason *not* to commit. I'd much rather commit like crazy and
clean it up before merge than maintain additional branches
privately just for the purpose, or play around with stashes.

I.e., in addition to the effects on history, if people feel it does
make history harder to read, it presumably affects the behavior of
those people in day-to-day work in terms of their propensity to
commit.

If the issue is not rebasing public branches, one can presumably
always introduce a convention whereby work happens in branch X; and if
rebasing is needed, you do that in X/rebased/1. If a further iteration
ends up happening, X/rebased/2. Or something along those lines. This
would get you:

* History looks the way you want it to look.
* All original history is maintained if you really want to look at it
  (I *do* think it could be useful when diving into a JIRA ticket after
  the fact to figure out what the reasoning was).
* You're not rebasing published branches.

The downside I suppose is that the branch count increases.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Cassandra has moved to Git

Posted by Sylvain Lebresne <sy...@datastax.com>.

Ok, I'll need to stop using Gmail. In the meantime, sorry for the
inconvenience, retrying, hoping that Gmail won't randomly screw
up the formatting this time.

So, I don't like history that looks like the following:

> commit eeee: last nits from reviewer
> commit dddd: oops, typo that prevented compilation
> commit cccc: some more fix found during review
> commit bbbb: refactor half of preceding patch following reviewer comments
> commit aaaa: Do something awesome - patch for #666

To be sure we are talking about the same thing, let it be clear that I'm
talking about multiple commits due to the review process, i.e. a changeset
that is split into multiple commits only because that is the history of the
mistakes made by the author and pointed out by the reviewer (or even noticed
by the author after having made their branch public).

The reasons why I think this is a regression compared to the rather
clean history we've had so far can be summed up in two things:

1) I cannot see any case where having those details once the ticket is
committed to the project (i.e., has been +1'ed) would be useful, and I
certainly never ever wanted to know those when looking at our history.

2) I do think that keeping those details would be counter-productive:
  - It makes reading the history harder by the sheer volume of added
    commits, and those commits are noise once the feature is committed
    (again, when has knowing that the initial author forgot to check that
    the tests compiled and had to commit another one-liner, or had tried
    another approach that was heavily refactored in the next patch because
    the first attempt was ugly, ever been useful?).
  - It potentially makes bisect harder. Again, I'm talking about commits
    coming from the review process. Those, more or less by definition, fix
    mistakes made initially, so the commits they fix will more often than
    not fail to compile, or have tests that do not compile, or introduce
    bugs that are only fixed in the very next commit. When you bisect,
    those kinds of commits are a pain in the ass.
  - It makes it more annoying to find which commits refer to which ticket.
    Not impossible, just a little harder.

Let it be clear that I'm not saying keeping the 'review commits' makes
anything *impossible*, but it does make very common tasks more complicated
(and potentially very frustrating in the case of bisect), while adding only
noise.

Of course, we could debate whether those 'review commits' are noise or not,
i.e. whether they can ever have a concrete use, but my personal experience
is that they are noise.

> The other side of the equation is that there is tremendous power to be had
> from distributed versioning, and this proposed workflow discourages taking
> advantage of that

Would you mind elaborating, maybe being a little more concrete on that one?
Because, just to be sure we agree, I did not propose to rebase public
branches or anything like that.

I proposed basically that once a changeset has been finalized on a branch
xxx of the author's repo, instead of merging the branch xxx, the committer
would (locally) extract the content of that changeset and commit it as one
commit to the project repository (say trunk). Given that git is content
based, it "should" work without much problem for other branches based on
xxx to merge trunk into themselves.

Sure, that may not be "beautiful" or something, and if you have something
better to propose, be my guest.

Last thing: I did check the kernel source git history (which, as far as I
know, is considered a model of git usage and distribution), and there is
not a single commit that looks like a fix for nits raised in the review of
a preceding commit.
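Sylvain's proposal (extract the finished content of the author's branch xxx and commit it to trunk as one commit) maps closely onto git merge --squash followed by a commit. A minimal sketch, with branch names and commit messages borrowed from his example for illustration:

```shell
#!/bin/sh
# Sketch: land a reviewed branch on trunk as a single commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/trunk
echo base > base.txt
git add base.txt
git commit -q -m "trunk: base"

# The author's branch xxx carries the whole review history.
git checkout -q -b xxx
echo v1 > feature.txt
git add feature.txt
git commit -q -m "Do something awesome - patch for #666"
echo v2 > feature.txt
git commit -q -am "some more fix found during review"
echo v3 > feature.txt
git commit -q -am "last nits from reviewer"

# The committer stages the combined changes of xxx and commits them once.
git checkout -q trunk
git merge -q --squash xxx
git commit -q -m "Do something awesome - patch for #666"
git log --oneline trunk
```

Because the resulting commit does not have xxx as a parent, git does not consider xxx merged (git branch --merged trunk will not list it). Branches built on top of xxx then rely on the three-way merge seeing identical changes on both sides when they later merge trunk, which usually, but not always, applies cleanly; that is the "should" in Sylvain's message.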

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Sat, Jan 7, 2012 at 2:35 AM, Radim Kolar <hs...@sendmail.cz> wrote:
> On 5.1.2012 7:22, Peter Schuller wrote:
>
>> (And btw, major +1 on the transition to git!)
>
> Please fix the github mirror already.

https://issues.apache.org/jira/browse/INFRA-4254

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Cassandra has moved to Git

Posted by Radim Kolar <hs...@sendmail.cz>.

On 5.1.2012 7:22, Peter Schuller wrote:
> (And btw, major +1 on the transition to git!)
Please fix the github mirror already.

Re: Cassandra has moved to Git

Posted by paul cannon <pa...@datastax.com>.

On Thu, Jan 5, 2012 at 4:50 AM, Sylvain Lebresne <sy...@datastax.com> wrote:

> Also, if committer != reviewer, there is the slight issue of how the
> committer makes sure that he commits what has been reviewed (i.e., that
> the author hasn't made some last-minute, post-review change). But I
> suppose we can either say "don't do that" and trust people, or ask the
> reviewer to comment with something like "+1 on 666:<sha1 of last commit>".
>
>

Signed git tags are a really good fit for this sort of thing.  Or even
unsigned annotated tags, if we have a pretty high default trust level.

p
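A minimal sketch of the tag idea, assuming a throwaway local repository, a topic branch `666`, and a hypothetical tag name `666-reviewed`. It uses an unsigned annotated tag; with a GPG key configured, `git tag -s` would create a signed tag instead, and `git tag -v` would check its signature. The committer compares the tag against the branch tip to detect post-review commits:

```shell
set -e
# Throwaway identity so the demo commits and tags work anywhere (hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk
echo base > file.txt; git add file.txt; git commit -q -m "initial"

git checkout -q -b 666
echo feature >> file.txt; git commit -q -am "patch for #666"

# Reviewer: record exactly which commit got the +1 in an annotated tag.
git tag -a -m "+1 from reviewer" 666-reviewed 666

# Author makes a last-minute, post-review change.
echo late-change >> file.txt; git commit -q -am "post-review tweak"

# Committer: does the branch tip still match what was reviewed?
reviewed=$(git rev-parse '666-reviewed^{commit}')
tip=$(git rev-parse 666)
if [ "$reviewed" = "$tip" ]; then
  verdict=match
else
  verdict=mismatch      # the tip has commits nobody reviewed
fi
unreviewed=$(git rev-list --count "$reviewed..$tip")
```

On a mismatch the committer can land the tagged commit (for example `git merge --squash 666-reviewed`) instead of the branch tip.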

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Thu, Jan 5, 2012 at 10:58 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>> This discourages collaboration because anyone that might fork
>> github.com/author/666 is sitting on a powder keg.
>
> Alright, but then what is it you're proposing?

That we use rebase on private topic branches as a courtesy to
reviewers (or to streamline the process in general).  That once that
branch has been published, we no longer rebase it.

I also think that a measure of common sense could be applied.  If the
review is particularly arduous, the reviewer and reviewee could agree
to create a new branch (say 666-1) that (privately)
rebases/squashes/etc changesets to complete the final step(s).  Not as
a tool for maintaining some standard for aesthetics in the version
history, but to facilitate review.
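A minimal sketch of that 666-1 step, assuming a throwaway repository with a published branch `666` based on `trunk` (names and contents hypothetical). It squashes with `git reset --soft` to the merge base rather than an interactive rebase, so it can run unattended; the published branch itself is never rewritten:

```shell
set -e
# Throwaway identity so the demo commits work anywhere (hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk
echo base > file.txt; git add file.txt; git commit -q -m "initial"

# Published topic branch 666 with review back-and-forth.
git checkout -q -b 666
echo feature >> file.txt; git commit -q -am "Do something awesome - patch for #666"
echo fix     >> file.txt; git commit -q -am "some more fix found during review"
echo nit     >> file.txt; git commit -q -am "last nits from reviewer"
published_tip=$(git rev-parse 666)

# New branch 666-1: same content, history collapsed into one commit.
git checkout -q -b 666-1 666
git reset -q --soft "$(git merge-base trunk 666-1)"   # keep the tree, drop the commits
git commit -q -m "Do something awesome - patch for #666"
```

Anyone who forked `666` is unaffected, because `666` still points at the same commit.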

>> At best it's yak shaving.  At worst it's going to result in some very
>> frustrated contributors.  This is one of the major reasons why rebase
>> is so contentious, and it's exactly why you hear so many people saying
>> "don't rebase branches that have been published".
>
> Again, I was more talking about the only reasonable solution I saw.
> Because to be clear, if the history for some issue 666 in say trunk looks like:
>
> commit eeee: last nits from reviewer
> commit dddd: oops, typo that prevented commit
> commit cccc: some more fix found during review
> commit bbbb: refactor half of preceding patch following reviewer comments
> commit aaaa: Do something awesome - patch for #666

Which is reality, isn't it?  I know it's a contentious argument, but
this *is* the history of the change.

> then imho that's a big regression from current patch based development.

If it is a goal to maintain some objective target for version history
aesthetics, then yes.  I say aesthetics here because I still don't
understand the arguments about it obscuring history or making bisects
impossible (and because you later go on to refer to it as "look[ing]
like shit" :) ).

The other side of the equation is that there is tremendous power to be
had from distributed versioning, and this proposed workflow
discourages taking advantage of that.

> So basically my question is how do we meld all those commits that will
> necessarily happen due to the nature of distributed reviews so that our
> main history doesn't look like shit? And if the answer is "we don't" then
> I'm not too fond of that solution.

I would argue that our deliverable should be beautiful code, not a
beautifully formatted change history.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu

Re: Cassandra has moved to Git

Posted by Brian O'Neill <bo...@alumni.brown.edu>.

I'm by no means a git guru, but just happened to attend a meeting last
night where the presenter addressed this exact issue.  He has a pretty
slick process that keeps the master/trunk clean without rebasing by
squashing a set of commits into a single commit when merged to trunk.
(using git squash?)

I'm CCing the guru, Nicholas Hance.

Nicholas, can you share that process handout from last night?

-brian


On Thu, Jan 5, 2012 at 11:58 AM, Sylvain Lebresne <sy...@datastax.com> wrote:

> > This discourages collaboration because anyone that might fork
> > github.com/author/666 is sitting on a powder keg.
>
> Alright, but then what is it you're proposing?
>
> > At best it's yak shaving.  At worst it's going to result in some very
> > frustrated contributors.  This is one of the major reasons why rebase
> > is so contentious, and it's exactly why you hear so many people saying
> > "don't rebase branches that have been published".
>
> Again, I was more talking about the only reasonable solution I saw.
> Because to be clear, if the history for some issue 666 in say trunk looks
> like:
>
> commit eeee: last nits from reviewer
> commit dddd: oops, typo that prevented commit
> commit cccc: some more fix found during review
> commit bbbb: refactor half of preceding patch following reviewer comments
> commit aaaa: Do something awesome - patch for #666
>
> then imho that's a big regression from current patch based development.
>
> So basically my question is how do we meld all those commits that will
> necessarily happen due to the nature of distributed reviews so that our
> main history doesn't look like shit? And if the answer is "we don't" then
> I'm not too fond of that solution.
>
> --
> Sylvain
>



-- 
Brian O'Neill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/

Re: Cassandra has moved to Git

Posted by paul cannon <pa...@datastax.com>.

On Thu, Jan 5, 2012 at 10:58 AM, Sylvain Lebresne <sy...@datastax.com> wrote:

> Again, I was more talking about the only reasonable solution I saw.
> Because to be clear, if the history for some issue 666 in say trunk looks
> like:
>
> commit eeee: last nits from reviewer
> commit dddd: oops, typo that prevented commit
> commit cccc: some more fix found during review
> commit bbbb: refactor half of preceding patch following reviewer comments
> commit aaaa: Do something awesome - patch for #666
>
> then imho that's a big regression from current patch based development.
>

I don't see this as problematic, given all the tools like git log --graph
and graphical history viewers.  Especially since long nitpick histories
like this are not likely to be very common in practice.  Would you care to
elaborate on the issue?

> So basically my question is how do we meld all those commits that will
> necessarily happen due to the nature of distributed reviews so that our
> main history doesn't look like shit? And if the answer is "we don't" then
> I'm not too fond of that solution.
>

Does "look like shit" here mean "has lots of forks and merges", or "has
lots of commits", or "is not aesthetically pleasing"?

p
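A minimal sketch of the viewing side, assuming a throwaway repository where topic branch `666` lands on `trunk` through a merge commit (names hypothetical). `git log --graph` draws the full topology; `--first-parent` (a standard `git log`/`git rev-list` option not mentioned in the thread) follows only the first parent of each merge, so trunk reads as one entry per landed change:

```shell
set -e
# Throwaway identity so the demo commits work anywhere (hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk
echo base > file.txt; git add file.txt; git commit -q -m "initial"

git checkout -q -b 666
echo feature >> file.txt; git commit -q -am "Do something awesome - patch for #666"
echo nit     >> file.txt; git commit -q -am "last nits from reviewer"

# Land the topic branch with an explicit merge commit.
git checkout -q trunk
git merge -q --no-ff --no-edit 666

# Full history, with the review commits drawn beside trunk.
git --no-pager log --oneline --graph trunk
# Only commits made on trunk itself: the merge and "initial".
git --no-pager log --oneline --first-parent trunk

all=$(git rev-list --count trunk)
mainline=$(git rev-list --count --first-parent trunk)
```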

Re: Cassandra has moved to Git

Posted by Ted Crossman <te...@gmail.com>.

On 1/5/12 9:06 AM, Jonathan Ellis wrote:
> On Thu, Jan 5, 2012 at 10:58 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
>> Again, I was more talking about the only reasonable solution I saw.
>> Because to be clear, if the history for some issue 666 in say trunk looks like:
>>
>> commit eeee: last nits from reviewer
>> commit dddd: oops, typo that prevented commit
>> commit cccc: some more fix found during review
>> commit bbbb: refactor half of preceding patch following reviewer comments
>> commit aaaa: Do something awesome - patch for #666
> Don't forget
>
> commit ffff: <merge> (i.e., resolve conflicts introduced in master post-branch)
>
>> So basically my question is how do we meld all those commits that will
>> necessarily happen due to the nature of distributed reviews so that our
>> main history doesn't look like shit? And if the answer is "we don't" then
>> I'm not too fond of that solution.
> +1
>
One thing to consider is something like Gerrit, which allows code review
before you submit to "trunk".

Either way, if you work on a branch separate from "trunk", you can
publish changes for people to review.
If they have a nit, you fix it *in the same commit* using
git commit --amend.

Step #1: Issue fixer
git checkout -b issue666 origin/trunk
*hack hack*
git commit -a
*Send e-mail to reviewers telling them to pull.*

Step #2: Reviewer
git checkout -b issue666 origin/trunk
git pull <person submitting code's git url> issue666
*review review*
*Send e-mail to issue fixer pointing out stuff*

Step #3: Issue fixer
*fix pointed out stuff*
git commit -a --amend
*Send out e-mail: "I've fixed it, please review again"*

Step #4: Reviewer
git checkout issue666 # assuming they have been working on something else
git reset --hard HEAD^1 # this will get you back to the state before you pulled issue666
git pull <person submitting code's git url> issue666
*review review*
*Send e-mail to issue fixer pointing out stuff*

You can do steps #3 and #4 as many times as necessary.

Once everyone is happy, either one of the reviewers or the issue fixer can:
git checkout trunk
git merge --ff-only issue666 # refuses, rather than rewriting trunk, if trunk has moved

Also if trunk moves along during the review process the issue creator 
can just:
git checkout issue666
git rebase trunk

and then send things for review again.

Gerrit makes the "send e-mail to pull" and "send e-mail pointing
stuff out" steps a lot easier.


HTH,

tedo
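A minimal sketch of the four steps above as a script, assuming two throwaway local repositories stand in for the issue fixer and the reviewer (paths, file contents, and identity are hypothetical). It adds `-a` to the amend so unstaged fixes are included, pulls with `--ff-only` so a stale reviewer branch fails loudly, and lands the result with `git merge --ff-only` rather than rebasing trunk:

```shell
set -e
# Throwaway identity so the demo commits work anywhere (hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)

# Issue fixer's repository with a "trunk" branch; the reviewer clones it.
git init -q "$root/author"
cd "$root/author"
git symbolic-ref HEAD refs/heads/trunk
echo base > file.txt; git add file.txt; git commit -q -m "initial"
git clone -q "$root/author" "$root/reviewer"

# Step 1 (issue fixer): one commit on a topic branch.
git checkout -q -b issue666 trunk
echo feature >> file.txt; git commit -q -am "patch for #666"

# Step 2 (reviewer): pull the topic branch onto a branch based on trunk.
cd "$root/reviewer"
git checkout -q -b issue666 origin/trunk
git pull -q --ff-only "$root/author" issue666

# Step 3 (issue fixer): fold the fix into the same commit.
cd "$root/author"
echo nit-fixed >> file.txt; git commit -q -a --amend --no-edit

# Step 4 (reviewer): drop the old version, pull the amended one.
cd "$root/reviewer"
git reset -q --hard HEAD^1
git pull -q --ff-only "$root/author" issue666

# Final step: fast-forward trunk onto the reviewed commit.
git checkout -q trunk
git merge -q --ff-only issue666
```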

Re: Cassandra has moved to Git

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Jan 5, 2012 at 10:58 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> Again, I was more talking about the only reasonable solution I saw.
> Because to be clear, if the history for some issue 666 in say trunk looks like:
>
> commit eeee: last nits from reviewer
> commit dddd: oops, typo that prevented commit
> commit cccc: some more fix found during review
> commit bbbb: refactor half of preceding patch following reviewer comments
> commit aaaa: Do something awesome - patch for #666

Don't forget

commit ffff: <merge> (i.e., resolve conflicts introduced in master post-branch)

> So basically my question is how do we meld all those commits that will
> necessarily happen due to the nature of distributed reviews so that our
> main history doesn't look like shit? And if the answer is "we don't" then
> I'm not too fond of that solution.

+1

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Cassandra has moved to Git

Posted by Sylvain Lebresne <sy...@datastax.com>.

> This discourages collaboration because anyone that might fork
> github.com/author/666 is sitting on a powder keg.

Alright, but then what is it you're proposing?

> At best it's yak shaving.  At worst it's going to result in some very
> frustrated contributors.  This is one of the major reasons why rebase
> is so contentious, and it's exactly why you hear so many people saying
> "don't rebase branches that have been published".

Again, I was more talking about the only reasonable solution I saw.
Because to be clear, if the history for some issue 666 in say trunk looks like:

commit eeee: last nits from reviewer
commit dddd: oops, typo that prevented commit
commit cccc: some more fix found during review
commit bbbb: refactor half of preceding patch following reviewer comments
commit aaaa: Do something awesome - patch for #666

then imho that's a big regression from current patch based development.

So basically my question is how do we meld all those commits that will
necessarily happen due to the nature of distributed reviews so that our
main history doesn't look like shit? And if the answer is "we don't" then
I'm not too fond of that solution.

--
Sylvain

Re: Cassandra has moved to Git

Posted by Eric Evans <ee...@acunu.com>.

On Thu, Jan 5, 2012 at 4:50 AM, Sylvain Lebresne <sy...@datastax.com> wrote:
> I agree that having merge commits each time a new "patch" is committed
> is a pain and adds no useful information imo (to be clear I'm not
> talking of actual merge (like say cassandra-1.0 -> trunk)), so +1 on
> using git pull --rebase to avoid it.

Personally, the merge commits and extra(neous) history don't bother
me.  I favor rebasing a private topic branch only for purposes of
creating a concise result for review.
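A minimal sketch of the `git pull --rebase` behaviour quoted above, assuming a throwaway bare repository shared by two hypothetical contributors, Alice and Bob, who commit concurrently to `trunk`. Rebasing replays Bob's local commit on top of Alice's pushed one, so trunk stays linear with no merge commit; a plain `git pull` would instead merge (recent git versions may first ask you to choose a strategy via `pull.rebase`):

```shell
set -e
# Throwaway identity so the demo commits work anywhere (hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
root=$(mktemp -d)

# Seed a shared bare "upstream" repository whose main branch is trunk.
git init -q "$root/seed"
cd "$root/seed"
git symbolic-ref HEAD refs/heads/trunk
echo base > base.txt; git add base.txt; git commit -q -m "initial"
git clone -q --bare "$root/seed" "$root/upstream.git"
git clone -q "$root/upstream.git" "$root/alice"
git clone -q "$root/upstream.git" "$root/bob"

# Alice pushes first.
cd "$root/alice"
echo a > a.txt; git add a.txt; git commit -q -m "Alice's patch"
git push -q origin trunk

# Bob committed locally meanwhile; --rebase replays his commit on top of
# Alice's instead of recording a merge commit.
cd "$root/bob"
echo b > b.txt; git add b.txt; git commit -q -m "Bob's patch"
git pull -q --rebase origin trunk
git push -q origin trunk

cd "$root/upstream.git"
merges=$(git rev-list --merges --count trunk)
total=$(git rev-list --count trunk)
```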

> Now, I'm a little slow so I'd like to make sure I understand how we expect
> this to work. Currently the life of a ticket goes more or less like that:
> - author attaches patches: 0001-uberidea.patch and 0002-unit-test.patch.
> - reviewer makes some remarks, maybe some that require fairly intensive changes.
> - author attaches new version: 0001-uberidea-v2.patch and
> 0002-unit-test-v2.patch.
> - reviewer makes a few last small remarks.
> - author attaches patch 0003-fixes-v3.patch. Note that it could also
> attach a v3 of 0001 and 0002 instead, but let's say that those two are
> big and the last remarks are small, so attaching a 0003 makes it
> easier for the reviewer to quickly check how his last remark has been
> addressed.
> - reviewer +1 the patch.
> - committer applies the patches on his local svn/git main branch and
> pushes upstream.
>
> The way it could translate with git:
> - author says "patches are available at github.com/author/666".
> - reviewer makes remarks.
> - author pushes some more commits on branch 666 (but he does not rebase
> because it's a public branch).
> - reviewer makes last remarks.
> - author pushes one more commit (again, no rebasing).
> - reviewer +1 the patch.
> - Now github.com/author/666 has many commits but we want to commit
> only one upstream because those commits are the back-and-forth of
> review. So committer would be in charge of pulling 666 locally, squash
> the commits, then rebase his local main branch against the result (or
> cherry-pick the now unique commit) and push upstream?

This discourages collaboration because anyone that might fork
github.com/author/666 is sitting on a powder keg.

Imagine the case where Alice pushes a topic branch for a new feature,
and Bob forks that because his feature depends on it and he wants to
test, or doesn't want to wait for it to be committed.  Or perhaps Bob
found a bug in Alice's code, or is taking her feature to the next
stage.  It's possible that this isn't even the only branch Bob has
merged onto his.

Let's hope Bob is on top of things.
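A minimal sketch of the scenario above in one throwaway repository, with hypothetical branches `alice` and `bob` (assumes git 1.8 or later for `merge-base --is-ancestor`). After Alice squashes her already-published branch, Bob's branch no longer descends from hers and carries her old commits as duplicates; Bob can recover with `git rebase --onto`, but only if he knows her pre-squash tip:

```shell
set -e
# Throwaway identity so the demo commits work anywhere (hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk
echo base > alice.txt; git add alice.txt; git commit -q -m "initial"

# Alice publishes a topic branch with two commits.
git checkout -q -b alice trunk
echo feature    >> alice.txt; git commit -q -am "Alice: new feature"
echo review-fix >> alice.txt; git commit -q -am "Alice: review fix"

# Bob builds on Alice's published branch.
git checkout -q -b bob alice
echo bob > bob.txt; git add bob.txt; git commit -q -m "Bob: builds on Alice's feature"

# Alice squashes her already-published branch (what a forced push would publish).
git checkout -q alice
old_alice=$(git rev-parse alice)
git reset -q --soft trunk
git commit -q -m "Alice: new feature (squashed)"

# Bob's branch no longer descends from Alice's...
if git merge-base --is-ancestor alice bob; then descends=yes; else descends=no; fi
# ...and carries her two pre-squash commits on top of his own.
stale=$(git rev-list --count alice..bob)

# Recovery: replay only Bob's own commit onto Alice's rewritten branch.
git rebase -q --onto alice "$old_alice" bob
```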

> Does that sound like what we all had in mind?
>
> It seems to me it puts a little bit more work on the committer (he has
> to squash commits), but I can live with that. Also, if committer !=
> reviewer, there is the slight issue of how the committer makes sure
> that he commits what has been reviewed (i.e., that the author hasn't made
> some last minute, post-review change). But I suppose we can either say
> "don't do that" and trust people, or ask reviewer to comment with
> something like "+1 on 666:<sha1 of last commit>".

At best it's yak shaving.  At worst it's going to result in some very
frustrated contributors.  This is one of the major reasons why rebase
is so contentious, and it's exactly why you hear so many people saying
"don't rebase branches that have been published".


-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu