Posted to common-dev@hadoop.apache.org by Jeff Hammerbacher <je...@gmail.com> on 2007/07/29 07:53:37 UTC

using git for SCM?

hey all,

so we're starting to really use hadoop heavily here at facebook, and we'd
like to start contributing back to the hadoop trunk.  we've been through
this process before (contributing heavily to an actively developed open
source project) with memcached, and the guy who owns the trunk of that
project is quite the SCM nut who says that his life has been made infinitely
more pleasant by switching from svn to git for version control with
memcached.

i'm not suggesting the project be moved tomorrow.  i just wanted to throw
the question out there to get a sense of how others feel about git as an SCM
tool.  i have asked our memcached guy to write up the specific advantages of
git versus svn so i will be happy to follow up with details.

apologies if this has been brought up before; i've been reading the mailing
list on and off now for several months and a quick search of the archives
yielded no matches for "git" so hopefully this is a relevant topic.

thanks,
jeff

Re: using git for SCM?

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Enis Soztutar wrote:
> Hi,
> 
> I switched to git just two days ago, and i am quite happy with it. 
> However, i do not want hadoop to move to git before the apache 
> infrastructure has matured enough. As Linus mentions in his talk, many 
> users have been using git+svn in a very innovative way: the central 
> repository is managed by svn, and users check out the code, build a 
> local git repo and work from there.

This seems like a very useful case. When I was working on the patch for 
HADOOP-1134, which depended on (changing) patches for other jiras, I was 
working with my own local svn. That meant I could not update from the 
central svn. This fixes it.

Raghu.
> Personally i find it *very* 
> convenient to let both svn and git manage the code. What i do is make 
> git ignore svn files, and make svn ignore git files. Then i create one 
> branch for every issue i work on, committing the changes before switching 
> to another branch (issue). Once i am done with an issue, i get the 
> patch with a regular svn diff. I mention this because it 
> lets me work on several distinct issues simultaneously using the 
> same code base. I highly recommend this workspace model to anyone working 
> this way.

Re: using git for SCM?

Posted by Enis Soztutar <en...@gmail.com>.
Hi,

I switched to git just two days ago, and i am quite happy with it. 
However, i do not want hadoop to move to git before the apache 
infrastructure has matured enough. As Linus mentions in his talk, many 
users have been using git+svn in a very innovative way: the central 
repository is managed by svn, and users check out the code, build a 
local git repo and work from there. Personally i find it *very* 
convenient to let both svn and git manage the code. What i do is make 
git ignore svn files, and make svn ignore git files. Then i create one 
branch for every issue i work on, committing the changes before switching 
to another branch (issue). Once i am done with an issue, i get the 
patch with a regular svn diff. I mention this because it 
lets me work on several distinct issues simultaneously using the 
same code base. I highly recommend this workspace model to anyone working 
this way.
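
A minimal sketch of the branch-per-issue setup described above, written as 
a small Python helper that only builds and prints the commands rather than 
running them. The exact commands are not given in this message, so the 
choices here are assumptions: using .git/info/exclude for the git side, 
"svn propset svn:ignore" for the svn side, the commit messages, the branch 
name "master", and the issue id (borrowed from elsewhere in this thread).

```python
"""Sketch of a branch-per-issue workflow inside an svn working copy.

Assumptions (not stated in the thread): the ignore mechanism, the commit
messages, and the helper names below are illustrative only.
"""
import shlex

GIT_EXCLUDE_FILE = ".git/info/exclude"  # per-repository ignore list read by git
GIT_EXCLUDE_PATTERN = ".svn"            # keeps svn metadata out of git


def setup_commands():
    """Commands run once, at the root of an existing svn working copy."""
    return [
        ["git", "init"],
        # Caution: propset replaces any svn:ignore value already set on ".".
        ["svn", "propset", "svn:ignore", ".git", "."],
        # Run these two only after adding GIT_EXCLUDE_PATTERN to GIT_EXCLUDE_FILE.
        ["git", "add", "."],
        ["git", "commit", "-m", "snapshot of svn checkout"],
    ]


def start_issue(issue):
    """Create and switch to a branch named after the issue."""
    return [["git", "checkout", "-b", issue]]


def switch_issue(current, target):
    """Commit tracked changes on `current`, then switch to `target`.

    New files must be staged with `git add` first; `commit -a` skips them.
    """
    return [
        ["git", "commit", "-a", "-m", f"{current}: work in progress"],
        ["git", "checkout", target],
    ]


def make_patch(issue):
    """With the issue branch checked out, svn diff produces the patch."""
    return [["git", "checkout", issue], ["svn", "diff"]]


def render(commands):
    """Format argv lists as shell-style lines for reading or logging."""
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    steps = (
        setup_commands()
        + start_issue("HADOOP-1134")
        + switch_issue("HADOOP-1134", "master")
        + make_patch("HADOOP-1134")
    )
    print("\n".join(render(steps)))
```

Since svn diff compares the working tree against the checked-out svn 
revision, the patch contains whatever the currently checked-out git branch 
has changed, which is why the issue branch is checked out first.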



Jeff Hammerbacher wrote:
> hey all,
>
> so we're starting to really use hadoop heavily here at facebook, and we'd
> like to start contributing back to the hadoop trunk.  we've been through
> this process before (contributing heavily to an actively developed open
> source project) with memcached, and the guy who owns the trunk of that
> project is quite the SCM nut who says that his life has been made infinitely
> more pleasant by switching from svn to git for version control with
> memcached.
>
> i'm not suggesting the project be moved tomorrow.  i just wanted to throw
> the question out there to get a sense of how others feel about git as an SCM
> tool.  i have asked our memcached guy to write up the specific advantages of
> git versus svn so i will be happy to follow up with details.
>
> apologies if this has been brought up before; i've been reading the mailing
> list on and off now for several months and a quick search of the archives
> yielded no matches for "git" so hopefully this is a relevant topic.
>
> thanks,
> jeff
>
>   

Re: using git for SCM?

Posted by Jim White <ji...@pagesmiths.com>.
Eric Baldeschwieler wrote:

> Interesting topic.
> I'd be interested in learning what would be made infinitely better.

Distributed revision control is definitely superior to the conventional 
centralized revision control plus patching rigmarole.

I don't think there is anything offering infinite improvement at this 
stage though.

> But SVN shortcomings aren't on my top 10 hadoop issues yet.
> I'd be interested in hearing from folks who think this would be  
> very valuable.
> 
> But I agree with doug.  Too much leverage in using Apache's  
> infrastructure to make this appealing.
> ...

Yes, indeed.

Especially since git is nowhere near suitable for folks who aren't 
Linux-based, command-line loving hackers.  A key thing to remember is 
that many folks use the SCM integration in their IDE, an area where even 
Subversion is still working to catch up with CVS.

BitKeeper (the commercial distributed revision control tool git was 
written to replace) is at least cross-platform, but is not free.  Nor 
are its benefits over the current system worth the total cost of 
operation for ordinary projects like Hadoop.

Linus extols git's virtues (and Subversion's worthlessness) in his 
inimitable style here:

http://www.youtube.com/watch?v=4XpnKHJAok8

Since we're discussing SCM, I'll mention darcs.  Even though it may not 
be as fast as git, it is OSS and implemented in Haskell, making it a 
promising starting point for more creative applications.  Links to that 
and other interesting tools here:

http://www.ifcx.org/wiki/RevisionControl.html

Jim


Re: using git for SCM?

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Interesting topic.
I'd be interested in learning what would be made infinitely better.

But SVN shortcomings aren't on my top 10 hadoop issues yet.
I'd be interested in hearing from folks who think this would be  
very valuable.

But I agree with doug.  Too much leverage in using Apache's  
infrastructure to make this appealing.

On Jul 30, 2007, at 1:28 PM, Doug Cutting wrote:

> Jeff Hammerbacher wrote:
> > so we're starting to really use hadoop heavily here at facebook, and we'd
> > like to start contributing back to the hadoop trunk.  we've been through
> > this process before (contributing heavily to an actively developed open
> > source project) with memcached, and the guy who owns the trunk of that
> > project is quite the SCM nut who says that his life has been made infinitely
> > more pleasant by switching from svn to git for version control with
> > memcached.
> >
> > i'm not suggesting the project be moved tomorrow.  i just wanted to throw
> > the question out there to get a sense of how others feel about git as an SCM
> > tool.
>
> The project has a fair amount of culture built around subversion.  Also,
> Apache's infrastructure currently supports subversion, not git, so we
> would have to either manage our own repository, which is not encouraged,
> or make git a supported part of Apache's infrastructure, which is rather
> outside the scope of Hadoop.  So we could potentially move to a
> different SCM, but someone would have to make a very compelling case
> before that would become a project priority.
>
> Doug


Re: using git for SCM?

Posted by Torsten Curdt <tc...@apache.org>.
On 30.07.2007, at 22:28, Doug Cutting wrote:

> Jeff Hammerbacher wrote:
>> so we're starting to really use hadoop heavily here at facebook, and we'd
>> like to start contributing back to the hadoop trunk.  we've been through
>> this process before (contributing heavily to an actively developed open
>> source project) with memcached, and the guy who owns the trunk of that
>> project is quite the SCM nut who says that his life has been made infinitely
>> more pleasant by switching from svn to git for version control with
>> memcached.
>>
>> i'm not suggesting the project be moved tomorrow.  i just wanted to throw
>> the question out there to get a sense of how others feel about git as an SCM
>> tool.
>
> The project has a fair amount of culture built around subversion.   
> Also, Apache's infrastructure currently supports subversion, not  
> git, so we would have to either manage our own repository, which is  
> not encouraged, or make git a supported part of Apache's  
> infrastructure, which is rather outside the scope of Hadoop.  So we  
> could potentially move to a different SCM, but someone would have  
> to make a very compelling case before that would become a project  
> priority.

Well, there is still git-svn as a two-way bridge. Having said that -  
I have no clue how feasible this would be ...but Linus' talk got me  
interested in git, too

  http://youtube.com/watch?v=4XpnKHJAok8

cheers
--
Torsten

Re: using git for SCM?

Posted by Doug Cutting <cu...@apache.org>.
Jeff Hammerbacher wrote:
> so we're starting to really use hadoop heavily here at facebook, and we'd
> like to start contributing back to the hadoop trunk.  we've been through
> this process before (contributing heavily to an actively developed open
> source project) with memcached, and the guy who owns the trunk of that
> project is quite the SCM nut who says that his life has been made infinitely
> more pleasant by switching from svn to git for version control with
> memcached.
> 
> i'm not suggesting the project be moved tomorrow.  i just wanted to throw
> the question out there to get a sense of how others feel about git as an SCM
> tool.

The project has a fair amount of culture built around subversion.  Also, 
Apache's infrastructure currently supports subversion, not git, so we 
would have to either manage our own repository, which is not encouraged, 
or make git a supported part of Apache's infrastructure, which is rather 
outside the scope of Hadoop.  So we could potentially move to a 
different SCM, but someone would have to make a very compelling case 
before that would become a project priority.

Doug


Re: using git for SCM?

Posted by Prafulla Tekawade <pr...@gmail.com>.
Please post this comparison here when that "memcached guy" is done with it.
I would like to read it.

>
>  i have asked our memcached guy to write up the specific advantages of
> git versus svn so i will be happy to follow up with details.
>
>
>