Posted to infrastructure-dev@apache.org by Jukka Zitting <ju...@gmail.com> on 2008/12/31 09:17:40 UTC

Next steps with git (Was: Added a simple tutorial on Git cloning)

Hi,

On Tue, Dec 30, 2008 at 10:25 PM, Grzegorz Kossakowski
<gk...@apache.org> wrote:
> Since we have something working and we have gained enough experience to sort out most of the
> problems that may arise maybe we should let others know about our experiments? I have in mind
> sending an e-mail to committers@ list informing about Git activity at Apache.

I guess we're still some way from informing committers@, but you're
right in that it's now time to move forward with this setup.

Here's what I think we should do:

a) Set up the mirrors on Apache hardware. We could for example request
a Solaris zone like git.zones.apache.org for this. It would be good to
have at least two or three administrators to avoid making me a
bottleneck.

b) Clean up and document the mirror maintenance scripts (currently at
[1]) and move them to an appropriate location under
repos/asf/infrastructure. It should be possible for a new
administrator to get up to speed with just some pointers to
documentation.

c) Improve and extend the documentation we now have in the wiki and
move it to an appropriate location under www.apache.org/dev.

d) Start using the INFRA project in Jira for git tasks like setting up
a new mirror.

There's also the open issue of how to best handle contributions made
via git. Should we always insist on patches or would a pull request be
OK? It would be good to have some documented best practice for such
cases.

Another issue to think about is our approach to people publishing
their clones on places like github. On one hand it's good when people
do that as making your working copy public is one area where git
really helps collaboration. On the other hand we'll want to make sure
that development efforts won't splinter to other forums.

[1] http://github.com/jukka/apache-git-mirrors/tree/master

BR,

Jukka Zitting

Re: Next steps with git (Was: Added a simple tutorial on Git cloning)

Posted by Aidan Skinner <ai...@gmail.com>.
On Mon, Jan 5, 2009 at 3:19 PM, Jukka Zitting <ju...@gmail.com> wrote:

> More a philosophical question. I'm pretty sure that there'll be cases
> where people using git will be pushing the boundaries of traditional
> Apache-style development.

We already kind of did this in Qpid when we had a github mirror. There
were two feature branches by (at the time) non-committers of OS ports
which were tracking trunk but weren't integrated. They used git to
collaborate with each other directly, without having to mail patches
around by hand. There was some concern about this expressed since the
repo was essentially under one person's control, and it was felt that
the code provenance wasn't clear.

Having said that, I think a lot of those problems were because it was
essentially a forked repo and we could avoid both of those issues if
it was more closely integrated into the Apache infrastructure.

The main things I think we need to figure out are centrally located
topic branches that committers can push to, and a mechanism for
pulling from non-committers' trees and preserving authorship of patches.

> The conventional wisdom says that all code changes should go through
> svn or as patches sent to an issue tracker or a mailing list. Should
> we stick to that guideline or embrace the new workflow enabled by git?

I suspect this already happens, and I think cheap branches would be
likely to make it happen more openly. Few people really want to do this
kind of behind-closed-doors stuff; it mostly happens because it's
cheaper than SVN branches.

> I guess only time will tell, and my question was mostly meant as an
> indicator that these are the sorts of issues we'll likely encounter.
> Perhaps we could come up with some general guidelines on how to
> approach such issues.

I think this is definitely something that would benefit from some
thinking before it becomes an issue.

- Aidan

-- 
Apache Qpid - World Domination through Advanced Message Queueing
http://qpid.apache.org

Re: Next steps with git (Was: Added a simple tutorial on Git cloning)

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Sun, Jan 4, 2009 at 3:32 PM, Grzegorz Kossakowski
<gk...@apache.org> wrote:
> Jukka Zitting wrote:
>> For example I've been thinking of changing the svn.eu.apache.org in
>> the commit logs to svn.apache.org.
>
> Is there any reason for doing that? I thought that both svn.eu.a.o and svn.a.o are considered as
> official even if, technically, the first one is a mirror.

Two reasons:

a) As reported, dcommit works better with svn.apache.org.

b) There's nothing EU-specific about the version histories, so having
the name of the regional mirror embedded in the git commit logs feels
wrong.
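
A minimal sketch of what such a rewrite could look like, assuming the
mirror host appears only in the git-svn-id trailer that git-svn appends
to each commit message. The function name is hypothetical and the
repository path and UUID in the sample are placeholders; in practice the
mirrors would more likely be regenerated than rewritten in place.

```python
import re

# Matches the host part of a git-svn-id trailer that points at the EU mirror.
# Only lines starting with "git-svn-id:" are touched; other text is left alone.
_EU_TRAILER = re.compile(
    r"^(git-svn-id: https?://)svn\.eu\.apache\.org(/)", re.MULTILINE
)


def use_canonical_svn_host(message):
    """Return the commit message with git-svn-id pointing at svn.apache.org."""
    return _EU_TRAILER.sub(r"\g<1>svn.apache.org\g<2>", message)


if __name__ == "__main__":
    # Placeholder path and UUID, for illustration only.
    log = (
        "Fix typo in README\n"
        "\n"
        "git-svn-id: https://svn.eu.apache.org/repos/asf/example/trunk@1234 "
        "00000000-0000-0000-0000-000000000000\n"
    )
    print(use_canonical_svn_host(log))
```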

>> I have a dedicated email address, git@jukka.zitting.name, that invokes
>> email-update.sh whenever a new commit message is received.
>
> Ok, but I was asking more about how you do it in terms of setting up the appropriate
> Linux tools. I have no experience with processing e-mails on Linux.

I have a standard sendmail installation (from RHEL) and I've just
added the git@ address as an entry in the aliases file. Even a
"|email-update.sh" entry in a .forward file should do the trick if you
want to experiment with it.
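
For anyone experimenting with this: a pipe entry in the aliases file or
in .forward hands each incoming message to the named program on its
standard input. The sketch below is a hypothetical Python stand-in for
such a program, not email-update.sh itself; the function name and the
sample message are assumptions made for illustration.

```python
import io
from email import message_from_file


def read_commit_mail(stream):
    """Parse one mail message from a text stream; return (subject, body)."""
    msg = message_from_file(stream)
    payload = msg.get_payload()
    # Multipart messages return a list here; this sketch only handles plain text.
    body = payload if isinstance(payload, str) else ""
    return msg.get("Subject", ""), body


if __name__ == "__main__":
    # Made-up message standing in for what the mail system would pipe in.
    sample = io.StringIO(
        "Subject: svn commit: r1 - /example/trunk\n"
        "\n"
        "Log:\n"
        "Example change\n"
    )
    subject, body = read_commit_mail(sample)
    print("received:", subject)
```

A real pipe target would call read_commit_mail(sys.stdin) instead of
using the sample message.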

> It looks like I misunderstood your intentions previously. I thought you would like to
> ask infra just for a zone so we can try to set up everything in the usual Apache infrastructure
> environment but still keep it highly experimental.

I think we've already solved all the major technical issues, so I
think we should start turning the git mirrors from an experiment to
something that people could confidently use as a part of their
standard workflow.

> If we are going to stay closer to the infra team with our effort and at the same time make
> it less experimental then I guess it would be helpful if the infra folks spoke up now.
> I would like to know what kind of requirements we would have to fulfill in order to
> become a part of the infra team to some limited extent.

I was actually made a member of the Infra team recently based on the
git work. :-)

> To make it more clear: I would like to know what we have to do in order to be
> ready to migrate to Apache's hardware.

Based on discussions in the last ApacheCon I think the Infra team is
happy with the direction we've taken with the git mirrors and that it
shouldn't be a problem to get a Solaris zone or something similar for
this purpose. Beyond that we just need to document the setup
reasonably well and make sure that there's sufficient interest so that
the mirrors will remain maintained even when some of us focus our
interest elsewhere.

> > Also, how do we decide whether a potential new
> > development pattern enabled by Git tools is beneficial or not?
>
> I fail to understand your last question. Why do we need to decide? I guess that
> if most committers (or better PMC members) feel something works better for
> them then they should use it. Do you have anything specific in mind?

More a philosophical question. I'm pretty sure that there'll be cases
where people using git will be pushing the boundaries of traditional
Apache-style development.

For example what happens if a pair of committers decide to push and
pull directly from each other when working on a feature branch instead
of going through svn? What if we have proper notifications of all the
exchanged changes going to the appropriate mailing list?

The conventional wisdom says that all code changes should go through
svn or as patches sent to an issue tracker or a mailing list. Should
we stick to that guideline or embrace the new workflow enabled by git?

I guess only time will tell, and my question was mostly meant as an
indicator that these are the sorts of issues we'll likely encounter.
Perhaps we could come up with some general guidelines on how to
approach such issues.

BR,

Jukka Zitting

Re: Next steps with git (Was: Added a simple tutorial on Git cloning)

Posted by Grzegorz Kossakowski <gk...@apache.org>.
Jukka Zitting wrote:
> Hi,
> 
> On Wed, Dec 31, 2008 at 6:08 PM, Grzegorz Kossakowski
> <gk...@apache.org> wrote:
>> My idea of informing committers@ was to make people aware of our effort and to
>> gather some feedback from a bigger set of people.
>>
>> I didn't have any official announcements in mind but I may be wrong about the purpose
>> of the committers@ list.
> 
> I think a better approach for now would be to use community@ and the
> dev@ lists people are subscribed to in order to spread news about the git
> mirrors. It would be nice to have more people trying them out but for
> now we should still warn them that the mirrors may well need to be
> regenerated (and histories broken) before they become a part of the
> official ASF infra.

Sure. On the other hand, since our focus is to push changes back to svn as soon as it makes sense I
don't think that regenerated repositories would be more than a little inconvenience.

I've been migrating my local branches from my own copy of Cocoon repository to the one generated by
you and it wasn't that hard to migrate them. Tools like git format-patch and git am and sed helped
me to do the whole task within twenty minutes including rewriting Author field that was broken in my
repository for my patches.
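
As a rough illustration of the author-rewriting step, here is a sketch
that fixes the From: header of a patch file produced by git format-patch
before it is applied with git am. The function name and the sample names
and addresses are made up; the only assumption about the format is that
the headers end at the first blank line.

```python
def rewrite_patch_author(patch, author):
    """Replace the From: header of a format-patch file with `author`."""
    lines = patch.split("\n")
    for i, line in enumerate(lines):
        if line == "":
            break  # end of the mail headers; leave the message body and diff alone
        if line.startswith("From: "):
            lines[i] = "From: " + author
            break
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up patch header, for illustration only.
    patch = (
        "From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001\n"
        "From: wrong name <wrong@example.org>\n"
        "Subject: [PATCH] Example change\n"
        "\n"
        "Body text\n"
    )
    print(rewrite_patch_author(patch, "Right Name <right@example.org>"))
```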

> For example I've been thinking of changing the svn.eu.apache.org in
> the commit logs to svn.apache.org.

Is there any reason for doing that? I thought that both svn.eu.a.o and svn.a.o are considered as
official even if, technically, the first one is a mirror.

> The current set of mirrors takes about 4GB of disk space, and I'm
> currently serving about 5GB of git data over the net per month (up
> from 2GB four months ago). The CPU load is negligible, the average
> load of the server is just 2% of a single CPU.

Ah, so not that much. Since the whole svn repository is something like 27GB I guess we won't grow
that much. The same goes for average load which might be high only on initial cloning but after that
there won't be any load for obvious reasons (Git is DVCS).

> Cool, thanks. The most common administration tasks would likely be
> setting up new git mirrors or changing the configuration of existing
> ones (for example due to a project graduating from the incubator).

Ok, this is doable for me.

> I have a dedicated email address, git@jukka.zitting.name, that invokes
> email-update.sh whenever a new commit message is received.

Ok, but I was asking more about how you do it in terms of setting up the appropriate Linux tools. I have
no experience with processing e-mails on Linux.

> I've been thinking of modifying the script so that it automatically
> detects which mirror needs to be updated based on the svn paths in the
> message and the git-svn settings in each mirror. This way project
> admins could just subscribe the address to their commit mailing lists
> to enable automatic git updates without me having to manually update
> the script.

I could have a look into it as soon as you answer my previous question.

> Once we do move the mirrors to Apache infra then I think it makes
> sense to also start posting the documentation under /dev/. Same goes
> for the INFRA Jira.
> 
> It's not something that I think we should do today, but I wouldn't be
> surprised if we were ready to take that step sometime during this
> year.

It looks like I misunderstood your intentions previously. I thought you would like to ask infra just
for a zone so we can try to set up everything in the usual Apache infrastructure environment but still
keep it highly experimental.

If we are going to stay closer to the infra team with our effort and at the same time make it less
experimental then I guess it would be helpful if the infra folks spoke up now. I would like to know
what kind of requirements we would have to fulfill in order to become a part of the infra team to some
limited extent.

To make it more clear: I would like to know what we have to do in order to be ready to migrate to
Apache's hardware.

> I'm not sure if we can or should enforce such rules. In the end it's
> up to the committer that commits a change to be reasonably certain
> about the origin and legal status of the code he or she is committing.

I was talking about guidelines only. Obviously, it's very easy to commit a patch contributed by
someone who *only* claims to be the author of that patch, and there is no automated way to check
whether that's true. We should establish a good practice just to protect committers from making
mistakes too easily.

The rest stays the same as it was with plain svn.

>> The good thing about GitHub is that it does not provide any communication means.
>> I think that as long as communication happens on Apache mailing lists and the final
>> result is being committed into svn we shouldn't worry too much about GitHub and
>> similar sites.
> 
> But GitHub does provide such communication means. There are commit
> feeds and comments, pull requests, wikis, etc.

You are right. I had mailing lists in mind, which are my main means of communication but obviously
not the only one.

> Given that we can't prevent people from using those mechanisms, how do
> we make sure that the key principles of Apache-style development are
> still followed? Also, how do we decide whether a potential new
> development pattern enabled by Git tools is beneficial or not?

I think that the only difference between commenting on patches at GitHub and commenting on patches
at JIRA is that we administer JIRA and have some helper stuff set up like notification e-mails.

Not that much of a difference IMHO. Moreover, there are already services like FishEye actively used
across Apache that are not controlled by Apache.

I fail to understand your last question. Why do we need to decide? I guess that if most committers
(or better PMC members) feel something works better for them then they should use it. Do you have
anything specific in mind?

-- 
Best regards,
Grzegorz Kossakowski

Re: Next steps with git (Was: Added a simple tutorial on Git cloning)

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Dec 31, 2008 at 6:08 PM, Grzegorz Kossakowski
<gk...@apache.org> wrote:
> My idea of informing committers@ was to make people aware of our effort and to
> gather some feedback from a bigger set of people.
>
> I didn't have any official announcements in mind but I may be wrong about the purpose
> of the committers@ list.

I think a better approach for now would be to use community@ and the
dev@ lists people are subscribed to in order to spread news about the git
mirrors. It would be nice to have more people trying them out but for
now we should still warn them that the mirrors may well need to be
regenerated (and histories broken) before they become a part of the
official ASF infra.

For example I've been thinking of changing the svn.eu.apache.org in
the commit logs to svn.apache.org.

> Jukka Zitting wrote:
>> a) Set up the mirrors on Apache hardware. We could for example request
>> a Solaris zone like git.zones.apache.org for this. It would be good to
>> have at least two or three administrators to avoid making me a
>> bottleneck.
>
> Yep, good idea but I wonder how many resources we are going to need. Can you show us some
> statistics of your server? At least the consumed bandwidth and the disk space occupied by the Git mirrors.
> What about the load that Git is putting on the server? I guess it's not that much?

The current set of mirrors takes about 4GB of disk space, and I'm
currently serving about 5GB of git data over the net per month (up
from 2GB four months ago). The CPU load is negligible, the average
load of the server is just 2% of a single CPU.

> When it comes to administration you can count on me even if I don't qualify as a very skilled
> Linux/Solaris administrator. Anyway, I can resolve most problems with Git itself.
>
> Also, I can mostly allot my time on weekends.

Cool, thanks. The most common administration tasks would likely be
setting up new git mirrors or changing the configuration of existing
ones (for example due to a project graduating from the incubator).

>> b) Clean up and document the mirror maintenance scripts (currently at
>> [1]) and move them to an appropriate location under
>> repos/asf/infrastructure. It should be possible for a new
>> administrator to get up to speed with just some pointers to
>> documentation.
>
> Agreed. BTW, how is email-update.sh triggered?

I have a dedicated email address, git@jukka.zitting.name, that invokes
email-update.sh whenever a new commit message is received.

I've been thinking of modifying the script so that it automatically
detects which mirror needs to be updated based on the svn paths in the
message and the git-svn settings in each mirror. This way project
admins could just subscribe the address to their commit mailing lists
to enable automatic git updates without me having to manually update
the script.
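
A rough sketch of how that detection might work, assuming the commit
mails list the changed svn paths as indented lines under "Added:",
"Modified:" or "Removed:" headers, and that each mirror can be described
by the svn path prefix its git-svn configuration tracks. The mirror
names, prefixes and function names below are hypothetical and are not
taken from the current scripts.

```python
# Hypothetical mapping: mirror name -> svn path prefix tracked by its git-svn config.
MIRRORS = {
    "cocoon.git": "cocoon/",
    "qpid.git": "qpid/",
}

_SECTION_HEADERS = ("Added:", "Modified:", "Removed:")


def extract_paths(body):
    """Collect the indented path lines that follow a change-list header."""
    paths = []
    in_list = False
    for line in body.splitlines():
        if line.rstrip() in _SECTION_HEADERS:
            in_list = True
        elif in_list and line.startswith(" ") and line.strip():
            paths.append(line.strip())
        else:
            in_list = False  # blank or unindented line ends the list
    return paths


def mirrors_to_update(paths, mirrors):
    """Return the sorted names of all mirrors whose prefix matches a changed path."""
    return sorted(
        {name for name, prefix in mirrors.items()
         for path in paths if path.startswith(prefix)}
    )


if __name__ == "__main__":
    body = "Log:\nExample change\n\nModified:\n    cocoon/trunk/pom.xml\n"
    print(mirrors_to_update(extract_paths(body), MIRRORS))
```

With something along these lines, subscribing the address to a new
commit list would only require the mirror itself to exist, since the
mapping could be read from each mirror's git-svn settings rather than
hard-coded.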

>> c) Improve and extend the documentation we now have in the wiki and
>> move it to an appropriate location under www.apache.org/dev.
>
> Don't you think that it's too early to move our documentation to the official Apache website? I think
> this would give the impression that Git has received an official "blessing" from Apache, which hasn't
> happened, right?

Once we do move the mirrors to Apache infra then I think it makes
sense to also start posting the documentation under /dev/. Same goes
for the INFRA Jira.

It's not something that I think we should do today, but I wouldn't be
surprised if we were ready to take that step sometime during this
year.

> If we are going to allow exchanging Git trees (repositories) instead of plain patches
> then we should establish a policy that non-committers are considered leaf developers.
> This implies that a contributor can send a pull request for a tree that contains his or her own patches or
> patches coming from committers but not from other contributors. Basically, contributors should be
> allowed to merge from committers only.

I'm not sure if we can or should enforce such rules. In the end it's
up to the committer that commits a change to be reasonably certain
about the origin and legal status of the code he or she is committing.

> The good thing about GitHub is that it does not provide any communication means.
> I think that as long as communication happens on Apache mailing lists and the final
> result is being committed into svn we shouldn't worry too much about GitHub and
> similar sites.

But GitHub does provide such communication means. There are commit
feeds and comments, pull requests, wikis, etc.

Given that we can't prevent people from using those mechanisms, how do
we make sure that the key principles of Apache-style development are
still followed? Also, how do we decide whether a potential new
development pattern enabled by Git tools is beneficial or not?

BR,

Jukka Zitting

Re: Next steps with git (Was: Added a simple tutorial on Git cloning)

Posted by Grzegorz Kossakowski <gk...@apache.org>.
Jukka Zitting wrote:
> Hi,
> 
> On Tue, Dec 30, 2008 at 10:25 PM, Grzegorz Kossakowski
> <gk...@apache.org> wrote:
>> Since we have something working and we have gained enough experience to sort out most of the
>> problems that may arise maybe we should let others know about our experiments? I have in mind
>> sending an e-mail to committers@ list informing about Git activity at Apache.
> 
> I guess we're still some way from informing committers@, but you're
> right in that it's now time to move forward with this setup.

My idea of informing committers@ was to make people aware of our effort and to gather some feedback
from a bigger set of people.

I didn't have any official announcements in mind but I may be wrong about the purpose of the committers@ list.

> Here's what I think we should do:
> 
> a) Set up the mirrors on Apache hardware. We could for example request
> a Solaris zone like git.zones.apache.org for this. It would be good to
> have at least two or three administrators to avoid making me a
> bottleneck.

Yep, good idea but I wonder how many resources we are going to need. Can you show us some
statistics of your server? At least the consumed bandwidth and the disk space occupied by the Git mirrors.
What about the load that Git is putting on the server? I guess it's not that much?

When it comes to administration you can count on me even if I don't qualify as a very skilled
Linux/Solaris administrator. Anyway, I can resolve most problems with Git itself.

Also, I can mostly allot my time on weekends.

> b) Clean up and document the mirror maintenance scripts (currently at
> [1]) and move them to an appropriate location under
> repos/asf/infrastructure. It should be possible for a new
> administrator to get up to speed with just some pointers to
> documentation.

Agreed. BTW, how is email-update.sh triggered?

> c) Improve and extend the documentation we now have in the wiki and
> move it to an appropriate location under www.apache.org/dev.

Don't you think that it's too early to move our documentation to the official Apache website? I think
this would give the impression that Git has received an official "blessing" from Apache, which hasn't
happened, right?

> d) Start using the INFRA project in Jira for git tasks like setting up
> a new mirror.

The same as above.

> There's also the open issue of how to best handle contributions made
> via git. Should we always insist on patches or would a pull request be
> OK? It would be good to have some documented best practice for such
> cases.

I think pull requests should be allowed. On the other hand this will force committers willing to
merge contributions to use Git. Not sure how people will react to such a situation.

If we are going to allow exchanging Git trees (repositories) instead of plain patches then we should
establish a policy that non-committers are considered leaf developers.
This implies that a contributor can send a pull request for a tree that contains his or her own patches or
patches coming from committers but not from other contributors. Basically, contributors should be
allowed to merge from committers only.

This way committers willing to handle a pull request have an easy job when it comes to verifying that
all changes are covered by an ICLA.

This also addresses the concern (message id: 393918.154.qm@web54406.mail.yahoo.com and the rest of
the thread) that Joe has raised at members@ mailing list about addressing authorship in a reliable way.

> Another issue to think about is our approach to people publishing
> their clones on places like github. On one hand it's good when people
> do that as making your working copy public is one area where git
> really helps collaboration. On the other hand we'll want to make sure
> that development efforts won't splinter to other forums.

The good thing about GitHub is that it does not provide any communication means. I think that as
long as communication happens on Apache mailing lists and the final result is being committed into
svn we shouldn't worry too much about GitHub and similar sites.

-- 
Best regards,
Grzegorz Kossakowski

Re: Next steps with git (Was: Added a simple tutorial on Git cloning)

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Wed, Jan 7, 2009 at 4:07 AM, Grzegorz Kossakowski
<gk...@apache.org> wrote:
> Do you have in mind the problem that they have three incompatible Git repositories so they
> cannot merge anything or maybe the problem with the big code dump?

Both.  =)  -- justin

Re: Next steps with git (Was: Added a simple tutorial on Git cloning)

Posted by Grzegorz Kossakowski <gk...@apache.org>.
Hello Justin,

Justin Erenkrantz wrote:
> On Wed, Dec 31, 2008 at 12:17 AM, Jukka Zitting <ju...@gmail.com> wrote:
>> c) Improve and extend the documentation we now have in the wiki and
>> move it to an appropriate location under www.apache.org/dev.
> 
> Looking at some of the stuff memcached is going through with git might
> be useful to think about as well:
> 
> http://groups.google.com/group/memcached/browse_thread/thread/1c5833a50df4cea1

Do you have in mind the problem that they have three incompatible Git repositories so they
cannot merge anything or maybe the problem with the big code dump?

-- 
Best regards,
Grzegorz Kossakowski

Re: Next steps with git (Was: Added a simple tutorial on Git cloning)

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Wed, Dec 31, 2008 at 12:17 AM, Jukka Zitting <ju...@gmail.com> wrote:
> c) Improve and extend the documentation we now have in the wiki and
> move it to an appropriate location under www.apache.org/dev.

Looking at some of the stuff memcached is going through with git might
be useful to think about as well:

http://groups.google.com/group/memcached/browse_thread/thread/1c5833a50df4cea1

(I hope the link works.  The subject is: 'facebook memcached on github')

HTH.  -- justin