Posted to infrastructure-dev@apache.org by Paul Querna <pa...@querna.org> on 2009/03/30 00:24:52 UTC

Subversion Server Plan

Right now eris.apache.org is pretty stressed: it is handling roughly
double the traffic it was a year ago, on the same hardware. At the
same time, Harmonia, the machine that hosts svn.eu.apache.org, is
barely being worked at all.

Related to this is upgrading the server to Subversion 1.6 and
reloading the repository, which should net us a ~20% saving on space
and performance increases from packing the repository. However, this
project is currently stalled because we don't have enough spare IO ops
on eris to run the conversion. I will look into finishing the upgrade
in the next week or two.

Below is the rough set of steps I'd like to go through to hopefully
massively improve Subversion performance and reduce latency for
everyone at the ASF:

- Add DNS 'svn-master.apache.org', which points to eris. [done]

- Update all svnsync scripts to use svn-master instead of svn.a.o

- Mirror all repositories, not just the public one, so one base URL
works for everyone.

- Finish upgrading minotaur (aka people.apache.org) to FreeBSD 7.2, and
move all data to the new /x2 array.

- Detach the /x1 array from minotaur and attach it to thor (PowerVault
220S, 14x 146 GB SCSI disks).

- Create an svn-mirror zone on thor, give it access to the disk array,
and create a 14-disk raidz2 pool. (yay spindles)

- Set up svn-mirror to act as a reverse proxy and an svn slave.

- Add DNS svn.us.apache.org pointing at svn-mirror zone

- Set up svn.apache.org with GeoIP-based resolving:
   - svn.us.apache.org and svn.eu.apache.org

- For git-svn / dcommit, have it hit git-master.apache.org directly,
but try to avoid publishing this URL for most users. (thoughts?)

- Investigate adding more Subversion mirrors as time goes on. (Asia or
Australia is likely a good place to be looking, maybe even a virtualized
box at a hosting company to get us started, though we would have to be
careful, as we would have private repositories on it.)
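For a rough sense of what the thor array in this plan would provide: raidz2 spends two disks' worth of space on parity (and so survives any two disk failures), leaving roughly (14 - 2) x 146 GB for data. A minimal Python sketch of that arithmetic, assuming a single 14-disk raidz2 vdev and ignoring ZFS metadata overhead and the GB/GiB difference; the helper name `raidz_usable_gb` is hypothetical.

```python
def raidz_usable_gb(disks: int, disk_gb: int, parity: int) -> int:
    """Approximate data capacity of one raidz vdev, in the units of disk_gb.

    A raidz vdev with `parity` parity devices (1, 2 or 3) keeps about
    (disks - parity) disks' worth of data. ZFS metadata, padding and
    GB-vs-GiB differences are ignored in this back-of-the-envelope figure.
    """
    if parity not in (1, 2, 3):
        raise ValueError("raidz supports 1, 2 or 3 parity devices")
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) * disk_gb


# The planned array: 14 x 146 GB SCSI disks in a single raidz2 vdev.
print(raidz_usable_gb(14, 146, 2))  # 1752
```

So the array would offer roughly 1.75 TB of usable space before filesystem overhead.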


I would like to figure out a way to make dcommit work with the reverse
proxy slaves, but I think we would need to work more closely with the
Subversion developers on improving how the slaves work in general.
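The reverse proxy slaves in this plan serve reads themselves and pass writes through to the master, which is why a read that immediately follows a write can hit a slave that has not caught up yet. A simplified Python sketch of that routing decision, assuming routing purely by HTTP method; the method list only approximates what Subversion clients send over WebDAV/DeltaV and is not the actual mod_dav_svn logic, and using svn.us.apache.org as the local slave is just for illustration.

```python
MASTER = "svn-master.apache.org"
LOCAL_MIRROR = "svn.us.apache.org"   # illustrative choice of slave

# Methods that only read repository data (approximate list).
READ_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PROPFIND", "REPORT"})


def route(method: str) -> str:
    """Return the host that should handle a request with this HTTP method.

    Reads are answered by the local slave; everything else (MKACTIVITY,
    CHECKOUT, PUT, MERGE, POST, ...) modifies the repository and is
    forwarded to the master.
    """
    return LOCAL_MIRROR if method.upper() in READ_METHODS else MASTER


for m in ("PROPFIND", "REPORT", "MKACTIVITY", "PUT", "MERGE"):
    print(f"{m:10} -> {route(m)}")
```

Nothing in this decision knows which revisions the slave holds, so a REPORT for a revision that was just committed through the master is still sent to the slave.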

Re: Subversion Server Plan

Posted by Paul Querna <pa...@querna.org>.
> - Setup svn.apache.org with GeoIP based resolving:
>   - svn.us.apache.org and svn.eu.apache.org

We have set up svn.geo.apache.org, which is currently geo-balanced
using pgeodns[1].

If you can, try it out and let me know if you see any problems.

It currently only has the main ASF repository, but I'm hoping to add
the rest of the private repositories in the next few days.

If it's working well, I'd like to consider CNAME'ing svn.apache.org to
it next week.
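pgeodns answers each DNS query according to where the querying resolver appears to be located, so from the client's side svn.geo.apache.org behaves like a lookup table with a fallback. A minimal Python sketch of that selection, assuming continent-level granularity; the mapping and the default below are hypothetical, not the actual pgeodns configuration.

```python
from typing import Optional

# Hypothetical continent -> mirror table (continent codes as a GeoIP
# database might report them for the resolver's IP address).
MIRRORS_BY_CONTINENT = {
    "NA": "svn.us.apache.org",
    "SA": "svn.us.apache.org",
    "EU": "svn.eu.apache.org",
    "AF": "svn.eu.apache.org",
}
DEFAULT_MIRROR = "svn.us.apache.org"


def resolve(continent: Optional[str]) -> str:
    """Pick the host svn.geo.apache.org should point a client at.

    Unknown or missing locations fall back to DEFAULT_MIRROR, so every
    client still gets a working answer.
    """
    if continent is None:
        return DEFAULT_MIRROR
    return MIRRORS_BY_CONTINENT.get(continent.upper(), DEFAULT_MIRROR)


print(resolve("EU"))  # svn.eu.apache.org
print(resolve(None))  # svn.us.apache.org
```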

[1] - http://geo.bitnames.com/

Re: Subversion Server Plan

Posted by Paul Querna <pa...@querna.org>.
On Mon, Mar 30, 2009 at 1:56 PM, Kevin Menard <ni...@gmail.com> wrote:
> On Mon, Mar 30, 2009 at 7:49 AM, sebb <se...@gmail.com> wrote:
>> On 30/03/2009, Paul Querna <pa...@querna.org> wrote:
>>> On Mon, Mar 30, 2009 at 6:55 AM, Sander Temme <sc...@apache.org> wrote:
>
>> Git should be more patient with proxies - it's not yet an old dog, so
>> can perhaps be taught some new tricks ;-)
>
> Perhaps, but we've seen propagation delays of several hours for very
> large commits.

For the most part, the only times it has been that bad were due to an
svnadmin load/import from a new incubator project.

Re: Subversion Server Plan

Posted by Kevin Menard <ni...@gmail.com>.
On Mon, Mar 30, 2009 at 7:49 AM, sebb <se...@gmail.com> wrote:
> On 30/03/2009, Paul Querna <pa...@querna.org> wrote:
>> On Mon, Mar 30, 2009 at 6:55 AM, Sander Temme <sc...@apache.org> wrote:

> Git should be more patient with proxies - it's not yet an old dog, so
> can perhaps be taught some new tricks ;-)

Perhaps, but we've seen propagation delays of several hours for very
large commits.

-- 
Kevin

Re: Subversion Server Plan

Posted by Paul Querna <pa...@querna.org>.
On Mon, Mar 30, 2009 at 10:39 PM, Santiago Gala <sa...@gmail.com> wrote:
> On Mon, 30-03-2009 at 12:49 +0100, sebb wrote:
>> On 30/03/2009, Paul Querna <pa...@querna.org> wrote:
>> > On Mon, Mar 30, 2009 at 6:55 AM, Sander Temme <sc...@apache.org> wrote:
>> >  >
>> >  > On Mar 29, 2009, at 3:24 PM, Paul Querna wrote:
>> >  >
>> >  >> - Setup svn.apache.org with GeoIP based resolving:
>> >  >>  - svn.us.apache.org and svn.eu.apache.org
>> >  >>
>> >  >> - For git-svn / dcommit, have it hit git-master.apache.org directly,
>> >  >> but try to avoid publishing this URL for most users. (thoughts?)
>> >  >
>> >  >
>> >  > Reverse proxy it from the above?  How would Git be told to behave
>> >  > differently?
>> >
>> >
>> > Joe explained it much better than I could:
>> >  """" The problem appears whenever someone wants to check-in a sequence
>> >  of commits made with git and they're using a mirror for that purpose.
>> >  They run git dcommit and that generates a bunch of svn transactions
>> >  that basically look like svn commit, svn up, svn commit, svn up, etc.
>> >  The svn up part chokes of course, because the mirror hasn't picked up
>> >  the commit yet, causing git to throw an error and tell the user to
>> >  "rebase".
>> >  """"
>> >
>>
>> Git should be more patient with proxies - it's not yet an old dog, so
>> can perhaps be taught some new tricks ;-)
>
> Subversion was sold to us (and probably to the git-svn developers) as
> something with a feature called "atomic commits"... if git-svn commits
> and updates what it committed later on, as sanity check (and maybe
> because of keyword expansion or other nasty tricks that can spoil the
> crypto hash), and the reload does not contain the commit it just did, I
> think panicking is the only reasonable behaviour.
>
> If the mirroring of svn leaks [1], and we all know in network
> programming that all abstractions do leak, I don't think it will be easy
> to work around it in git-svn.
>
> A svn mirror should not return a revision as committed until it can
> provide it to clients, or else should block on updates to "hot"
> revisions (those committed through it) until they become available. I
> mean, a HTTP proxy passing POSTs through, while allowing GETs of the
> stale resources would be considered as broken, I think.
>

It is possible to make replication somewhat blocking to the origin
slave. It would mean some commits would take a very long time, but any
incentive for people to make smaller commits might be good :P

(Block in the post-commit hook until replication finishes, since the
svn client doesn't get the okay until post-commit finishes.)
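Subversion runs the post-commit hook with the repository path and the new revision number, and, as Paul notes, the committing client waits for the hook to finish. A minimal Python sketch of the blocking step, assuming some way to ask the originating slave for its youngest revision; `mirror_head` and `wait_for_mirror` are hypothetical names, and the timeout keeps a stuck slave from hanging commits forever.

```python
import time
from typing import Callable


def wait_for_mirror(rev: int,
                    mirror_head: Callable[[], int],
                    timeout_s: float = 300.0,
                    poll_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Block until the mirror's youngest revision is at least `rev`.

    Returns True once the mirror has caught up, or False if `timeout_s`
    seconds pass first. The commit has already succeeded on the master,
    so giving up only means the client may briefly see a stale mirror.
    """
    deadline = clock() + timeout_s
    while True:
        if mirror_head() >= rev:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Example: a fake mirror that catches up on the third poll.
heads = iter([100, 100, 101])
print(wait_for_mirror(101, lambda: next(heads), poll_s=0.01))  # True
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.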

Re: Subversion Server Plan

Posted by Paul Querna <pa...@querna.org>.
On Tue, Mar 31, 2009 at 3:14 AM, Roy T. Fielding <fi...@gbiv.com> wrote:
> On Mar 30, 2009, at 1:39 PM, Santiago Gala wrote:
>>
>> Subversion was sold to us (and probably to the git-svn developers) as
>> something with a feature called "atomic commits"... if git-svn commits
>> and updates what it committed later on, as sanity check (and maybe
>> because of keyword expansion or other nasty tricks that can spoil the
>> crypto hash), and the reload does not contain the commit it just did, I
>> think panicking is the only reasonable behaviour.
>
> Keyword expansion and "other nasty tricks" are irrelevant to the
> process of applying a sequential set of patches, so doing an update
> between each commit is not only stupid but recklessly poor behavior
> for a client hitting a shared service.  This is a bug in git-svn,
> since it isn't doing what its own docs say (the update is only
> supposed to occur after all the patches are applied).
>
> I am curious what happens when someone does
>
>   git svn dcommit --no-rebase
>
> Does it still puke?

The problem isn't isolated to just git.

Start with the working copy and the server both at revision 100:

echo "a" > foo
echo "b" > bar
svn add foo
svn add bar
svn commit -m "add foo" foo && svn commit -m "add bar" bar

This should work fine if you were hitting the master directly, but if
you are hitting a slave, and the second svn commit starts before the
slave has a copy of the new revision (r101) added by the first commit,
then the svn client will error out, because your working copy is at
r101, but the slave only knows about r100.
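Both this case and the git-svn dcommit failure are read-after-write races: the commit goes through to the master, but the next operation is answered by a slave that has not replicated the new revision yet. A toy Python model of the cycle dcommit performs (commit, update, commit, update, as described in this thread), assuming write-through commits and a slave that either replicates instantly or not at all within the cycle; revisions are plain integers and `dcommit` here is a hypothetical simulation, not git-svn's code.

```python
from typing import List


def dcommit(n_commits: int, start_rev: int, mirror_lags: bool) -> List[str]:
    """Replay one commit/update cycle per local git commit against a slave."""
    master = start_rev   # youngest revision on the master
    mirror = start_rev   # youngest revision the slave has replicated
    log: List[str] = []
    for _ in range(n_commits):
        master += 1                          # svn commit, proxied to the master
        log.append(f"committed r{master}")
        if not mirror_lags:
            mirror = master                  # replication finished in time
        if mirror < master:                  # svn up, answered by the slave
            log.append(f"error: mirror at r{mirror}, need r{master}; rebase")
            break
        log.append(f"updated to r{mirror}")
    return log


print(dcommit(2, 100, mirror_lags=False))
print(dcommit(2, 100, mirror_lags=True))
```

Talking to the master directly corresponds to `mirror_lags=False`; any lag at all makes the first update fail.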

I believe it would be possible to improve the server-side svn proxy to
simply proxy anything it doesn't know about to the master, but you
would need to buffer the request body on disk on the server side until
you figure out whether you know about that revision; then you would
need to unwind from deep inside the svn code and proxy the entire
request to the master.
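The fallback described here has two parts: spool the request body somewhere it can be replayed, then decide from the revision whether the slave can answer. A minimal Python sketch, assuming the relevant revision number has already been extracted from the request (in real svn traffic it is buried in URLs, headers and REPORT bodies, which is the "deep into svn stuff" part); `buffer_body` and `choose_backend` are hypothetical names. `tempfile.SpooledTemporaryFile` keeps small bodies in memory and rolls larger ones over to disk.

```python
import io
import tempfile
from typing import BinaryIO


def buffer_body(stream: BinaryIO, max_in_memory: int = 1 << 20) -> BinaryIO:
    """Copy a request body into a spool file so it can be replayed later.

    Bodies larger than max_in_memory bytes are rolled over to a temporary
    file on disk.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    while True:
        chunk = stream.read(64 * 1024)
        if not chunk:
            break
        spool.write(chunk)
    spool.seek(0)
    return spool


def choose_backend(request_rev: int, slave_youngest: int) -> str:
    """Serve locally only if the slave already has the requested revision."""
    return "slave" if request_rev <= slave_youngest else "master"


# The r100/r101 scenario above: the working copy is at r101,
# but the slave has only replicated up to r100.
body = buffer_body(io.BytesIO(b"request body"))
print(choose_backend(101, 100))  # master
print(choose_backend(100, 100))  # slave
```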

Better, of course, would be for the Subversion client to have 'native'
read-only mirror support: a few properties on the root of the
repository listing the addresses of read-only mirrors, exchanged via
capabilities, and the serf client would just do all of its bulk IO
against those mirrors...

Re: Subversion Server Plan

Posted by Luciano Resende <lu...@gmail.com>.
On Mon, Mar 30, 2009 at 6:14 PM, Roy T. Fielding <fi...@gbiv.com> wrote:
> On Mar 30, 2009, at 1:39 PM, Santiago Gala wrote:
>>
>> Subversion was sold to us (and probably to the git-svn developers) as
>> something with a feature called "atomic commits"... if git-svn commits
>> and updates what it committed later on, as sanity check (and maybe
>> because of keyword expansion or other nasty tricks that can spoil the
>> crypto hash), and the reload does not contain the commit it just did, I
>> think panicking is the only reasonable behaviour.
>
> Keyword expansion and "other nasty tricks" are irrelevant to the
> process of applying a sequential set of patches, so doing an update
> between each commit is not only stupid but recklessly poor behavior
> for a client hitting a shared service.  This is a bug in git-svn,
> since it isn't doing what its own docs say (the update is only
> supposed to occur after all the patches are applied).
>
> I am curious what happens when someone does
>
>   git svn dcommit --no-rebase
>
> Does it still puke?

Yes, --no-rebase does not seem to help at all in this scenario.

>
> ....Roy
>
>



-- 
Luciano Resende
Apache Tuscany, Apache PhotArk
http://people.apache.org/~lresende
http://lresende.blogspot.com/

Re: Subversion Server Plan

Posted by Santiago Gala <sa...@gmail.com>.
On Mon, 30-03-2009 at 18:14 -0700, Roy T. Fielding wrote:
> On Mar 30, 2009, at 1:39 PM, Santiago Gala wrote:
> > Subversion was sold to us (and probably to the git-svn developers) as
> > something with a feature called "atomic commits"... if git-svn commits
> > and updates what it committed later on, as sanity check (and maybe
> > because of keyword expansion or other nasty tricks that can spoil the
> > crypto hash), and the reload does not contain the commit it just did, I
> > think panicking is the only reasonable behaviour.
> 
> Keyword expansion and "other nasty tricks" are irrelevant to the
> process of applying a sequential set of patches, so doing an update

It is not irrelevant for git; see below.

> between each commit is not only stupid but recklessly poor behavior
> for a client hitting a shared service.  This is a bug in git-svn,
> since it isn't doing what its own docs say (the update is only
> supposed to occur after all the patches are applied).
> 

If one bit changes, the git commit changes. Those are the rules for
git-as-a-filesystem: the name of the commit is the SHA-1 of the commit.
And the commit includes the tree, and the SHA-1 of the parent(s)... So
if there is a risk that the server changes *any* bit, the commit chain
would be broken unless git-svn gets the new one "as the server sees it",
and rebases to it before committing the next one.

git guarantees the integrity of commits, at the cost of needing to
know all the bits in every commit before making the next one.
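Santiago's point can be checked directly: git names every object by the SHA-1 of a short header ("<type> <size>" plus a NUL byte) followed by the object's content, and a commit's content includes its parent's name. A minimal Python sketch of that naming scheme; `git_object_id` and `commit_id` are hypothetical helpers, the author/committer line is fixed, and the all-zero parent is a placeholder rather than real history. This illustrates git's hashing, not anything git-svn does itself.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 name git gives an object: sha1(b"<type> <size>" + NUL + payload)."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def commit_id(tree: str, parent: str, message: str) -> str:
    """Name of a single-parent commit object (fixed author/committer for the demo)."""
    ident = "A U Thor <author@example.com> 1238371200 +0000"
    body = (f"tree {tree}\n"
            f"parent {parent}\n"
            f"author {ident}\n"
            f"committer {ident}\n"
            f"\n{message}\n").encode()
    return git_object_id("commit", body)


empty_tree = git_object_id("tree", b"")
print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

# Two versions of the same commit differing in a single character...
base = "0" * 40                      # placeholder parent, not real history
as_sent = commit_id(empty_tree, base, "add foo")
as_stored = commit_id(empty_tree, base, "add foo.")
# ...give different names to every descendant built on top of them.
print(commit_id(empty_tree, as_sent, "add bar") ==
      commit_id(empty_tree, as_stored, "add bar"))  # False
```

Because the child's name covers the parent's name, git-svn cannot build the next commit until it knows exactly what the server stored for the previous one.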

> I am curious what happens when someone does
> 
>     git svn dcommit --no-rebase
> 
> Does it still puke?
> 

From the sources, it will print:

      warn "Attempting to commit more than one change while ",
           "--no-rebase is enabled.\n",
           "If these changes depend on each other, re-running ",
           "without --no-rebase may be required."

I guess it will break, as pquerna said, because the mirror does not
know about the base revision for the commit, regardless of whether the
rebase takes place.

Regards
Santiago

> ....Roy
> 


Re: Subversion Server Plan

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Mar 30, 2009, at 1:39 PM, Santiago Gala wrote:
> Subversion was sold to us (and probably to the git-svn developers) as
> something with a feature called "atomic commits"... if git-svn commits
> and updates what it committed later on, as sanity check (and maybe
> because of keyword expansion or other nasty tricks that can spoil the
> crypto hash), and the reload does not contain the commit it just did, I
> think panicking is the only reasonable behaviour.

Keyword expansion and "other nasty tricks" are irrelevant to the
process of applying a sequential set of patches, so doing an update
between each commit is not only stupid but recklessly poor behavior
for a client hitting a shared service.  This is a bug in git-svn,
since it isn't doing what its own docs say (the update is only
supposed to occur after all the patches are applied).

I am curious what happens when someone does

    git svn dcommit --no-rebase

Does it still puke?

....Roy


Re: Subversion Server Plan

Posted by Santiago Gala <sa...@gmail.com>.
On Mon, 30-03-2009 at 12:49 +0100, sebb wrote:
> On 30/03/2009, Paul Querna <pa...@querna.org> wrote:
> > On Mon, Mar 30, 2009 at 6:55 AM, Sander Temme <sc...@apache.org> wrote:
> >  >
> >  > On Mar 29, 2009, at 3:24 PM, Paul Querna wrote:
> >  >
> >  >> - Setup svn.apache.org with GeoIP based resolving:
> >  >>  - svn.us.apache.org and svn.eu.apache.org
> >  >>
> >  >> - For git-svn / dcommit, have it hit git-master.apache.org directly,
> >  >> but try to avoid publishing this URL for most users. (thoughts?)
> >  >
> >  >
> >  > Reverse proxy it from the above?  How would Git be told to behave
> >  > differently?
> >
> >
> > Joe explained it much better than I could:
> >  """" The problem appears whenever someone wants to check-in a sequence
> >  of commits made with git and they're using a mirror for that purpose.
> >  They run git dcommit and that generates a bunch of svn transactions
> >  that basically look like svn commit, svn up, svn commit, svn up, etc.
> >  The svn up part chokes of course, because the mirror hasn't picked up
> >  the commit yet, causing git to throw an error and tell the user to
> >  "rebase".
> >  """"
> >
> 
> Git should be more patient with proxies - it's not yet an old dog, so
> can perhaps be taught some new tricks ;-)

Subversion was sold to us (and probably to the git-svn developers) as
something with a feature called "atomic commits"... if git-svn commits
and updates what it committed later on, as sanity check (and maybe
because of keyword expansion or other nasty tricks that can spoil the
crypto hash), and the reload does not contain the commit it just did, I
think panicking is the only reasonable behaviour.

If the mirroring of svn leaks [1], and we all know in network
programming that all abstractions do leak, I don't think it will be easy
to work around it in git-svn.

An svn mirror should not return a revision as committed until it can
provide it to clients, or else it should block on updates to "hot"
revisions (those committed through it) until they become available. I
mean, an HTTP proxy passing POSTs through while allowing GETs of the
stale resources would be considered broken, I think.
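The second option, blocking updates to "hot" revisions until they arrive, maps naturally onto a condition variable. A minimal Python sketch, assuming the mirror is told both when a commit it proxied returns a new revision number and when replication catches up; the class `HotRevisionMirror` and its methods are hypothetical.

```python
import threading
from typing import Optional, Set


class HotRevisionMirror:
    """Sketch of a mirror that makes readers wait for revisions committed through it."""

    def __init__(self, youngest: int) -> None:
        self._youngest = youngest          # newest revision replicated locally
        self._hot: Set[int] = set()        # committed via us, not yet replicated
        self._cond = threading.Condition()

    def committed_through_me(self, rev: int) -> None:
        """Record that a commit proxied by this mirror created `rev` on the master."""
        with self._cond:
            if rev > self._youngest:
                self._hot.add(rev)

    def replicated(self, rev: int) -> None:
        """Called by the sync process once revisions up to `rev` are local."""
        with self._cond:
            self._youngest = max(self._youngest, rev)
            self._hot = {r for r in self._hot if r > self._youngest}
            self._cond.notify_all()

    def wait_readable(self, rev: int, timeout: Optional[float] = None) -> bool:
        """Return True once `rev` can be served locally.

        Hot revisions block until replicated (or until `timeout` seconds
        pass); other revisions are answered immediately.
        """
        with self._cond:
            if rev not in self._hot:
                return rev <= self._youngest
            return self._cond.wait_for(lambda: rev <= self._youngest, timeout)


mirror = HotRevisionMirror(youngest=100)
mirror.committed_through_me(101)
threading.Timer(0.1, mirror.replicated, args=(101,)).start()
print(mirror.wait_readable(101, timeout=5))  # True, after about 0.1 s
```

Only revisions committed through this mirror block; a request for some other unknown revision returns False at once, leaving the caller free to proxy it to the master.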

[1] http://www.joelonsoftware.com/articles/LeakyAbstractions.html

Regards
Santiago


Re: Subversion Server Plan

Posted by sebb <se...@gmail.com>.
On 30/03/2009, Paul Querna <pa...@querna.org> wrote:
> On Mon, Mar 30, 2009 at 6:55 AM, Sander Temme <sc...@apache.org> wrote:
>  >
>  > On Mar 29, 2009, at 3:24 PM, Paul Querna wrote:
>  >
>  >> - Setup svn.apache.org with GeoIP based resolving:
>  >>  - svn.us.apache.org and svn.eu.apache.org
>  >>
>  >> - For git-svn / dcommit, have it hit git-master.apache.org directly,
>  >> but try to avoid publishing this URL for most users. (thoughts?)
>  >
>  >
>  > Reverse proxy it from the above?  How would Git be told to behave
>  > differently?
>
>
> Joe explained it much better than I could:
>  """" The problem appears whenever someone wants to check-in a sequence
>  of commits made with git and they're using a mirror for that purpose.
>  They run git dcommit and that generates a bunch of svn transactions
>  that basically look like svn commit, svn up, svn commit, svn up, etc.
>  The svn up part chokes of course, because the mirror hasn't picked up
>  the commit yet, causing git to throw an error and tell the user to
>  "rebase".
>  """"
>

Git should be more patient with proxies - it's not yet an old dog, so
can perhaps be taught some new tricks ;-)

Re: Subversion Server Plan

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mar 29, 2009, at 3:24 PM, Paul Querna wrote:
> - For git-svn / dcommit, have it hit git-master.apache.org directly,
> but try to avoid publishing this URL for most users. (thoughts?)

The mirrors at git.apache.org now publicly point to svn.apache.org but
internally use svn.eu.apache.org when pulling changes from svn. It's a
reasonably straightforward process to change the address of that
master svn server.

On Mon, Mar 30, 2009 at 8:25 AM, Paul Querna <pa...@querna.org> wrote:
> Joe explained it much better than I could:
> """" The problem appears whenever someone wants to check-in a sequence
> of commits made with git and they're using a mirror for that purpose.
> They run git dcommit and that generates a bunch of svn transactions
> that basically look like svn commit, svn up, svn commit, svn up, etc.
> The svn up part chokes of course, because the mirror hasn't picked up
> the commit yet, causing git to throw an error and tell the user to
> "rebase".
> """"

I think we can (and should) address that issue by educating users to
keep their offline commit sequences short. If you're online and you
have svn commit karma, there aren't many (any?) cases where it would
make sense to pile up more than a single local commit.

One notable issue with that is that creating and managing development
branches is much easier in git, which may be one reason why people are
opting to keep such branches in their git repositories and only
dcommit the changes to svn trunk once they're done. It would be useful
to document how to best interact with svn in such cases.

BR,

Jukka Zitting

Re: Subversion Server Plan

Posted by Paul Querna <pa...@querna.org>.
On Mon, Mar 30, 2009 at 6:55 AM, Sander Temme <sc...@apache.org> wrote:
>
> On Mar 29, 2009, at 3:24 PM, Paul Querna wrote:
>
>> - Setup svn.apache.org with GeoIP based resolving:
>>  - svn.us.apache.org and svn.eu.apache.org
>>
>> - For git-svn / dcommit, have it hit git-master.apache.org directly,
>> but try to avoid publishing this URL for most users. (thoughts?)
>
>
> Reverse proxy it from the above?  How would Git be told to behave
> differently?

Joe explained it much better than I could:
"""" The problem appears whenever someone wants to check-in a sequence
of commits made with git and they're using a mirror for that purpose.
They run git dcommit and that generates a bunch of svn transactions
that basically look like svn commit, svn up, svn commit, svn up, etc.
The svn up part chokes of course, because the mirror hasn't picked up
the commit yet, causing git to throw an error and tell the user to
"rebase".
""""

Re: Subversion Server Plan

Posted by Sander Temme <sc...@apache.org>.
On Mar 29, 2009, at 3:24 PM, Paul Querna wrote:

> - Setup svn.apache.org with GeoIP based resolving:
>   - svn.us.apache.org and svn.eu.apache.org
>
> - For git-svn / dcommit, have it hit git-master.apache.org directly,
> but try to avoid publishing this URL for most users. (thoughts?)


Reverse proxy it from the above?  How would Git be told to behave  
differently?

S. (my git-fu is weak)

-- 
Sander Temme
sctemme@apache.org
PGP FP: 51B4 8727 466A 0BC3 69F4  B7B8 B2BE BC40 1529 24AF