Posted to users@subversion.apache.org by Carmen Alzzer <ca...@gisit.dk> on 2020/04/06 22:05:49 UTC

Preferred Subversion 1.13 MPM

hi guys,

so I'm starting to tune a Subversion installation that runs slow on
merges. I'm looking into the Apache MPM: currently it's set up with the
event MPM in Apache 2.4, but I figure it perhaps should be prefork.
What are your recommendations?

The installation is 80 GB of code in 150+ repos, on Apache 2.4.43, and
I'm upgrading Subversion from 1.12.2 to 1.13 tomorrow. It runs on
FreeBSD 11.3 and there's only LAN traffic going through http; on an
average day that is 700,000 requests to the webserver, all inclusive :)

The developers are complaining that merging takes way too long and
they'd rather shift to git enterprise bling bling. Any help is much
appreciated :)


cheers and thanks in advance
Carmen

Re: Preferred Subversion 1.13 MPM

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Apr 07, 2020 at 09:43:40AM +0200, Stefan Sperling wrote:
> On Tue, Apr 07, 2020 at 12:05:49AM +0200, Carmen Alzzer wrote:
> > hi guys,
> > 
> > so I'm starting to tune a Subversion installation that runs slow on merges.
> > I'm looking into the Apache MPM: currently it's set up with the event MPM in
> > Apache 2.4, but I figure it perhaps should be prefork. What are your
> > recommendations?
> 
> What makes you think changing the MPM would solve the problem?

To clarify, I was asking this to understand your reasoning, not trying
to question your competence (I'm sorry if it came across that way).

And if none of the suggestions I made apply to your situation, please
provide more details so we can try to look for more suggestions :)

Re: Preferred Subversion 1.13 MPM

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Apr 14, 2020 at 10:25 AM Carmen Alzzer <ca...@gisit.dk> wrote:
>
> hello friends,
> while this is still hot on here - can you please chip in on how your MPMs
> are handled? Also, if you have thoughts upon reading my setup, I would be
> delighted to hear from you :)
>
> cheers
> Carmen

Hi Carmen,

[ Sidenote: we prefer "bottom-posting" or "inline replying" on this list. ]

I have never looked at tweaking the MPM myself, and I have not yet
seen much discussion about it here. We're using the defaults of our
distribution (we use a CollabNet package of SVN, which includes httpd
2.2.23), which is prefork (with default settings). We're running it on
a VM with 4 Intel Xeon CPUs and 8 GB RAM, running Ubuntu 16.04. I
think the MPM will not be a big performance factor in most cases. IMHO
the things with the most impact on performance are not MPM-related
(and some are more client related). So, following Stefan, I would
suggest first making sure the "known performance knobs" are in order.
I'll try to give you "my list" here below.

In my experience, the most important things for SVN performance are:

Server side:
    - httpd config:
        MaxKeepAliveRequests 10000 (or higher, we use 20000) [1]
        SVNInMemoryCacheSize 131072 (=128 MB -- or higher if you have
          enough RAM), or some suitable value [2]
        Cache authentication results (LDAP caching) (though I believe
          this is less important if KeepAlive works well)
    - If you don't need path-based authorization, disable it
      (SVNPathAuthz off). If you have to use path-based authz, try
      "SVNPathAuthz short_circuit" [3].
    - FSFS back-end must be on fast storage, accessible by the server
      as a local disk (not NFS / SMB mounted -- that's way too slow)
    - Use FSFS format 8 with lz4 compression, and do a full dump &
      load to this format if you can. (see [4] for a step-by-step guide)
    - 'svnadmin pack' your repositories regularly (we use a nightly
      cron job for this)
    - If you have problems with commit performance, check your hook
      scripts (start-commit, pre-commit, post-commit) for any custom
      code that could be slowing things down too much (be very careful
      if you loop over all changes in a commit, and do $stuff for every
      modified / added / deleted file ... this scales with the size of
      your commit, so make sure $stuff is as fast as possible, and
      avoid unnecessary work (early-exit wherever you can)). If you're
      not sure, try to inject some measurement code in your hook
      scripts [5].
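
For the regular 'svnadmin pack' item in the list above, a nightly job over many repositories could look roughly like the sketch below. It is only an illustration: the function name pack_all_repos, the SVNADMIN override variable and the paths in the comments are made up for this example, and it assumes all repositories sit directly under one parent directory (as with SVNParentPath).

```shell
#!/bin/sh
# pack_all_repos PARENT_DIR
# Runs 'svnadmin pack' on every repository directly below PARENT_DIR.
# SVNADMIN (hypothetical knob) overrides the svnadmin binary to use.
pack_all_repos()
{
    parent=$1
    failed=0
    for repo in "$parent"/*
    do
        # A repository root contains a 'format' file; skip anything else
        [ -f "$repo/format" ] || continue
        "${SVNADMIN:-svnadmin}" pack "$repo" || failed=1
    done
    # Non-zero if packing failed for at least one repository
    return $failed
}

# Typical use at the end of a script started from cron, e.g.
#   30 2 * * * /usr/local/bin/svn-pack-all.sh     (placeholder path)
# pack_all_repos /path/to/repos
```

Packing is safe to repeat: on a repository with nothing new to pack the command has little to do, so running it every night over all repositories is usually cheap.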

Client side:
    - Make sure they use recent clients! At least 1.10, preferably
      even the latest (1.13).
    - If a working copy acts really slow, ask them to run "svn
      cleanup" (cleaning up the pristine store, and most importantly
      fixing the timestamps of the files in the svn metadata). This is
      especially important if they have copied over their (large)
      working copy from one disk to another (depending on how this is
      done, this gives all files a new lastmod-time, which makes the
      working copy very slow because svn cannot use its metadata
      anymore as a shortcut to see if a file was modified ... it has
      to read + checksum every file on many operations).
    - Avoid using working copies on networked drives (NFS, CIFS, ...).
      This can be really slow.
    - If you really must use a networked location for your working
      copy, check whether it helps to put "exclusive-locking = true"
      in the [working-copy] section of ~/.subversion/config (*nix) or
      %APPDATA%/Subversion/config (Windows). [6]

Client-Server network:
    - Be very wary of proxies etc. If you're running a corporate
      installation, try to arrange that your clients can connect
      directly to the svn server.
    - On a LAN, of course wired works best (1 Gbit, or even 100 Mbit).
      But wireless (if enough capacity) works fine too. These days,
      most of our users are working from home over VPN, and that works
      very well too.

[1] https://subversion.apache.org/docs/release-notes/1.8.html#neon-deleted
[2] https://subversion.apache.org/docs/release-notes/1.7.html#server-performance-tuning
[3] http://svnbook.red-bean.com/nightly/en/svn.serverconfig.httpd.html#svn.serverconfig.httpd.ref.mod_dav_svn
[4] https://subversion.apache.org/faq.html#dumpload
[5] I'm using this helper code in my pre-commit hook (in Bourne shell):

[[[
# Assumes REPOS, TXN and SVNLOOK are set earlier in the hook
# (for pre-commit, Subversion passes REPOS as $1 and TXN as $2).
DEBUG=1    # set to an empty value to disable logging

# logStart
# Records the start time, author and number of changed paths of the transaction
logStart()
{
    if [ -n "$DEBUG" ]
    then
        START_DATE=`date +'%F %T'`
        START=`/usr/local/perl/bin/perl -MTime::HiRes -e 'print Time::HiRes::gettimeofday.""'`
        AUTHOR=`$SVNLOOK author "$REPOS" -t "$TXN"`
        CHANGED=`$SVNLOOK changed "$REPOS" -t "$TXN"`
        export CHANGED     # Export so it can be reused (other scripts check for this variable)
        NR_CHANGES=`echo "$CHANGED" | wc -l | tr -d ' '`
    fi
}

# logEndAndExit $exitCode
# Logs the end of the script with timing information, and exits with $exitCode
logEndAndExit()
{
    EXIT_CODE=$1
    if [ -n "$DEBUG" ]
    then
        ELAPSED=`/usr/local/perl/bin/perl -MTime::HiRes -e 'printf("%.3f", Time::HiRes::gettimeofday - $ARGV[0])' $START`
        # start-datetime end-datetime author nr-changes txn-key exit-code elapsed-seconds
        # printf is used because 'echo' does not expand \t in every shell
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$START_DATE" "`date +'%F %T'`" \
            "$AUTHOR" "$NR_CHANGES" "$TXN" "$EXIT_CODE" "$ELAPSED" \
            >> "$REPOS/hooks/logs/pre-commit.log"
    fi
    exit $EXIT_CODE
}

logStart

# Do useful stuff. If exiting with an error, invoke logEndAndExit $errorcode

logEndAndExit 0
]]]

(of course, make sure the $REPOS/hooks/logs directory exists and is
writable by the server process)

In the post-commit hook I have similar code, except using $REV instead
of $TXN, and writing to post-commit.log instead of pre-commit.log of
course.
In the start-commit hook I only have these simple lines to track when it
was run (the contents of the script are negligible):
    START_DATE=`date +'%F %T'`
    printf '%s\t%s\t%s\n' "$START_DATE" "`date +'%F %T'`" "$USER" >> "$REPOS/hooks/logs/start-commit.log"

You can use the users and timestamps, and $TXN and $REV numbers to
correlate things to each other, so you can see when a particular user
entered the start-commit phase, and started and left the pre-commit
and the post-commit. That's useful when investigating concrete
complaints. Further: the time between start-commit and pre-commit is
also quite interesting, as that's the point when file contents are
being transferred and a transaction is built up on the server; and
time between pre-commit and post-commit is the time the server needed
to make an actual revision out of the transaction.
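
As a rough illustration of how such a log can be used when investigating a complaint, the sketch below lists the slowest pre-commit runs. It assumes one tab-separated record per line, in the field order given in the comment inside logEndAndExit (elapsed time as the 7th field); the function name slowest_commits is made up for this example.

```shell
#!/bin/sh
# slowest_commits LOGFILE [COUNT]
# Prints the COUNT (default 10) records with the largest elapsed time,
# slowest first, showing start time, author, number of changes and elapsed.
slowest_commits()
{
    log=$1
    count=${2:-10}
    tab=$(printf '\t')
    # Sort numerically, descending, on field 7 (elapsed time)
    sort -t "$tab" -k7,7 -rn "$log" | head -n "$count" |
        awk -F '\t' '{ printf "%s  author=%s  changes=%s  elapsed=%s\n", $1, $3, $4, $7 }'
}

# Example (placeholder path):
# slowest_commits /path/to/repo/hooks/logs/pre-commit.log 20
```

A slow record with a large number of changes points at per-file work in the hook, while a slow record with few changes suggests something outside the loop (for example a network lookup).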


[6] https://subversion.apache.org/docs/release-notes/1.8.html#exclusivelocking


PS: If someone wants to add this to some (wiki) page, faq, ... go
right ahead :-).

-- 
Johan

Re: Preferred Subversion 1.13 MPM

Posted by Carmen Alzzer <ca...@gisit.dk>.
hello friends,
while this is still hot on here - can you please chip in on how your MPMs
are handled? Also, if you have thoughts upon reading my setup, I would be
delighted to hear from you :)

cheers
Carmen

Re: Preferred Subversion 1.13 MPM

Posted by Carmen Alzzer <ca...@gisit.dk>.
hello friends,

Thanks for replying - no hard feelings for getting fast up from the seat 
:)

I was thinking the MPM might have a say, since the process management
of a certain worker might be optimal for the way Subversion operates - I
notice that many requests are sent to the server even for smaller
operations in an SVN client.

What I have picked up from your bag of tricks is LDAP caching. I
referred to the Apache docs for mod_ldap and set the cache TTL to 1 hour,
which is reasonable for my herd of users. It seems that it might have
had an impact, but I have to have a user verify that it shortened the
runtime, and by how much.

But I'm chasing anything that I can on this, and surely I have a lot to
learn about how to properly operate Subversion for production purposes.
So what MPM have you guys configured for Apache? And what sort of other
information would you be interested in, Stefan?
Currently I am using the event MPM.
As background: yes, I authenticate against LDAP, with no encryption
towards AD, and I reach AD directly by IP, so there is no delay from DNS
lookups. Also, I only host the service via mod_dav_svn on http for the
local network, so there's no https encryption in play in the webserver.
I have no per-directory restrictions: if a user has access to / then he
has access to all repos underneath. I just host several vhosts, each
with different restrictions. So in many ways it's a very simple
solution.


cheers
Carmen



Re: Preferred Subversion 1.13 MPM

Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Apr 7, 2020 at 3:44 AM Stefan Sperling <st...@elego.de> wrote:
> Keep in mind most of the work performed during merges occurs client-side.
> If people are running old clients, ask them to upgrade.
>
> There are some server-configuration factors which can increase latency
> between client and servers, or which affect server-side performance in
> general, and those could be relevant:

(snip)

This is an important question, and an important thread.

I'm not aware of a consolidated list of client/server performance tips
in our documentation, so I'd like to address that by saving the above
list and whatever else we learn here. (Perhaps as a FAQ.) I'll wait
for now, in case more suggestions come up. (e.g., delays due to DNS)

Nathan

Re: Preferred Subversion 1.13 MPM

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Apr 07, 2020 at 12:05:49AM +0200, Carmen Alzzer wrote:
> hi guys,
> 
> so I'm starting to tune a Subversion installation that runs slow on merges.
> I'm looking into the Apache MPM: currently it's set up with the event MPM in
> Apache 2.4, but I figure it perhaps should be prefork. What are your
> recommendations?

What makes you think changing the MPM would solve the problem?
 
> The installation is 80 GB of code in 150+ repos, on Apache 2.4.43, and I'm
> upgrading Subversion from 1.12.2 to 1.13 tomorrow. It runs on FreeBSD 11.3
> and there's only LAN traffic going through http; on an average day that is
> 700,000 requests to the webserver, all inclusive :)
> 
> The developers are complaining that merging takes way too long and they'd
> rather shift to git enterprise bling bling. Any help is much appreciated :)

You're not providing many details, but I'll write a list of common problems
below.

Keep in mind most of the work performed during merges occurs client-side.
If people are running old clients, ask them to upgrade.
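
One way to find out which client versions are actually in use is to count them in the httpd access log. The sketch below is only an illustration: it assumes the log format records the User-Agent header (as the "combined" format does), that Subversion clients identify themselves with a string starting "SVN/<version>", and that grep supports -o (GNU and BSD grep do). The function name client_versions is made up for this example.

```shell
#!/bin/sh
# client_versions ACCESS_LOG
# Prints each distinct "SVN/<version>" string found in the log together
# with the number of requests that carried it, most frequent first.
client_versions()
{
    grep -o 'SVN/[0-9][0-9.]*' "$1" | sort | uniq -c | sort -rn
}

# Example (placeholder path):
# client_versions /var/log/httpd-access.log
```

Since the counts are per request, a single busy build machine can dominate the list; the output is best read as "which versions exist", not "how many users run them".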

There are some server-configuration factors which can increase latency
between client and servers, or which affect server-side performance in
general, and those could be relevant:

- Make sure that authentication is fast. You're not providing any details,
  but if you're using a centralized service like LDAP or AD make sure that
  the SVN server will get a response from this service as fast as possible.
  Avoid configurations which result in per-request round-trips between SVN
  server and another server, and if possible even avoid such round-trips
  altogether. Cache authorized credentials! (e.g. use LDAP cache in mod_ldap)

- If the SVN server is talking SSL/TLS to any other service for any purpose,
  ensure that certificates are valid and avoid certificate revocation
  mechanisms that require sending additional requests like OCSP.
  (Especially if OCSP info in certificates somewhere in the chain points at
  a non-existent hostname it will add a lot of overhead; I have seen this
  in the wild with certs generated from AD)

- Configure high HTTP keep-alive limits to avoid too many authentication
  handshakes (i.e. bump MaxKeepAliveRequests, see:
  https://subversion.apache.org/docs/release-notes/1.8.html#neon-deleted)

- Make sure that path-based-authorization rules with mod_authz_svn are used
  only to the minimum extent necessary. Lookups can be very expensive so
  ideally you'd only put path-based authorization rules in place if you
  need to *hide* something from the view of some users and otherwise let
  anyone who manages to authenticate read anywhere in the repository.

- If your performance problems are due to disk reads (i.e. you see a very
  heavy I/O load on the SVN server) tune your server's in-memory caching:
  https://subversion.apache.org/docs/release-notes/1.7.html#server-performance-tuning
  And consider a dump/load of large repositories that aren't FSFS v7/v8 yet:
  https://subversion.apache.org/docs/release-notes/1.9.html#fsfs-improvements
  And make sure to enable lz4 with FSFS v8 if you do a full dump/load cycle:
  https://subversion.apache.org/docs/release-notes/1.10.html#lz4
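
The httpd-side items in this list might translate into a configuration fragment along the following lines. This is a sketch, not a recommended setup: the values are the ones mentioned elsewhere in this thread (10000 keep-alive requests, a 128 MB cache, a 1 hour LDAP cache TTL), and the /svn location and repository path are placeholders.

```apache
# Server-wide settings (outside any <Location> block)
KeepAlive On
MaxKeepAliveRequests 10000
# Value is in kB: 131072 kB = 128 MB
SVNInMemoryCacheSize 131072
# mod_ldap cache lifetimes, in seconds
LDAPCacheTTL 3600
LDAPOpCacheTTL 3600

# Placeholder location and repository path
<Location /svn>
    DAV svn
    SVNParentPath /path/to/repos
    # Only if no path-based authorization is needed;
    # otherwise consider: SVNPathAuthz short_circuit
    SVNPathAuthz off
</Location>
```

Note that httpd does not allow a comment on the same line as a directive, which is why the comments sit on their own lines.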