Posted to users@subversion.apache.org by Steve Seremeth <su...@seremeth.com> on 2005/03/15 21:34:45 UTC

Best Practices: Performance on Large Repositories?

Hello -

We have a repository of about 11000 files totalling about 1200 MB.  A 
commit or a status done at the trunk dir level takes as much as 30 
minutes whether using the svn command line (Windows XP or AIX) or 
TortoiseSVN -- and seems to use very little CPU.  Also, the server (RH ES 3 
on a 2.4 GHz Xeon using Apache/Berkeley DB) just idles most of the time 
and seems to be doing nothing during any of these actions.

What is the best way to improve performance?  Since the server is idling 
I'm assuming it's not a software problem on the repository server.  Are 
there benchmarks somewhere that we can look at to see how far off our 
setup is or is there anything we can do to semi-officially test 
performance against known good data?

We end up doing a lot of "time svn commit trunk".

Any advice would be greatly appreciated.  If we're doing something 
wrong, that would be welcome news.  If we are not doing something wrong, 
perhaps relative performance information should be posted in the FAQ?

TIA -

Steve

P.S.  I know we would be doing better if we were working on smaller 
pieces of the repository at any one time.  No one interested in "cvs 
modules"-style functionality?

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Best Practices: Performance on Large Repositories?

Posted by Eric Gillespie <ep...@pretzelnet.org>.
Steve Seremeth <su...@seremeth.com> writes:

> We have a repository of about 11000 files totalling about 1200 MB.

The repository is perfectly happy with data sets far larger than
that.  The issue is with large working copies.  Crawling a
working copy is very expensive.

> A commit or a status done at the trunk dir level takes as much
> as 30 minutes whether using the svn command line (Windows XP or
> AIX)

We have working copies of 70,000 files, and it "only" takes 10 -
15 minutes to crawl them.  Maybe the difference is accounted for
by different operating systems (Linux, NetBSD, and FreeBSD here).

> What is the best way to improve performance?

Don't let svn crawl your working copy; tell it what to commit
('svn commit foo/bar/baz.c' vs. 'svn commit').
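One rough way to see how much the crawl costs is to time the same svn
subcommand against the whole tree and against a narrower target.  The
sketch below is only an illustration: the helper name time_command and
the paths in the usage note are hypothetical, and it assumes svn is on
the PATH.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code).

    Output is discarded so that terminal rendering does not skew the timing.
    """
    start = time.perf_counter()
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode
```

For example, time_command(["svn", "status", "trunk"]) versus
time_command(["svn", "status", "trunk/foo/bar"]), with paths taken from
your own working copy, should show how much of the wait comes from
crawling directories you did not touch.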

--  
Eric Gillespie <*> epg@pretzelnet.org


Re: Best Practices: Performance on Large Repositories?

Posted by Steve Seremeth <su...@seremeth.com>.
Followup: this issue has been somewhat resolved.

It seems that when not much else is running on a Windows system that is 
using TortoiseSVN or the svn command line interface, we see major 
performance improvements.  Sounds obvious, but please allow me to 
elaborate.

We had been seeing 20-30 minute times to "check for modifications" (i.e. 
status) when run against the top of our working copy tree of about 
11,000 version-controlled files (plus another 44,000 svn control files) 
on certain systems.  We thought everyone was feeling this pain here, but 
that was not the case -- only a few of us.  Also, the amount of data in 
the working copy is closer to 2 GB on disk instead of the 1200 MB we 
thought we had.

Ok - so back to the details (or what we _do_ know).  We would regularly 
kick off a "Check for modifications" at the top of the tree and see 
TortoiseProc just linger around in Task Manager's process list not using 
much CPU _at all_.  We would see similar things with svn.exe when run at 
the command line.  When sorting on CPU use, often the only thing in the 
process list taking more CPU (ok, almost all of it) was "System Idle 
Process".  Very frustrating.

On one system, we killed a VMware instance that was lingering around and 
got status checks generally down to about 12 seconds a shot -- VERY 
liveable performance.  We still have a couple of systems experiencing very 
slow TortoiseSVN/Subversion performance, but clearly it's some other 
process on those boxes causing the issue.  We have seen McAfee AV 
causing pain from time to time, but we have not determined where the real 
devil is.  Some of us have Windows Indexing on and some of us have it off, 
with mixed results.

So while we're not positive what is causing our problems, we are 
starting to conclude:

1.  It's not Subversion related (although it's clear TortoiseSVN doesn't 
help with screen draws in Windows Explorer).

2.  Subversion is fine performance-wise even with our fairly large 
repository.

Is there a way to run some sort of debug build under Windows to see where 
the slowness comes from (I/O wait, etc.)?  Maybe it could generate a log 
with timestamps for each part of a transaction?

Thanks for everyone's input.

Regards,

Steve


Re: Best Practices: Performance on Large Repositories?

Posted by Erik Huelsmann <e....@gmx.net>.
> Hello -
> 
> We have a repository of about 11000 files totalling about 1200 MB.  A 
> commit or a status done at the trunk dir level takes as much as 30 
> minutes whether using the svn command line (Windows XP or AIX) or 
> Tortoise -- and seems to use very little CPU.  Also, the server (RH ES 3 
> on 2.4 Ghz Xeon using apache/berkeley db) just idles most of the time 
> and seems to be doing nothing during any of these actions.
> 
> What is the best way to improve performance?  Since the server is idling 
> I'm assuming it's not a software problem on the repository server.  Are 
> there benchmarks somewhere that we can look at to see how far off our 
> setup is or is there anything we can do to semi-officially test 
> performance against known good data?
> 
> We end up doing a lot of "time svn commit trunk".
> 
> Any advice would be greatly appreciated.  If we're doing something 
> wrong, that would be welcome news.  If we are not doing something wrong, 
> perhaps relative performance information should be posted in the FAQ?

The time required to do a commit also depends heavily on the performance
of the filesystem on which your working copy is located.
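To get a feel for the raw filesystem cost on a given machine,
independent of svn, one could time a plain walk that stats every file
under the working copy.  This is a minimal sketch, not a reproduction of
what svn does internally: the function name crawl_stats is made up for
illustration, and the walk deliberately includes the .svn administrative
directories, since those files are on disk too.

```python
import os
import time


def crawl_stats(root):
    """Walk root, stat every file, and return (file_count, total_bytes, seconds)."""
    start = time.perf_counter()
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += size
            count += 1
    return count, total, time.perf_counter() - start
```

Running crawl_stats on the top of the working copy on a fast machine
and on a slow one, or with and without a virus scanner active, may
show whether the delay is in the filesystem layer rather than in
Subversion itself.  A second run right after the first mostly measures
the operating system's cache, so the first run is the interesting one.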


bye,


Erik.

