Posted to users@subversion.apache.org by Nico Kadel-Garcia <nk...@gmail.com> on 2010/07/13 12:00:34 UTC

Poor performance for large software repositories downloading to CIFS shares

I've got some colleagues with a rather large Subversion repository
whose trunk includes over 10,000 files and over 500 Meg of actual
content for various reasons. What we're finding is that checking it
out on a Windows client to a local hard drive takes perhaps 3
minutes. Downloading it to a mounted Windows (CIFS) share takes
roughly half an hour.

Linux clients operate just fine, checking out in less than 3 minutes
to a local or NFS-mounted directory. Using TortoiseSVN or Cygwin's
"svn" command makes little difference: the slow part is downloading to
a Windows share, even though the shares perform well for many other
operations, such as ordinary file transfers. Creating a working copy on
a Linux server and then copying that working copy over to the share
works quite well and takes only the 3 minutes of the Linux download.
This leaves me with questions:

* Is this poor CIFS performance normal for large repositories being
checked out? I assume it is because CIFS is so chatty, and that people
don't normally notice because they rarely check out such large
repositories, and because the entertaining hourglasses and progress
text of TortoiseSVN reassure them that the operation is proceeding.

* How bad are the risks of screwing up my checkouts if I use a
post-commit to keep a central working copy updated, and have people
simply copy that over instead of checking out the trunk directly? My
concern is that the checkout process isn't really designed for that,
and may fail to do a checkout in a clean and atomic state, and the
checked out copy may therefore be corrupted by being in the midst of
an update operation.

Re: Poor performance for large software repositories downloading to CIFS shares

Posted by Les Mikesell <le...@gmail.com>.
On 7/14/2010 9:44 AM, David Weintraub wrote:
> 2010/7/14 Ulrich Eckhardt<ec...@satorlaser.com>:
>> On Wednesday 14 July 2010, David Weintraub wrote:
>>> * Use Samba on a Local Linux system to mount this CIFS Windows share
>>> on a Linux machine. Then, do your checkout on the Linux system. This
>>> eliminates the Subversion Windows client. This will eliminate any
>>> problems with the Windows version of the Subversion client.
>>
>> Beware, this will produce different results when you have any files where
>> the "svn:eol-style" property is set to "native".
>
> Absolutely correct -- if you change files and recommit them. I was
> mainly advising this as a test to see why the CIFS checkout is taking
> so long.
>
> This will use the CIFS share, but eliminate the role the local Windows
> Subversion client might be playing. If the checkout is substantially
> faster, the issue is with the local Windows machine and not
> necessarily the Windows server itself.
>
> My hunch is that this won't be that much faster since the Windows
> server is still running an anti-virus scan on the share.

I think all the responses so far including mine have missed the point 
that the original poster has repeatedly said that copying an existing 
working copy to the CIFS share is much faster than doing the checkout 
there directly.  I think that should rule out all of the differences 
except the way that copy and svn handle the file creation and writing - 
or perhaps some other operations that svn is doing that take longer on 
CIFS than local filesystems.  Unless the anti-virus scan is somehow 
bypassed in the copy operation.
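
One way to test this hypothesis without Subversion in the loop is a
small benchmark that writes the same files two ways: directly, as a
plain copy does, and via a temporary file that is renamed into place
with a second copy kept alongside. The sketch below only approximates
what a checkout does; the function names, the "pristine" directory and
the file counts are invented for illustration and do not mirror svn's
actual on-disk layout.

```python
import os
import tempfile
import time


def write_direct(target_dir, count, payload):
    """Write `count` files straight to their final names, as a plain copy would."""
    os.makedirs(target_dir, exist_ok=True)
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(target_dir, f"file{i:05d}.txt"), "wb") as fh:
            fh.write(payload)
    return time.perf_counter() - start


def write_checkout_like(target_dir, count, payload):
    """Write each file twice: a second copy, plus a temp file renamed into place."""
    pristine_dir = os.path.join(target_dir, "pristine")  # hypothetical name
    os.makedirs(pristine_dir, exist_ok=True)
    start = time.perf_counter()
    for i in range(count):
        name = f"file{i:05d}.txt"
        with open(os.path.join(pristine_dir, name), "wb") as fh:
            fh.write(payload)
        tmp_path = os.path.join(target_dir, name + ".tmp")
        with open(tmp_path, "wb") as fh:
            fh.write(payload)
        os.replace(tmp_path, os.path.join(target_dir, name))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Point `base` at a directory on the CIFS share to compare it with local disk.
    base = tempfile.mkdtemp()
    payload = b"x" * 10_000
    print("direct:       ", write_direct(os.path.join(base, "a"), 200, payload))
    print("checkout-like:", write_checkout_like(os.path.join(base, "b"), 200, payload))
```

If the second number is far worse than the first on the share but not
on local disk, that would point at the per-file operations rather than
at raw transfer speed.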

-- 
   Les Mikesell
    lesmikesell@gmail.com

Re: Poor performance for large software repositories downloading to CIFS shares

Posted by David Weintraub <qa...@gmail.com>.
2010/7/14 Ulrich Eckhardt <ec...@satorlaser.com>:
> On Wednesday 14 July 2010, David Weintraub wrote:
>> * Use Samba on a Local Linux system to mount this CIFS Windows share
>> on a Linux machine. Then, do your checkout on the Linux system. This
>> eliminates the Subversion Windows client. This will eliminate any
>> problems with the Windows version of the Subversion client.
>
> Beware, this will produce different results when you have any files where
> the "svn:eol-style" property is set to "native".

Absolutely correct -- if you change files and recommit them. I was
mainly advising this as a test to see why the CIFS checkout is taking
so long.

This will use the CIFS share, but eliminate the role the local Windows
Subversion client might be playing. If the checkout is substantially
faster, the issue is with the local Windows machine and not
necessarily the Windows server itself.

My hunch is that this won't be that much faster since the Windows
server is still running an anti-virus scan on the share.

-- 
David Weintraub
qazwart@gmail.com

Re: Poor performance for large software repositories downloading to CIFS shares

Posted by Ulrich Eckhardt <ec...@satorlaser.com>.
On Wednesday 14 July 2010, David Weintraub wrote:
> * Use Samba on a Local Linux system to mount this CIFS Windows share
> on a Linux machine. Then, do your checkout on the Linux system. This
> eliminates the Subversion Windows client. This will eliminate any
> problems with the Windows version of the Subversion client.

Beware, this will produce different results when you have any files where 
the "svn:eol-style" property is set to "native".
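
To illustrate: with "svn:eol-style" set to "native", the client writes
text files using the line endings of the platform it runs on, so a
Linux client writes LF where a Windows client would write CRLF. The
sketch below mimics that conversion; "to_native" is a hypothetical
helper, not part of any Subversion API.

```python
def to_native(text_lf, platform):
    """Convert LF-normalized text to the line endings of the given platform.

    `platform` is "windows" or "unix"; both labels are illustrative only.
    """
    eol = "\r\n" if platform == "windows" else "\n"
    return text_lf.replace("\n", eol)


stored = "line one\nline two\n"
on_linux = to_native(stored, "unix")
on_windows = to_native(stored, "windows")
# The same file differs byte for byte depending on which client wrote it.
print(on_linux == on_windows)
```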

Uli

-- 
ML: http://subversion.tigris.org/mailing-list-guidelines.html
FAQ: http://subversion.tigris.org/faq.html
Docs: http://svnbook.red-bean.com/

Sator Laser GmbH, Fangdieckstraße 75a, 22547 Hamburg, Deutschland
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


Re: Poor performance for large software repositories downloading to CIFS shares

Posted by David Weintraub <qa...@gmail.com>.
I've seen poor performance on Windows for Subversion checkouts due to
anti-virus checking. When Subversion does a checkout, it also creates
a second copy of each and every file that is checked out. Plus, it
updates other files in the .svn directory. Some anti-virus programs
scan everything created and downloaded by the Subversion client,
slowing checkouts to a crawl. What takes a few minutes on Linux or a
Mac takes much longer on Windows machines thanks to an overly
aggressive anti-virus program.

Since the CIFS is a Windows share, both the Windows Server that owns
the share and the Windows Client that mounted the share may be running
anti-virus programs that aggressively "protect" the share from
malware. You might not normally notice this with one or two files, but
when you do a Subversion checkout with 10,000 files in it, you'll
start to notice the delays.
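
To see how many extra files a checkout creates, one can walk an
existing working copy and count the files under the .svn
administrative directories separately from the rest. A rough sketch;
"count_working_copy_files" is a name made up for this example.

```python
import os


def count_working_copy_files(root):
    """Return (versioned_files, admin_files) for the tree under `root`.

    Files anywhere below a directory named ".svn" are counted as admin files.
    """
    versioned = 0
    admin = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if ".svn" in parts:
            admin += len(filenames)
        else:
            versioned += len(filenames)
    return versioned, admin
```

Every file in the second count is one more create, write and close
that the share and any anti-virus scanner have to handle.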

There are several experiments you could try:

* Instead of a checkout, do an export. Exports don't create the .svn
directories and all of their overhead. See if this substantially speeds
up the operation. Yes, I know this isn't what you want to do, but
it'll help determine the heart of the problem. This will transfer just
as much data over the network, but it won't create all of those .svn
directories and the files they contain. A much faster export would
suggest that the problem lies in the Subversion client creating all of
those extra files on the share.

* Use Samba on a local Linux system to mount this CIFS Windows share
on a Linux machine. Then, do your checkout on the Linux system. This
takes the Windows Subversion client out of the picture, along with any
problems specific to it.

* Perforce likes to claim it is a very efficient checkout system.
Download Perforce, and add the HEAD revision of the Trunk of your
Subversion repository into a Perforce repository. Now try the checkout
on the CIFS using Perforce. This entirely eliminates Subversion from
the equation. If there are still issues, they lie solely with CIFS on
that Windows server.
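
The first experiment can be timed with a small wrapper around the svn
command line. This is a minimal sketch: it assumes "svn" is on the
PATH, the repository URL and target directory are supplied by the
caller, and the "checkout" and "export" subdirectory names are
arbitrary.

```python
import subprocess
import sys
import time


def build_commands(repo_url, target_root):
    """Return the svn command lines for the checkout and export experiments."""
    return {
        "checkout": ["svn", "checkout", "--quiet", repo_url, target_root + "/checkout"],
        "export": ["svn", "export", "--quiet", repo_url, target_root + "/export"],
    }


def time_command(argv):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Usage: python time_svn.py <repo-url> <directory-on-the-share>
    if len(sys.argv) == 3:
        for label, argv in build_commands(sys.argv[1], sys.argv[2]).items():
            code, seconds = time_command(argv)
            print(f"{label}: exit {code}, {seconds:.1f}s")
    else:
        print("usage: time_svn.py <repo-url> <directory-on-the-share>")
```

Running it once against a local directory and once against the share
gives four numbers to compare.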

I know things have changed recently, but CIFS has a reputation for
being slow and chatty. See http://bit.ly/aNUF2k.

-- 
David Weintraub
qazwart@gmail.com

Re: Poor performance for large software repositories downloading to CIFS shares

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Tue, Jul 13, 2010 at 4:57 PM, Les Mikesell <le...@gmail.com> wrote:
> On 7/13/2010 2:50 PM, David Brodbeck wrote:
>>
>> On Jul 13, 2010, at 5:50 AM, Les Mikesell wrote:

>> This could be a lot of it if a substantial number of files are in one flat
>> subdirectory.  CIFS really, really does not deal with large directories
>> well.  Neither does NFS, but the way Windows handles directories tends to
>> make it worse.
>
> I think CIFS is just the network protocol.  The real issue is on the
> physical filesystem side.  When you open a file for writing, the underlying
> system has to determine if that file name already exists and if not, find or
> create a new filename slot to create it. And this has to be done atomically,
> since other processes might be trying to create the same file at the same
> time and only one can succeed.  This is bad enough in a large directory when
> you let the OS deal with exact matches, but if you are faking case
> insensitivity you have to do much more work in user space to find the
> potential collisions with everything locked for longer times.

No, this is clearly not the only problem. Copying the working copy
from one CIFS share to another, or even back to the same CIFS share,
takes only a few minutes.

What, precisely, do you mean by "faking case insensitivity"? CIFS
servers normally have standard approaches that prevent CIFS files from
overwriting files with mismatched case names, it's true, but this
can't actually be turned off. It is possible for the CIFS server to
map all names to lower case, which is not occurring in this case.

Re: Poor performance for large software repositories downloading to CIFS shares

Posted by Yves Martin <ym...@free.fr>.
On Tue, 2010-07-13 at 20:40 -0400, Nico Kadel-Garcia wrote:

> Well, yes, except that updating an "export" can't be done since it
> will lack the rest of the .svn information. The point is that they can
> download an up-to-date working copy directly, rather than over the
> poor performance of the CIFS share.

So why are your users unable to access the Subversion repository
directly, over either the http(s) or svn protocol?

> > I have seen a 1 GB working copy checked out properly onto a local disk.
> > Once the working copy is there, just use "update" and "switch" to limit
> > transfers and disk writes... Why do a new checkout each time?
> 
> And that actually works. There are problems with this approach: this
> local disk is inaccessible from other working systems without serious
> crossmounting craziness, is not workable for high availability
> services, and causes any local modifications that haven't been checked
> in to be lost when switching to another system.

I guess you are trying to prevent the loss of a day's work with such a
complex system? I think it is cheaper and more comfortable to set up
RAID-1 disks on each workstation...

If you want your users to commit to the repository regularly (twice a day
for instance, even when the code does not compile), one option is to
have them commit their work to individual branches, which are merged when
the job is done.



Re: Poor performance for large software repositories downloading to CIFS shares

Posted by Yves Martin <ym...@free.fr>.
On Tue, 2010-07-13 at 15:57 -0500, Les Mikesell wrote:
> On 7/13/2010 2:50 PM, David Brodbeck wrote:
> >
> > On Jul 13, 2010, at 5:50 AM, Les Mikesell wrote:
> >
> >> Nico Kadel-Garcia wrote:
> >>> I've got some colleagues with a rather large Subversion repository
> >>> whose trunk includes over 10,000 files and over 500 Meg of actual
> >>> content for various reasons. What we're finding is that checking it
> >>> out on a Windows client to a local hard drive takes perhaps 3
> >>> minutes. Downloading it to a mounted Windows (CIFS) share takes
> >>> roughly half an hour.
> >>
> >> What's the server on the CIFS side?  If it is Linux/samba, it may be the overhead of making a case sensitive filesystem look case insensitive (consider what has to happen when you create a new file in a large directory and have to check if the name already exists).
> >
> > This could be a lot of it if a substantial number of files are in one flat subdirectory.  CIFS really, really does not deal with large directories well.  Neither does NFS, but the way Windows handles directories tends to make it worse.
> 
> I think CIFS is just the network protocol.  The real issue is on the 
> physical filesystem side.  When you open a file for writing, the 
> underlying system has to determine if that file name already exists and 
> if not, find or create a new filename slot to create it. And this has to 
> be done atomically, since other processes might be trying to create the 
> same file at the same time and only one can succeed.  This is bad enough 
> in a large directory when you let the OS deal with exact matches, but if 
> you are faking case insensitivity you have to do much more work in user 
> space to find the potential collisions with everything locked for longer 
> times.

You're right, CIFS is just the protocol. And Samba implements it
efficiently... Windows Explorer (XP version) often transfers more slowly
than the Linux "smbclient" command line (measured on a single large file).
And Windows often runs anti-virus!

Nico, try a "smbmount" mount point on Linux to compare with NFS.

I do not understand the option of publishing a "ready-to-use" checkout:
- authentication information such as the username may be included in the
repository URL, so it must be modified before anyone else can commit
- such a working copy takes up twice the volume of its contents
To publish a clean state, you should prefer an "export" in that case.

I have seen a 1 GB working copy checked out properly onto a local disk.
Once the working copy is there, just use "update" and "switch" to limit
transfers and disk writes... Why do a new checkout each time?
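
As a sketch of that advice, a script can pick "svn update" when a
working copy already exists and fall back to "svn checkout" only when
it does not. The helper name "sync_command" is invented here, and
detecting a working copy by its top-level .svn directory is a
simplifying assumption.

```python
import os


def sync_command(repo_url, working_copy):
    """Return the svn command that brings `working_copy` up to date.

    An existing working copy (detected by its .svn directory) is updated;
    otherwise a fresh checkout is requested.
    """
    if os.path.isdir(os.path.join(working_copy, ".svn")):
        return ["svn", "update", working_copy]
    return ["svn", "checkout", repo_url, working_copy]
```

The returned list can be passed to subprocess.run() as is.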

Older TortoiseSVN versions have trouble (bugs) with large working copy
checkouts. You should use the latest version available, or the native
win32 "svn.exe" command line binaries.


Re: Poor performance for large software repositories downloading to CIFS shares

Posted by Les Mikesell <le...@gmail.com>.
On 7/13/2010 2:50 PM, David Brodbeck wrote:
>
> On Jul 13, 2010, at 5:50 AM, Les Mikesell wrote:
>
>> Nico Kadel-Garcia wrote:
>>> I've got some colleagues with a rather large Subversion repository
>>> whose trunk includes over 10,000 files and over 500 Meg of actual
>>> content for various reasons. What we're finding is that checking it
>>> out on a Windows client to a local hard drive takes perhaps 3
>>> minutes. Downloading it to a mounted Windows (CIFS) share takes
>>> roughly half an hour.
>>
>> What's the server on the CIFS side?  If it is Linux/samba, it may be the overhead of making a case sensitive filesystem look case insensitive (consider what has to happen when you create a new file in a large directory and have to check if the name already exists).
>
> This could be a lot of it if a substantial number of files are in one flat subdirectory.  CIFS really, really does not deal with large directories well.  Neither does NFS, but the way Windows handles directories tends to make it worse.

I think CIFS is just the network protocol.  The real issue is on the 
physical filesystem side.  When you open a file for writing, the 
underlying system has to determine if that file name already exists and 
if not, find or create a new filename slot to create it. And this has to 
be done atomically, since other processes might be trying to create the 
same file at the same time and only one can succeed.  This is bad enough 
in a large directory when you let the OS deal with exact matches, but if 
you are faking case insensitivity you have to do much more work in user 
space to find the potential collisions with everything locked for longer 
times.
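
The two costs described above can be sketched in a few lines: an
atomic "create only if absent" that the OS handles for exact names,
and a user-space scan for names that differ only in case. This is an
illustration of the idea, not how Samba actually implements it; both
function names are made up.

```python
import os


def create_exclusive(path):
    """Atomically create `path`; raise FileExistsError if it already exists."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)


def find_case_collision(directory, name):
    """Return an existing entry that matches `name` ignoring case, else None.

    This is a linear scan of the directory, so the cost grows with its size.
    """
    wanted = name.lower()
    for entry in os.listdir(directory):
        if entry.lower() == wanted:
            return entry
    return None
```

With 10,000 files in one directory, a naive scan like this would be
repeated for every new file, which is where the extra time would go.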

-- 
   Les Mikesell
    lesmikesell@gmail.com


Re: Poor performance for large software repositories downloading to CIFS shares

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Tue, Jul 13, 2010 at 3:50 PM, David Brodbeck <br...@u.washington.edu> wrote:
>
> On Jul 13, 2010, at 5:50 AM, Les Mikesell wrote:
>
>> Nico Kadel-Garcia wrote:
>>> I've got some colleagues with a rather large Subversion repository
>>> whose trunk includes over 10,000 files and over 500 Meg of actual
>>> content for various reasons. What we're finding is that checking it
>>> out on a Windows client to a local hard drive takes perhaps 3
>>> minutes. Downloading it to a mounted Windows (CIFS) share takes
>>> roughly half an hour.
>>
>> What's the server on the CIFS side?  If it is Linux/samba, it may be the overhead of making a case sensitive filesystem look case insensitive (consider what has to happen when you create a new file in a large directory and have to check if the name already exists).
>
> This could be a lot of it if a substantial number of files are in one flat subdirectory.  CIFS really, really does not deal with large directories well.  Neither does NFS, but the way Windows handles directories tends to make it worse.

I've tried several CIFS servers, including NetApps, and Samba servers
ranging from 3.3.x to 3.5.2. The CIFS servers are about as good as
they're going to get, based on the performance of a direct copy of a
working copy from one share to another, which only takes a few minutes
for the same working copy.

Re: Poor performance for large software repositories downloading to CIFS shares

Posted by David Brodbeck <br...@u.washington.edu>.
On Jul 13, 2010, at 5:50 AM, Les Mikesell wrote:

> Nico Kadel-Garcia wrote:
>> I've got some colleagues with a rather large Subversion repository
>> whose trunk includes over 10,000 files and over 500 Meg of actual
>> content for various reasons. What we're finding is that checking it
>> out on a Windows client to a local hard drive takes perhaps 3
>> minutes. Downloading it to a mounted Windows (CIFS) share takes
>> roughly half an hour.
> 
> What's the server on the CIFS side?  If it is Linux/samba, it may be the overhead of making a case sensitive filesystem look case insensitive (consider what has to happen when you create a new file in a large directory and have to check if the name already exists).

This could be a lot of it if a substantial number of files are in one flat subdirectory.  CIFS really, really does not deal with large directories well.  Neither does NFS, but the way Windows handles directories tends to make it worse.


-- 

David Brodbeck
System Administrator, Linguistics
University of Washington




Re: Poor performance for large software repositories downloading to CIFS shares

Posted by Les Mikesell <le...@gmail.com>.
Nico Kadel-Garcia wrote:
> I've got some colleagues with a rather large Subversion repository
> whose trunk includes over 10,000 files and over 500 Meg of actual
> content for various reasons. What we're finding is that checking it
> out on a Windows client to a local hard drive takes perhaps 3
> minutes. Downloading it to a mounted Windows (CIFS) share takes
> roughly half an hour.

What's the server on the CIFS side?  If it is Linux/samba, it may be the 
overhead of making a case sensitive filesystem look case insensitive (consider 
what has to happen when you create a new file in a large directory and have to 
check if the name already exists).

> * Is this poor CIFS performance normal for large repositories being
> checked out?

It doesn't sound normal to me.

> * How bad are the risks of screwing up my checkouts if I use a
> post-commit to keep a central working copy updated, and have people
> simply copy that over instead of checking out the trunk directly?

Copying a checked out directory isn't bad by itself, but you can't have 
something modifying the source during the copy - which sounds likely to happen.

> My
> concern is that the checkout process isn't really designed for that,
> and may fail to do a checkout in a clean and atomic state, and the
> checked out copy may therefore be corrupted by being in the midst of
> an update operation.

Yes, I'd expect things to break.  Does everyone really need a complete copy or 
could you break it into components that each person needs to update?  Or can 
everyone just check out once (or copy a workspace that doesn't update 
automatically while people are working) and subsequently do updates - or are 
they just as bad?
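
One way to get a workspace that doesn't change under people while they
copy it is to publish finished snapshots instead of the live working
copy. The sketch below assumes a post-commit job has already run
"svn update" on a private staging copy and that only one such job runs
at a time; the directory layout, the "CURRENT" pointer file and the
function name are all invented for this example, and the renames are
only assumed to be atomic on a local POSIX filesystem.

```python
import os
import shutil
import time


def publish_snapshot(staging_copy, publish_root):
    """Copy `staging_copy` to a new snapshot directory and record it as current.

    The copy is made under a temporary name and renamed once complete, so a
    reader never sees a half-written snapshot. Returns the snapshot path.
    """
    os.makedirs(publish_root, exist_ok=True)
    # One-second resolution: two publishes within the same second would clash.
    name = time.strftime("snapshot-%Y%m%d-%H%M%S")
    final_path = os.path.join(publish_root, name)
    partial_path = final_path + ".partial"
    shutil.copytree(staging_copy, partial_path)
    os.rename(partial_path, final_path)

    # Replace the pointer file in one step so readers see old or new, never a mix.
    pointer_tmp = os.path.join(publish_root, "CURRENT.tmp")
    with open(pointer_tmp, "w") as fh:
        fh.write(name + "\n")
    os.replace(pointer_tmp, os.path.join(publish_root, "CURRENT"))
    return final_path
```

Users would read CURRENT and copy the directory it names; old snapshots
can be deleted once nobody is copying from them.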

-- 
   Les Mikesell
    lesmikesell@gmail.com