Posted to users@subversion.apache.org by Thorsten Schöning <ts...@am-soft.de> on 2020/02/24 17:19:22 UTC

How to improve search performance for moved directories and files?

Hi all,

I have a repo with 178'000 revisions and am accessing that using
OpenVPN and my home-DSL with 28/2 MBit/s. Most of the revisions
originate in branches I'm not interested in and are created
automatically by some software.

I have two unrelated feature branches based on trunk which I need to
sync regularly. The problem is that feature2 is about refactoring
directory layout of feature1, especially moving directories around.
During merges this regularly leads to conflicts which TortoiseSVN
tries to resolve by searching the repo for new merge targets and that
search is incredibly slow if executed remotely.

I tried to do the same merge using 2 URL-merges with a local copy of
the repo and that was a lot faster. What is interesting is that it
seems things were CPU-bound within the TSVN-process, which makes sense
when accessing the repo locally using "file:///...". I didn't
notice much disk I/O, and I have a SATA SSD anyway.
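
For reference, such a 2-URL merge against a local backup looks roughly like this (repository paths and branch names below are made up):

```shell
# Merge the difference between two branch snapshots, both read from a
# local copy of the repository, into an existing working copy.
# All paths are hypothetical; adjust to your own layout.
svn merge \
  "file:///backup/repo/branches/feature1" \
  "file:///backup/repo/branches/feature2" \
  /path/to/working-copy
```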

That makes me wonder because when doing the same remotely, I see
almost no CPU-usage nor disk-I/O on the remote server. I don't see any
heavy uploads or downloads on my network interfaces either. This
sounds like whatever is done is done using lots of roundtrips to
contact the server and suffers from the latency of my somewhat slow
upload, rendering that feature almost useless in my environment.

Is there anything I can do to optimize that? Something like telling
the SVN-client to upload whatever is needed to the server at once? Or
is there some brake configured somewhere? During these operations my
upload stays pretty constant at 40 KBit/s from SVN traffic alone.

Thanks!

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow


Re: How to improve search performance for moved directories and files?

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Feb 25, 2020 at 09:09:14AM +0100, Thorsten Schöning wrote:
> Hello Daniel Shahaf,
> on Monday, 24 February 2020 at 18:27, you wrote:
> 
> > If the remote repository uses https://, you could set up mod_dav_svn on
> > localhost in a proxy configuration.  For svn:// the equivalent would be
> > to set up an svnsync mirror and do «svn relocate»'s to and from it by
> > hand.[...]
> 
> Thanks for the suggestions, but I can't expect my coworkers to do
> that. Some of them would rather start discussing whether to keep
> using SVN at all in favour of ... we all know what. ;-)
> 
> I'm regularly fetching the SVN repos from the remote host using
> rsync anyway. So while not as correct as using svnsync in theory, I
> can simply do a 2-URL merge using unrelated file URIs against my
> local backups of the repos. That at least saves me the relocate. The
> only thing I'm missing this way is merge tracking and merge recording.
> At least the latter can be done after merging, using the remote
> target again and telling the SVN client to record the merge only.
> That is fast enough, as no conflicts are triggered at all.
> 
> Two additional questions:
> 
> 1. Why does the number of revisions seem to matter that much?
> 
> This kind of merge conflict seems to become slower and slower as the
> number of revisions increases, even if all of those commits belong to
> totally unrelated branches. Additionally, the commits moving the
> directories and triggering the conflicts are not that far in the
> past, only a few hundred commits back.
> 
> Something like the following: 100 auto-commits in branchA, a few
> commits moving directories in branchB, 100 auto-commits in branchA
> again. I would have expected the SVN client to focus on branchB and
> find the possible move targets in that branch pretty early.
> 
> 2. Really no other handbrake somewhere?
> 
> When doing the merge locally, I see very high CPU usage but very
> little I/O, constantly around 40 Kbit/s. That doesn't matter
> locally, especially with an SSD, but it does matter remotely because
> of the additional latency, I guess.
> 
> So, is that simply how things work? Lots of small reads in these
> cases, introducing lots of latency and slowing things down heavily?
> And that can't easily be optimized further by e.g. any setting of
> the SVN client?

The primary goal of the conflict resolver is not to be fast.

Consider the situation we had before the conflict resolver existed:
Each and every conflict had to be analyzed and resolved by a human, and
it was very easy to make mistakes. This cost literally hours and hours
of human time everywhere SVN was deployed.

The human conflict-resolving timeframe is what the design of the conflict
resolver was up against.
The goal was to reduce those hours spent on resolving tree conflicts over
and over to a couple of minutes. The resolver tries to be accurate in its
detection of conflicts, to provide sufficient flexibility when resolving
them, and it is also designed to be extensible (if there is a conflict
case that is not covered yet but should be, all that needs to be done is
adding about three functions in C).

Another constraint is that the resolver should be able to work against old
SVN servers, since clients are more regularly updated to new releases than
already deployed servers. This means the resolver needs to do round-trips.
As it discovers information it keeps going back to the server until it has a
complete picture of the conflict situation. The server has no idea what the
client is really asking it for.

If you're unhappy with the result, I would suggest you become involved in
improving the implementation yourself. There should be room for improvement,
especially if the server was made smarter.

A roundtrip-heavy client<->server design is naturally very hard to
improve in a situation with high-latency tunnelling.
For best performance you really want your SVN server on the LAN.

Re: How to improve search performance for moved directories and files?

Posted by Thorsten Schöning <ts...@am-soft.de>.
Hello Daniel Shahaf,
on Monday, 24 February 2020 at 18:27, you wrote:

> If the remote repository uses https://, you could set up mod_dav_svn on
> localhost in a proxy configuration.  For svn:// the equivalent would be
> to set up an svnsync mirror and do «svn relocate»'s to and from it by
> hand.[...]

Thanks for the suggestions, but I can't expect my coworkers to do
that. Some of them would rather start discussing whether to keep
using SVN at all in favour of ... we all know what. ;-)

I'm regularly fetching the SVN repos from the remote host using
rsync anyway. So while not as correct as using svnsync in theory, I
can simply do a 2-URL merge using unrelated file URIs against my
local backups of the repos. That at least saves me the relocate. The
only thing I'm missing this way is merge tracking and merge recording.
At least the latter can be done after merging, using the remote
target again and telling the SVN client to record the merge only.
That is fast enough, as no conflicts are triggered at all.
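
Sketched out, that workflow looks roughly like this (URLs and paths below are made up):

```shell
# 1. Resolve the actual changes cheaply via a 2-URL merge against a
#    local rsync'ed backup of the repository (hypothetical paths):
svn merge \
  "file:///backup/repo/branches/feature1" \
  "file:///backup/repo/branches/feature2" \
  /path/to/working-copy

# 2. Afterwards, record the merge against the real remote target
#    without re-applying any changes, so mergeinfo stays intact:
svn merge --record-only \
  "https://svn.example.com/repo/branches/feature2" \
  /path/to/working-copy
```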

Two additional questions:

1. Why does the number of revisions seem to matter that much?

This kind of merge conflict seems to become slower and slower as the
number of revisions increases, even if all of those commits belong to
totally unrelated branches. Additionally, the commits moving the
directories and triggering the conflicts are not that far in the
past, only a few hundred commits back.

Something like the following: 100 auto-commits in branchA, a few
commits moving directories in branchB, 100 auto-commits in branchA
again. I would have expected the SVN client to focus on branchB and
find the possible move targets in that branch pretty early.

2. Really no other handbrake somewhere?

When doing the merge locally, I see very high CPU usage but very
little I/O, constantly around 40 Kbit/s. That doesn't matter
locally, especially with an SSD, but it does matter remotely because
of the additional latency, I guess.

So, is that simply how things work? Lots of small reads in these
cases, introducing lots of latency and slowing things down heavily?
And that can't easily be optimized further by e.g. any setting of
the SVN client?

Kind regards,

Thorsten Schöning



Re: How to improve search performance for moved directories and files?

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Thorsten Schöning wrote on Mon, 24 Feb 2020 18:19 +0100:
> During merges this regularly leads to conflicts which TortoiseSVN
> tries to resolve by searching the repo for new merge targets and that
> search is incredibly slow if executed remotely.
> 
> I tried to do the same merge using 2 URL-merges with a local copy of
> the repo and that was a lot faster.

If the remote repository uses https://, you could set up mod_dav_svn on
localhost in a proxy configuration.  For svn:// the equivalent would be
to set up an svnsync mirror and do «svn relocate»'s to and from it by
hand.  In either case, this approach is a bit of a sledgehammer: it'll
DTRT, but there may be an alternative solution that requires fewer
resources.
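
For the svn:// case, the mirror setup might look roughly like this (host names and paths below are made up):

```shell
# Create a local mirror repository. Note that svnsync requires a
# pre-revprop-change hook in the mirror that simply exits 0.
svnadmin create /srv/mirror
svnsync initialize file:///srv/mirror svn://svn.example.com/repo
svnsync synchronize file:///srv/mirror

# Point the working copy at the mirror for read-heavy operations...
svn relocate file:///srv/mirror /path/to/working-copy
# ...and back at the real server before committing:
svn relocate svn://svn.example.com/repo /path/to/working-copy
```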

Cheers,

Daniel

> What is interesting is that it
> seems things were CPU-bound within the TSVN-process, which makes sense
> when accessing the repo locally using "file:///...". I didn't
> notice much disk I/O, and I have a SATA SSD anyway.
> 
> That makes me wonder because when doing the same remotely, I see
> almost no CPU-usage nor disk-I/O on the remote server. I don't see any
> heavy uploads or downloads on my network interfaces either. This
> sounds like whatever is done is done using lots of roundtrips to
> contact the server and suffers from the latency of my somewhat slow
> upload, rendering that feature almost useless in my environment.
> 
> Is there anything I can do to optimize that? Something like telling
> the SVN-client to upload whatever is needed to the server at once? Or
> is there some brake configured somewhere? During these operations my
> upload stays pretty constant at 40 KBit/s from SVN traffic alone.