Posted to users@subversion.apache.org by Oliver Jowett <ol...@opencloud.com> on 2004/07/17 01:32:14 UTC
'svnadmin load' & database sync options
Hi all,
I'm currently looking at converting our large (~350 MB) CVS repository to
subversion, learning subversion along the way.
cvs2svn happily produces a dumpfile containing ~14000 transactions:
> -rw-r--r-- 1 oliver ocstaff 708319348 Jul 16 18:53 cvs2svn-dump
Loading it via 'svnadmin load' is hideously slow, taking almost 10 hours:
> oliver@cyclone:~/svn-test$ svnadmin create repo-sync
> oliver@cyclone:~/svn-test$ time svnadmin load -q repo-sync <cvs2svn-dump
>
> real 561m59.668s
> user 14m1.379s
> sys 2m4.799s
Ok, so I'll use --bdb-txn-nosync:
> oliver@cyclone:~/svn-test$ svnadmin create --bdb-txn-nosync repo-no-sync
> oliver@cyclone:~/svn-test$ time svnadmin load -q repo-no-sync <cvs2svn-dump
>
> real 146m49.972s
> user 13m3.273s
> sys 1m36.818s
Better but still very disk-bound. Some digging with lsof/strace showed
that some fsync() calls are still done on the DB log files.
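The post does not give the exact lsof/strace commands used. Below is a minimal sketch of one way to count the sync calls made during a load. It assumes a Linux system with `strace` installed, and the function name `trace_fsyncs` and the summary file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run 'svnadmin load' under strace and record how many
# fsync()/fdatasync() calls it makes.
#   $1 = repository path, $2 = dumpfile, $3 = file that receives the summary
# -f follows child threads/processes, -c prints a per-syscall count summary
# instead of each call, -o writes that summary to $3 rather than to stderr.
trace_fsyncs() {
  strace -f -c -e trace=fsync,fdatasync -o "$3" \
    svnadmin load -q "$1" < "$2"
}

# Example (hypothetical file names):
#   trace_fsyncs repo-no-sync cvs2svn-dump fsync-summary.txt
```

Without `-c`, strace prints each call along with its file descriptor number. Running `lsof -p PID` against the `svnadmin` process then shows which files those descriptors belong to, such as the Berkeley DB `log.*` files.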
I experimented a bit with other DB options and ended up with this:
> oliver@cyclone:~/svn-test$ svnadmin create --bdb-txn-nosync repo-no-log
> oliver@cyclone:~/svn-test$ echo "set_flags DB_TXN_NOT_DURABLE" >>repo-no-log/db/DB_CONFIG
> oliver@cyclone:~/svn-test$ svnadmin recover repo-no-log
> Please wait; recovering the repository may take some time...
>
> Recovery completed.
> The latest repos revision is 0.
> oliver@cyclone:~/svn-test$ time svnadmin load -q repo-no-log <cvs2svn-dump
>
> real 26m40.620s
> user 12m40.711s
> sys 1m9.318s
That's more like what I originally expected!
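The `time` figures from the three runs show how disk-bound each load is. The sketch below compares CPU time (user + sys) with wall-clock time. The helper names are hypothetical, and it assumes durations in the `XmY.YYYs` form that bash's `time` prints.

```shell
#!/bin/sh
# Convert a duration in the "XmY.YYYs" form printed by bash's `time`
# (e.g. "561m59.668s") into seconds.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Percentage of wall-clock (real) time spent on the CPU (user + sys).
# A low figure means the process was mostly waiting, here on the disk.
cpu_share() {
  awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
      -v s="$(to_seconds "$3")" 'BEGIN { printf "%.1f\n", 100 * (u + s) / r }'
}

# How many times faster run B was than run A, from their real times.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
      'BEGIN { printf "%.1f\n", a / b }'
}

echo "repo-sync:    $(cpu_share 561m59.668s 14m1.379s 2m4.799s)% CPU"
echo "repo-no-sync: $(cpu_share 146m49.972s 13m3.273s 1m36.818s)% CPU, $(speedup 561m59.668s 146m49.972s)x faster"
echo "repo-no-log:  $(cpu_share 26m40.620s 12m40.711s 1m9.318s)% CPU, $(speedup 561m59.668s 26m40.620s)x faster"
```

With the numbers above, this reports 2.9% CPU for the default load, 10.0% with --bdb-txn-nosync (3.8x faster), and 51.9% with DB_TXN_NOT_DURABLE (21.1x faster). The CPU work stays between roughly 14 and 16 minutes in every run, so the remaining wall-clock time is spent waiting on the disk.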
The system these all ran on (cyclone) is a dual Athlon/MP 2800+, 2GB
RAM. The OS is Debian stable with a 2.6.5 Linux kernel, and subversion
is 1.0.5 as packaged in Debian unstable:
> ||/ Name           Version        Description
> +++-==============-==============-============================================
> ii  subversion     1.0.5-1        Advanced version control system (aka. svn)
> ii  libdb4.2       4.2.52-16      Berkeley v4.2 Database Libraries [runtime]
The subversion repositories are on an ext3 filesystem on a commodity IDE
disk with the disk's write-caching disabled.
So, some questions:
1) Is using DB_TXN_NOT_DURABLE during the initial load a sane thing to
do? I don't care about recovery from failures during the load at all --
I'd just restart from scratch if something did go wrong.
2) Is it normal for fsync() to still be called when --bdb-txn-nosync in use?
3) Is an option to use DB_TXN_NOT_DURABLE for the duration of a
'svnadmin load' a good idea?
-O
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Re: 'svnadmin load' & database sync options
Posted by Oliver Jowett <ol...@opencloud.com>.
Replying to myself...
Oliver Jowett wrote:
>> oliver@cyclone:~/svn-test$ svnadmin create --bdb-txn-nosync repo-no-log
>> oliver@cyclone:~/svn-test$ echo "set_flags DB_TXN_NOT_DURABLE" >>repo-no-log/db/DB_CONFIG
>> oliver@cyclone:~/svn-test$ svnadmin recover repo-no-log
>> Please wait; recovering the repository may take some time...
>>
>> Recovery completed.
>> The latest repos revision is 0.
Don't do this, not even just for the initial load. It seems you can't
safely turn off DB_TXN_NOT_DURABLE once set; 'svnadmin recover' is not
happy.
I found a better solution in the end: put the new repository on a
memory-based filesystem (e.g. tmpfs). fsync() is then essentially free
so 'svnadmin load' is fast. Once it's done, copy the resulting
repository somewhere persistent.
Doing that reduced my load time from 2.5 hours to 15 minutes.
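Here is a minimal sketch of the tmpfs approach. It assumes a Linux tmpfs is already mounted, for example `/dev/shm` on many systems or one created with `mount -t tmpfs`. The function name `load_via_tmpfs` and the example paths are hypothetical. Because tmpfs lives in RAM (and swap), it must be large enough to hold the whole finished repository, and its contents are lost on reboot. That is why the final copy step matters.

```shell
#!/bin/sh
# Hypothetical helper: build a repository on a memory-backed filesystem, where
# fsync() is nearly free, then copy the finished repository to persistent disk.
#   $1 = scratch repository path on tmpfs (must not exist yet)
#   $2 = final repository path on persistent storage (must not exist yet)
#   $3 = dumpfile to load
load_via_tmpfs() {
  scratch=$1 dest=$2 dump=$3
  svnadmin create "$scratch" || return 1
  svnadmin load -q "$scratch" < "$dump" || return 1
  # Nothing else has the repository open at this point, so a plain recursive
  # copy is used here; 'svnadmin hotcopy' is an alternative.
  cp -a "$scratch" "$dest" || return 1
}

# Example, assuming /dev/shm is a tmpfs with room for the whole repository:
#   load_via_tmpfs /dev/shm/repo-load /srv/svn/repo cvs2svn-dump
```

Once the copy has been checked, the scratch repository can be deleted to free the memory it occupies.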
-O