Posted to users@subversion.apache.org by Dustin Lang <ds...@astro.princeton.edu> on 2012/05/28 22:17:02 UTC

svnadmin dump: deleted file order differs between original & mirrored repos

Hi,

For backup purposes I keep a mirror of my svn repo.  The mirror is 
modified only by "svnsync", which runs hourly in a cron job.

In order to validate the mirror, I run an "svnadmin dump" on the mirror 
and on the original, and assert that their md5sums are the same.

I am finding that for a few revisions in my history, each of which 
deletes a set of files, the dumps of the original & mirrored repos list 
the deleted files in different orders, which of course makes the md5sums 
differ even though the repos appear to be in the same state.

Both the original & mirror are fsfs-format, and I'm using subversion-1.7.1 
on both sides.  (I just checked that subversion-1.7.5 does the same 
thing.)  The apr and other libs are likely different versions, though.

I tried to create a small test repo that demonstrates this behavior, but I 
haven't been able to trigger it.  Argh.  I've been running this backup 
approach for a long time and never seen this before, but it does show up 
in a few revs in my repo.  (The repo is available at 
http://astrometry.net/svn, and rev 20053 shows this behavior, FWIW)

I added some debugging output to subversion/libsvn_delta/path_driver.c : 
svn_delta_path_driver() and it does visit the deleted files in the same 
order.  My guess is that since the deleted files get added to a hash 
(subversion/libsvn_repos/dump.c : delete_entry(), pb->deleted_entries) and 
then the hash gets iterated over later, in close_directory(), maybe the 
iteration order of the hash keys isn't defined, so the order they actually 
get written out can vary.  But I've spent a total of maybe 15 minutes 
working with the subversion/apr code so your guess is better than mine.

Suggestions on how to proceed would be appreciated.  My first guess would 
be to sort the deleted entries in close_directory() before writing them 
out, or to use a list-like rather than hash-like data structure to store 
the deleted entries.
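
To show the sorting idea in isolation, here is a small Python sketch.  It 
is not Subversion code: emit_deletes(), md5_of() and the sample paths are 
made up for illustration, the dicts only stand in for pb->deleted_entries, 
and the record layout is a simplification of the real dump format.  It 
assumes the two sides end up iterating over the same set of paths in 
different orders.

```python
import hashlib


def emit_deletes(deleted_entries, sort_keys):
    """Render one simplified 'delete' record per path.

    deleted_entries is a hypothetical stand-in for pb->deleted_entries.
    With sort_keys=False the records follow the container's own iteration
    order; with sort_keys=True they follow sorted path order.
    """
    paths = sorted(deleted_entries) if sort_keys else list(deleted_entries)
    return "".join(f"Node-path: {p}\nNode-action: delete\n\n" for p in paths)


def md5_of(text):
    """Return the hex MD5 digest of a text string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# The same three deleted paths, iterated in a different order on each side.
original = {"trunk/a.c": True, "trunk/b.c": True, "trunk/c.c": True}
mirror = {"trunk/c.c": True, "trunk/a.c": True, "trunk/b.c": True}

unsorted_match = (md5_of(emit_deletes(original, False))
                  == md5_of(emit_deletes(mirror, False)))
sorted_match = (md5_of(emit_deletes(original, True))
                == md5_of(emit_deletes(mirror, True)))

print("unsorted checksums match:", unsorted_match)
print("sorted checksums match:", sorted_match)
```

Sorting at output time keeps the hash for lookups and only fixes the order 
in which records are written, which is why it looks like the smaller of 
the two changes.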

cheers,
dustin



Background: the dump is something like:

${SVNADMIN} dump -q --incremental -r 20000:HEAD ${MIRROR} | \
     grep -v Text-copy-source-md5 | \
     md5sum -

And I do the same on the remote side via ssh.  The 20000 is there to make 
it run faster; I keep archives of the dumps up to rev 20000.

(This does mean that if there is a change to the original repo between the 
svnsync and the svndump, the md5sums will come out different.  This is a 
low-traffic repo so I actually like the occasional false alarm: if your 
smoke alarm goes off when you burn toast, at least you know it still 
works.)
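
For anyone who would rather do the filter-and-checksum step in a script, 
the grep/md5sum part of the pipeline above could be sketched in Python 
roughly like this.  filtered_md5() is a made-up helper name and the sample 
lines are invented; the only thing taken from the pipeline is that lines 
containing Text-copy-source-md5 are dropped before hashing.

```python
import hashlib


def filtered_md5(dump_lines, skip=b"Text-copy-source-md5"):
    """Hash an iterable of byte lines, skipping any line containing `skip`.

    Mirrors `grep -v Text-copy-source-md5 | md5sum`: matching lines are
    dropped, every other line is hashed unchanged (newline included).
    """
    digest = hashlib.md5()
    for line in dump_lines:
        if skip not in line:
            digest.update(line)
    return digest.hexdigest()


# Invented sample input: the two sides differ only in the filtered header.
side_a = [b"Node-path: trunk/a.c\n",
          b"Text-copy-source-md5: 1111\n",
          b"Node-action: add\n"]
side_b = [b"Node-path: trunk/a.c\n",
          b"Text-copy-source-md5: 2222\n",
          b"Node-action: add\n"]

print("filtered checksums match:", filtered_md5(side_a) == filtered_md5(side_b))
```

In real use the lines would come from the stdout of the svnadmin dump 
process rather than from a list.  Note that this check is still sensitive 
to record order, just like the shell version.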

Re: svnadmin dump: deleted file order differs between original & mirrored repos

Posted by Dustin Lang <ds...@astro.princeton.edu>.
D'oh, I didn't RTFIssues.  My apologies.
   http://subversion.tigris.org/issues/show_bug.cgi?id=4134

--dstn

