Posted to users@subversion.apache.org by Gunther Mayer <gu...@googlemail.com> on 2012/11/19 15:11:01 UTC

Corrupted revisions - need help

Hi there,

I'm the sysadmin for our small company (8 employees) and we're running 
all our shared files over a subversion server. Some time ago our server 
had faulty memory which resulted in corrupt entries being written to the 
underlying fsfs db, later propagating to backups too. This resulted in 
four corrupt revisions in my /var/svn/myrepo/db/revs/XXXX, one of which 
I managed to fix manually with fsfsverify and a whole lot of 
hacking/fudging. The other three however are beyond me and I can't 
afford to spend days of trying to figure out how to fix it (and 
fsfsverify can't do it either, it keeps choking on the same issue). The 
problem is that I cannot take a full backup of my repository or create a 
new working copy from scratch as any command (e.g. svnadmin verify, svn 
co etc.) that comes across these revisions chokes and dies, always with 
the dreaded "Decompression of svndiff data failed" error.

So, I'm wondering if anyone from the community can help me. I think I 
still have all of the original files which got written or amended during 
the three broken revisions (in one or more working copies), but one of 
these revisions is about 1.5GB so sharing is a bit tricky (the other two 
are 23MB and 1.7MB). I'm even willing to pay somebody to do the job if 
that's what's necessary, I only want to recreate my repository from 
scratch as a LAST resort as I would lose all of my history.

Gunther

Re: Corrupted revisions - need help

Posted by Gunther Mayer <gu...@googlemail.com>.
On 2012/11/20 12:07 PM, Daniel Shahaf wrote:
> Stefan Sperling wrote on Tue, Nov 20, 2012 at 09:32:18 +0100:
>> On Tue, Nov 20, 2012 at 08:14:41AM +0200, Daniel Shahaf wrote:
>>> Stefan Sperling wrote on Mon, Nov 19, 2012 at 21:07:59 +0100:
>>>> Extract these reps from the FSFS data of the temporary repository and
>>>> stitch them into the broken repository at appropriate places, recalculating
>>>> checksums where necessary,
>>> Instead of recalculating, you ought to be able to set them to all-zeroes.
>> Ah, that's neat :)
>>
>> Oh, and make sure to remove rows containing checksums of corrupted reps
>> from rep-cache.db. Alternatively, disable rep-sharing in fsfs.conf.
>> Else, you might end up with new revisions pointing at existing bad reps
> But such pointers might already exist in revisions that were committed
> after the committed revision and before it was fixed, right?
>
> In which case - there is no easy way to find all of them (short of
> exhaustive search), but fixing them should be easy (especially given that
> the sha1 and uniquifier fields in rep reference lines are optional).
>
>> (in particular if people keep trying to add the same file again and
>> again under different names in an attempt to repair it.)

Thanks everyone for your input, I'll try those tips and if I get stuck 
again I'll post here again.

Re: Corrupted revisions - need help

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Tue, Nov 20, 2012 at 09:32:18 +0100:
> On Tue, Nov 20, 2012 at 08:14:41AM +0200, Daniel Shahaf wrote:
> > Stefan Sperling wrote on Mon, Nov 19, 2012 at 21:07:59 +0100:
> > > Extract these reps from the FSFS data of the temporary repository and
> > > stitch them into the broken repository at appropriate places, recalculating
> > > checksums where necessary,
> > 
> > Instead of recalculating, you ought to be able to set them to all-zeroes.
> 
> Ah, that's neat :)
> 
> Oh, and make sure to remove rows containing checksums of corrupted reps
> from rep-cache.db. Alternatively, disable rep-sharing in fsfs.conf.
> Else, you might end up with new revisions pointing at existing bad reps

But such pointers might already exist in revisions that were committed
after the committed revision and before it was fixed, right?

In which case - there is no easy way to find all of them (short of
exhaustive search), but fixing them should be easy (especially given that
the sha1 and uniquifier fields in rep reference lines are optional).

> (in particular if people keep trying to add the same file again and
> again under different names in an attempt to repair it.)

Re: Corrupted revisions - need help

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Nov 20, 2012 at 08:14:41AM +0200, Daniel Shahaf wrote:
> Stefan Sperling wrote on Mon, Nov 19, 2012 at 21:07:59 +0100:
> > Extract these reps from the FSFS data of the temporary repository and
> > stitch them into the broken repository at appropriate places, recalculating
> > checksums where necessary,
> 
> Instead of recalculating, you ought to be able to set them to all-zeroes.

Ah, that's neat :)

Oh, and make sure to remove rows containing checksums of corrupted reps
from rep-cache.db. Alternatively, disable rep-sharing in fsfs.conf.
Else, you might end up with new revisions pointing at existing bad reps
(in particular if people keep trying to add the same file again and
again under different names in an attempt to repair it.)
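
A rough sketch of the second option above (disabling rep-sharing in
fsfs.conf) might look like the following. It assumes the stock db/fsfs.conf
layout with a [rep-sharing] section and an enable-rep-sharing key;
disable_rep_sharing is a hypothetical helper name, not something shipped
with Subversion. It does not touch rep-cache.db, so rows for the bad reps
would still need removing if rep-sharing is ever switched back on.

```shell
# Sketch only: turn off rep-sharing in a repository's db/fsfs.conf.
# Assumption: the file uses the stock "[rep-sharing]" section and the
# "enable-rep-sharing" key, possibly still commented out.
disable_rep_sharing() {
    local conf="$1"
    if grep -Eq '^[#[:space:]]*enable-rep-sharing[[:space:]]*=' "$conf"; then
        # Rewrite the existing (possibly commented-out) setting in place,
        # keeping a copy of the old file as "$conf.bak".
        sed -i.bak -E \
            's/^[#[:space:]]*enable-rep-sharing[[:space:]]*=.*/enable-rep-sharing = false/' \
            "$conf"
    else
        # No such key in the file yet: append a section that sets it.
        printf '\n[rep-sharing]\nenable-rep-sharing = false\n' >> "$conf"
    fi
}
```

Example call (the path is the one from the original post):
disable_rep_sharing /var/svn/myrepo/db/fsfs.conf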

Re: Corrupted revisions - need help

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Mon, Nov 19, 2012 at 21:07:59 +0100:
> Extract these reps from the FSFS data of the temporary repository and
> stitch them into the broken repository at appropriate places, recalculating
> checksums where necessary,

Instead of recalculating, you ought to be able to set them to all-zeroes.
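
To illustrate what "all-zeroes" means here: in a node-rev, a rep reference
line has the shape "text: <rev> <offset> <size> <expanded-size> <md5>",
optionally followed by sha1 and uniquifier fields. The sketch below only
rewrites such a line as text on stdin; it does not patch a revision file.
zero_rep_md5 is a hypothetical helper name. Because the md5 field keeps its
32-character width, the line length (and so every later offset in the file)
would stay the same.

```shell
# Sketch only: replace the md5 field of "text:"/"props:" rep reference
# lines read from stdin with 32 zeroes. All other lines, and any trailing
# sha1/uniquifier fields, are passed through unchanged.
zero_rep_md5() {
    awk -v zeros="00000000000000000000000000000000" '
        ($1 == "text:" || $1 == "props:") &&
        length($6) == 32 && $6 ~ /^[0-9a-f]+$/ { $6 = zeros }
        { print }'
}
```

Example: printf 'text: 6 123 456 789 d41d8cd98f00b204e9800998ecf8427e\n' | zero_rep_md5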

Re: Corrupted revisions - need help

Posted by Gunther Mayer <gu...@googlemail.com>.
On 2012/11/19 10:07 PM, Stefan Sperling wrote:
> On Mon, Nov 19, 2012 at 04:11:01PM +0200, Gunther Mayer wrote:
>> So, I'm wondering if anyone from the community can help me. I think
>> I still have all of the original files which got written or amended
>> during the three broken revisions (in one or more working copies),
>> but one of these revisions is about 1.5GB so sharing is a bit tricky
>> (the other two are 23MB and 1.7MB). I'm even willing to pay somebody
>> to do the job if that's what's necessary, I only want to recreate my
>> repository from scratch as a LAST resort as I would lose all of my
>> history.
> I've dealt with similar corruption problems in the past, where original
> fulltext file content was still available.
>
> So maybe this hint will help you: You might be able to create good
> representations by committing the fulltext files to a fresh temporary
> repository, possibly in multiple commits in the right order if you have
> more than one version of a file available.
>
> Extract these reps from the FSFS data of the temporary repository and
> stitch them into the broken repository at appropriate places, recalculating
> checksums where necessary, and tweaking offsets and maybe adding some padding
> if necessary. In case the good reps use less space than the bad ones, or
> the exact same amount, they can be made to work fairly easily.
> If they end up being larger, things get a bit more tricky. Note that
> due to the way FSFS revisions are parsed by Subversion (it looks at the
> end of the file for the changed-path data section offset first) you can
> move the changed-path data section further down to create more space in
> an existing revision file -- but you cannot move any other existing sections
> by even a single bit!
>
> I've managed to fix several corrupt revisions like this. There was a similar
> problem at the time, an elego customer's SVN server was running in a VM
> and when the host computer unexpectedly lost power revision data in several
> FSFS files didn't get saved to the physical disks on time... oops!
> They were able to get some fulltext files from working copies which we could
> use to recreate some of the lost reps.
>
> Some related reading material (read in given order):
> https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_base/notes/fs-history
> https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure
>
> Good luck!
>

Thanks again for the advice. I ended up fixing each of the three 
revisions by repeatedly running fsfsverify.py until it got stuck, 
truncating the affected node, and repeating. Once fsfsverify.py reported 
the revision clean, I re-added all the truncated files from a working 
copy backup as a brand new revision. I was lucky: every single node I 
truncated was a "leaf" or "terminal" node, i.e. it was never modified 
again afterwards, so I was able to do this without repercussions in the 
rest of the repository history.

I automated the entire process which worked like a charm even on my 
1.5GB revision, I'm pasting the code below in the hope that one day it 
will help somebody with a similar predicament (run it with the revision 
in question as the only argument, after making a backup copy of it of 
course):

#!/bin/bash
# Repeatedly run fsfsverify.py on one revision file, letting it fix what it
# can and truncating any node it keeps failing on.

rev=$1
#dir=/backup/svn/main/db/revs/*/
f="progress_r${rev}_repair" # progress log
i=0
same_count=1
# maximum number of attempts during which we tolerate encountering the SAME error
same_max=10
old_md5=""
while true; do
    i=$((i + 1))
    echo "Fix attempt $i" >> "$f"
    if ! ./fsfsverify.py "$rev" > last; then
        # something's wrong, try to fix it
        tail -n3 last >> "$f"
        cur_md5=$(tail last | md5sum)
        if [ "$cur_md5" = "$old_md5" ]; then
            same_count=$((same_count + 1))
            if [ "$same_count" -ge "$same_max" ]; then
                # encountered the SAME error too many times, give up and
                # truncate the offending node
                noderevid=$(grep NodeRev last | tail -n1 | grep -Eo '[-0-9.a-z]+/[0-9]+')
                cpath=$(grep cpath last | tail -n1 | grep -Eo '/.*$')
                echo -e "Encountered same error $same_max times, giving up and truncating the following node:\n$cpath" >> "$f"
                ./fsfsverify.py -t "$noderevid" "$rev" >> "$f" 2>&1
                same_count=1
                continue # skip the fix attempt below
            fi
        else # reset
            same_count=1
        fi
        ./fsfsverify.py -f "$rev" >/dev/null
        old_md5="$cur_md5"
    else # no errors from fsfsverify.py, we're done
        break
    fi
done
echo done >> "$f"
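
For the "making a backup copy" step mentioned before the script, a minimal
sketch could look like this. rev_file_path and backup_rev_file are
hypothetical helper names. The sketch assumes a sharded, unpacked FSFS
layout (db/revs/<shard>/<rev>) with the default shard size of 1000; a
repository with a different layout would need a different path.

```shell
# Sketch only: print the path of a revision file in a sharded FSFS repo.
# Assumption: shard size 1000 unless a third argument says otherwise.
rev_file_path() {
    local repo="$1" rev="$2" shard_size="${3:-1000}"
    printf '%s/db/revs/%d/%d\n' "$repo" $(( rev / shard_size )) "$rev"
}

# Copy a revision file into a separate backup directory, refusing to
# overwrite a backup that already exists there.
backup_rev_file() {
    local src="$1" backup_dir="$2"
    local dest="$backup_dir/$(basename "$src")"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing backup $dest" >&2
        return 1
    fi
    cp -p "$src" "$dest"
}
```

Example: backup_rev_file "$(rev_file_path /var/svn/myrepo 6)" /backup/revs
(/backup/revs is a made-up directory for illustration.)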


Re: Corrupted revisions - need help

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Nov 19, 2012 at 04:11:01PM +0200, Gunther Mayer wrote:
> So, I'm wondering if anyone from the community can help me. I think
> I still have all of the original files which got written or amended
> during the three broken revisions (in one or more working copies),
> but one of these revisions is about 1.5GB so sharing is a bit tricky
> (the other two are 23MB and 1.7MB). I'm even willing to pay somebody
> to do the job if that's what's necessary, I only want to recreate my
> repository from scratch as a LAST resort as I would lose all of my
> history.

I've dealt with similar corruption problems in the past, where original
fulltext file content was still available.

So maybe this hint will help you: You might be able to create good
representations by committing the fulltext files to a fresh temporary
repository, possibly in multiple commits in the right order if you have
more than one version of a file available.

Extract these reps from the FSFS data of the temporary repository and
stitch them into the broken repository at appropriate places, recalculating
checksums where necessary, and tweaking offsets and maybe adding some padding
if necessary. In case the good reps use less space than the bad ones, or
the exact same amount, they can be made to work fairly easily.
If they end up being larger, things get a bit more tricky. Note that
due to the way FSFS revisions are parsed by Subversion (it looks at the
end of the file for the changed-path data section offset first) you can
move the changed-path data section further down to create more space in
an existing revision file -- but you cannot move any other existing sections
by even a single bit!
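
As a rough illustration of the "end of the file" remark: in the unpacked
revision file layout described in the structure document linked below, the
final line holds two decimal offsets, the root node-rev offset followed by
the changed-path data offset. The sketch below just reads and prints them.
rev_trailer is a hypothetical helper name, and the 64-byte window is an
assumption that comfortably covers two offsets.

```shell
# Sketch only: print the two offsets from the last line of an unpacked
# FSFS revision file. Only the tail of the file is read, so this stays
# cheap even on a very large revision.
rev_trailer() {
    local revfile="$1" trailer
    trailer=$(tail -c 64 "$revfile" | tail -n 1)
    # Expect exactly "<root-offset> <changes-offset>".
    if ! printf '%s\n' "$trailer" | grep -Eq '^[0-9]+ [0-9]+$'; then
        echo "unexpected trailer in $revfile" >&2
        return 1
    fi
    printf 'root_offset=%s changes_offset=%s\n' "${trailer%% *}" "${trailer##* }"
}
```

Moving the changed-path section further down would then mean writing a new
final line with the updated second offset.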

I've managed to fix several corrupt revisions like this. There was a similar
problem at the time, an elego customer's SVN server was running in a VM
and when the host computer unexpectedly lost power revision data in several
FSFS files didn't get saved to the physical disks on time... oops!
They were able to get some fulltext files from working copies which we could
use to recreate some of the lost reps.

Some related reading material (read in given order):
https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_base/notes/fs-history
https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure

Good luck!

Re: Corrupted revisions - need help

Posted by Gunther Mayer <gu...@googlemail.com>.
On 2012/11/20 4:54 AM, Nico Kadel-Garcia wrote:
> On Mon, Nov 19, 2012 at 9:11 AM, Gunther Mayer
> <gu...@googlemail.com> wrote:
>> Hi there,
>>
>> I'm the sysadmin for our small company (8 employees) and we're running all
>> our shared files over a subversion server. Some time ago our server had
>> faulty memory which resulted in corrupt entries being written to the
>> underlying fsfs db, later propagating to backups too. This resulted in four
>> corrupt revisions in my /var/svn/myrepo/db/revs/XXXX, one of which I managed
>> to fix manually with fsfsverify and a whole lot of hacking/fudging. The
>> other three however are beyond me and I can't afford to spend days of trying
>> to figure out how to fix it (and fsfsverify can't do it either, it keeps
>> choking on the same issue). The problem is that I cannot take a full backup
>> of my repository or create a new working copy from scratch as any command
>> (e.g. svnadmin verify, svn co etc.) that comes across these revisions chokes
>> and dies, always with the dreaded "Decompression of svndiff data failed"
>> error.
> Which version of Subversion are you running with? Do you have the
> latest revisions, to use the latest repair tools? And can you do an
> export of the current contents, set aside the old repository, and
> switch people to the new repo with the necessary tags, but without the
> corrupted history? This is an approach I've used successfully for
> people migrating among source control systems or cleaning up projects
> where inappropriate data was in the primary repository. (Spurious DVD
> images and files with passwords were particularly common problems.)
>

I'm still using svn version 1.6.18 (r1303927), haven't bothered yet to 
upgrade to 1.7 because I haven't seen the need. Correct me if I'm wrong 
but I thought a corrupted revision will choke svn 1.7 just as much as 
1.6 as the underlying fsfs structure hasn't changed across versions. I 
did ensure, however, that I'm using the latest version of fsfsverify.

You're right, I can export all current contents and create a new 
repository from it but then I lose all the history which is exactly what 
I don't want. I might end up using a hybrid approach though - fixing the 
smaller two corrupt revisions and starting from scratch for the big one 
(1.5GB) as it's very early in my history (r6).

Re: Corrupted revisions - need help

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Mon, Nov 19, 2012 at 9:11 AM, Gunther Mayer
<gu...@googlemail.com> wrote:
> Hi there,
>
> I'm the sysadmin for our small company (8 employees) and we're running all
> our shared files over a subversion server. Some time ago our server had
> faulty memory which resulted in corrupt entries being written to the
> underlying fsfs db, later propagating to backups too. This resulted in four
> corrupt revisions in my /var/svn/myrepo/db/revs/XXXX, one of which I managed
> to fix manually with fsfsverify and a whole lot of hacking/fudging. The
> other three however are beyond me and I can't afford to spend days of trying
> to figure out how to fix it (and fsfsverify can't do it either, it keeps
> choking on the same issue). The problem is that I cannot take a full backup
> of my repository or create a new working copy from scratch as any command
> (e.g. svnadmin verify, svn co etc.) that comes across these revisions chokes
> and dies, always with the dreaded "Decompression of svndiff data failed"
> error.

Which version of Subversion are you running with? Do you have the
latest revisions, to use the latest repair tools? And can you do an
export of the current contents, set aside the old repository, and
switch people to the new repo with the necessary tags, but without the
corrupted history? This is an approach I've used successfully for
people migrating among source control systems or cleaning up projects
where inappropriate data was in the primary repository. (Spurious DVD
images and files with passwords were particularly common problems.)
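
A minimal sketch of the export-and-reimport approach described above, using
only svn export, svnadmin create and svn import. fresh_repo_from_export and
run are hypothetical helper names, and DRY_RUN is an assumed convention of
this sketch: when it is set, the commands are printed instead of executed.
Tags and switching users over to the new repository are not covered.

```shell
# Sketch only: run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Export the current contents of the old repository and import them into
# a brand new repository, leaving the (corrupted) history behind.
# Assumption: new_repo is an absolute local path.
fresh_repo_from_export() {
    local old_url="$1" new_repo="$2" export_dir="$3"
    run svn export "$old_url" "$export_dir" &&
    run svnadmin create "$new_repo" &&
    run svn import "$export_dir" "file://$new_repo" \
        -m "Import current contents without history"
}
```

Example: DRY_RUN=1 fresh_repo_from_export file:///var/svn/myrepo /var/svn/newrepo /tmp/export
(/var/svn/newrepo and /tmp/export are made-up paths for illustration.)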