Posted to users@subversion.apache.org by Keith Johnson <k3...@gmail.com> on 2015/02/25 18:48:37 UTC

Trying to restore a corrupted repo

Hello folks.  I'm not currently a subscriber, and therefore would
appreciate cc's.  I will of course respond back to the list.

Here is the gist of my situation.  I have a relatively small repo, a little
over 1000 commits, 6G total or so (fair amount of binary data), very few
users (3).

When a 4th person did a checkout recently, the process (via
TortoiseSVN) bombed out with a path to a revs file (db/revs/0/586 or
something) and an input/output error.

It became evident very quickly that this was a result of bad sectors, and
maybe 6 total files were corrupt.  I had backups for all but 1 of them
(r772).  It later became evident that even my backup for one of them (r390)
was corrupt.  Copied everything to a new drive, and attempted to start
putting everything back together.
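
One way to pin down exactly which revisions are damaged, before choosing
dump ranges, is to verify them one at a time.  What follows is only a
minimal sketch, assuming a POSIX shell and an svnadmin whose verify
subcommand accepts -r; the function name find_bad_revs and its arguments
are made up for illustration.  A revision can also fail because it depends
on an earlier damaged one, so the output is a starting point, not a
verdict.

```shell
#!/bin/sh
# find_bad_revs REPO FIRST LAST   (hypothetical helper, not part of Subversion)
# Prints one revision number per line for every revision in FIRST..LAST
# that "svnadmin verify" rejects.
find_bad_revs() {
    repo=$1
    rev=$2
    last=$3
    while [ "$rev" -le "$last" ]; do
        # -q suppresses progress output; a non-zero exit status means
        # verification of this revision failed.
        if ! svnadmin verify -q -r "$rev" "$repo" >/dev/null 2>&1; then
            echo "$rev"
        fi
        rev=$((rev + 1))
    done
}

# Example, to be run against a copy of the repository:
#   find_bad_revs /svn 0 "$(svnlook youngest /svn)"
```

Checking revisions individually is slow, but it keeps one bad rev file from
hiding the state of the ones after it.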

The normal process for trying to salvage these situations is to dump
skipping over the bad revisions such as:

svnadmin dump /svn -r 1:389 > dump_0001_0389
svnadmin dump /svn -r 391:771 --incremental > dump_0391_0771
svnadmin dump /svn -r 773:head --incremental > dump_0773_head

The problem is that the 2nd command fails because 390 is fubar.  (The gist
is that I think 390 got truncated somehow, because common error messages are
things like "lacks trailing newline" or "node id missing" - forgive me, I'm
not directly at the computer at the moment.)  In all my searching and
reading the past few days, I've never really encountered anyone complaining
that this process wouldn't work; I guess that's why I'm getting pretty
confounded.

As further weirdness, if I leave out the --incremental flag, the dump will
actually work (and produce a 64G or so file), and complain about r390 at
the very end.  The problem, as you might expect, is that svnadmin load
won't be able to load it, because it wants to create everything again
the first time it encounters it, and obviously that's useless (it bombs out
immediately, complaining that some item already exists).

The original server in question was on Ubuntu 12.04, which was running
1.6.17 (? definitely some version of 1.6).  The new disk I made uses 14.04,
which runs 1.8 something.  The problems seem to happen with both versions
of svnadmin.

Also, please spare me the backup lecture; believe me, I know.  I'm just a
programmer trying to clean it up now.

If anyone has seen anything like this before or has any suggestions for
getting around any of this, that would be great.  I would love to be at the
point where I could just get some valid dumps and then do what I can to
recreate the missing revs, but I can't even get past the dump stage which
is exceedingly frustrating.

keith

Re: Trying to restore a corrupted repo

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Keith Johnson wrote on Wed, Feb 25, 2015 at 13:14:56 -0600:
> On Wed, Feb 25, 2015 at 12:33 PM, Andreas Stieger <an...@gmx.de>
> wrote:
> 
> > Hi,
> >
> > On 25 February 2015 at 18:48:37 CET, Keith Johnson <k33f3r@gmail.com>
> > wrote:
> >
> > >[snip]
> >
> > Make a backup of all existing working copies including the pristine
> > content cache under ".svn".
> >
> > When I last recovered a zeroed-out block for someone, I recreated the
> > broken revision N by committing an identical change into a repository with
> > 1..N-1 loaded, with content from working copies and partial backups. The
> > remaining incremental dumps then applied cleanly. The fixed rev file could
> > be dropped back into the production area as it was.
> >
> 
> Hi Andreas, thanks for the response.
> 
> The revision in question is over a year old, so I'm not sure I can put it
> back exactly as it was (guess all I can do is try my best).  I assume
> there's no way to actually get historical data from pristine - that's just
> a cache of current documents, correct?

In theory, yes.  In practice, the cache may contain everything since the
last time you ran 'svn cleanup'.  See:

http://subversion.apache.org/docs/release-notes/1.7#wc-pristines
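
To see what a working copy's pristine cache actually holds, and to set a
copy of it aside before anyone runs 'svn cleanup', something like the
following may help.  It is only a sketch: it assumes a 1.7-or-later working
copy, where pristines live under .svn/pristine as files named by checksum
with a .svn-base suffix, and the function names and paths are hypothetical.

```shell
#!/bin/sh
# list_pristines WC_ROOT   (hypothetical helper)
# Prints the path of every cached pristine file in a 1.7+ working copy.
list_pristines() {
    wc_root=$1
    find "$wc_root/.svn/pristine" -type f -name '*.svn-base' 2>/dev/null
}

# backup_pristines WC_ROOT DEST_DIR   (hypothetical helper)
# Copies the whole pristine store to DEST_DIR/pristine so that a later
# 'svn cleanup' in the working copy cannot discard old content.
backup_pristines() {
    wc_root=$1
    dest=$2
    mkdir -p "$dest" &&
    cp -R "$wc_root/.svn/pristine" "$dest/"
}

# Example:
#   list_pristines ~/work/project | wc -l
#   backup_pristines ~/work/project /backup/project-wc
```

The file names are checksums, not paths, so matching a pristine to the file
it came from means inspecting the content.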

RE: Trying to restore a corrupted repo

Posted by Geoff Field <Ge...@aapl.com.au>.

[snip]
> First step is to *write lock* the old repository, to avoid 
> accumulating history on broken history.

Absolutely - and make sure the lock stays locked forever more.

> You will be *breaking history* due to the corrupted repo. You
> should ideally create a new repo to do the repairs in,
> explain the situation, and tell your user community to set
> aside their working copies and make clean checkouts when
> you're done. The old repo should be locked, and the new repo
> should have a new name to avoid any confusion.

What I've done in the past is to:
a) Write-lock the old repository;
b) Create and populate a new repository by whatever means are required;
c) Ensure nobody is trying to access either repository for the next few steps (pause Apache, or whatever);
d) Rename the OLD repository to reflect its broken/BDB/whatever status;
e) Rename the NEW repository to the old name; and, finally:
f) Re-enable access to the repositories.
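
Steps (a), (d) and (e) can be sketched in shell.  This is an illustration
under assumptions, not the batch file mentioned in this message: the write
lock is implemented here as a pre-commit hook that rejects every commit
(the thread does not say how the lock was actually done), and the function
names and paths are hypothetical.  A pre-commit hook does not block
revision property changes, and the sketch overwrites any existing
pre-commit hook.

```shell
#!/bin/sh
# write_lock_repo REPO   (hypothetical helper)
# Installs a pre-commit hook that refuses every commit to REPO.
write_lock_repo() {
    repo=$1
    hook="$repo/hooks/pre-commit"
    cat > "$hook" <<'EOF'
#!/bin/sh
echo "This repository is read-only; it is being replaced." >&2
exit 1
EOF
    chmod +x "$hook"
}

# swap_repos OLD NEW RETIRED   (hypothetical helper)
# Renames OLD to RETIRED, then NEW to OLD.  Access to both repositories
# should be paused while this runs (steps c and f).
swap_repos() {
    old=$1
    new=$2
    retired=$3
    mv "$old" "$retired" && mv "$new" "$old"
}

# Example:
#   write_lock_repo /svn
#   swap_repos /svn /svn-new /svn-broken
```

Because the hook travels with the repository directory, the retired copy
stays write-locked after the rename.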

I even created a batch file to do this when we upgraded SVN recently and could no longer access the BDB repositories.

> This is one of the difficulties with the absolute central
> repository approach of Subversion and its spiritual ancestor,
> CVS. If anything happens to the central repository, you have
> to correctly re-establish what is effectively a server/client
> relationship or be at risk of corrupting your history, *again*.

This is why your servers should have an effective backup and restoration system.  Ours uses a SAN with off-site backup.  Then, if we catch the corruption early enough we've only lost a day's history.

> It can happen with more distributed systems as well; it's
> just more likely when one particular repository is considered
> canonical in your particular workflow. If the history is
> replaced or corrupted behind your back, you can be in real trouble.

I think this is true of any single-site storage system - electronic or otherwise.  If there's a single point of failure, and you rely on the information/items, there's potential for (business) disaster.

Regards,

Geoff


Re: Trying to restore a corrupted repo

Posted by Nico Kadel-Garcia <nk...@gmail.com>.

On Wed, Feb 25, 2015 at 2:14 PM, Keith Johnson <k3...@gmail.com> wrote:

> Hi Andreas, thanks for the response.
>
> The revision in question is over a year old, so I'm not sure I can put it
> back exactly as it was (guess all I can do is try my best).  I assume
> there's no way to actually get historical data from pristine - that's just a
> cache of current documents, correct?
>
> Basically what you are saying to do is recreate up to the crash, try to
> check in a close-as-possible replacement for r390, put that back in the copy
> of the crashed repo, then dump further from there and import back in?
> Sounds like a reasonable thing to try.  Will report back later tonight.

First step is to *write lock* the old repository, to avoid
accumulating history on broken history.

You will be *breaking history* due to the corrupted repo. You should
ideally create a new repo to do the repairs in, explain the
situation, and tell your user community to set aside their working
copies and make clean checkouts when you're done. The old repo should
be locked, and the new repo should have a new name to avoid any
confusion.

This is one of the difficulties with the absolute central repository
approach of Subversion and its spiritual ancestor, CVS. If anything
happens to the central repository, you have to correctly re-establish
what is effectively a server/client relationship or be at risk of
corrupting your history, *again*.

It can happen with more distributed systems as well; it's just more
likely when one particular repository is considered canonical in your
particular workflow. If the history is replaced or corrupted behind
your back, you can be in real trouble.

Re: Trying to restore a corrupted repo

Posted by Keith Johnson <k3...@gmail.com>.

On Wed, Feb 25, 2015 at 12:33 PM, Andreas Stieger <an...@gmx.de>
wrote:

> Hi,
>
> On 25 February 2015 at 18:48:37 CET, Keith Johnson <k33f3r@gmail.com>
> wrote:
>
> >[snip]
>
> Make a backup of all existing working copies including the pristine
> content cache under ".svn".
>
> When I last recovered a zeroed-out block for someone, I recreated the
> broken revision N by committing an identical change into a repository with
> 1..N-1 loaded, with content from working copies and partial backups. The
> remaining incremental dumps then applied cleanly. The fixed rev file could
> be dropped back into the production area as it was.
>

Hi Andreas, thanks for the response.

The revision in question is over a year old, so I'm not sure I can put it
back exactly as it was (guess all I can do is try my best).  I assume
there's no way to actually get historical data from pristine - that's just
a cache of current documents, correct?

Basically what you are saying to do is recreate up to the crash, try to
check in a close-as-possible replacement for r390, put that back in the
copy of the crashed repo, then dump further from there and import back in?
Sounds like a reasonable thing to try.  Will report back later tonight.

keith


> Andreas
>
>

Re: Trying to restore a corrupted repo

Posted by Andreas Stieger <an...@gmx.de>.

Hi,

On 25 February 2015 at 18:48:37 CET, Keith Johnson <k3...@gmail.com> wrote:

>[snip]

Make a backup of all existing working copies, including the pristine content cache under ".svn".

When I last recovered a zeroed-out block for someone, I recreated the broken revision N by committing an identical change into a repository with 1..N-1 loaded, with content from working copies and partial backups. The remaining incremental dumps then applied cleanly. The fixed rev file could be dropped back into the production area as it was.
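
A rough sketch of that procedure for the r390 case in this thread follows.
The locations /svn-new and /svn-copy, the helper name copy_rev_file, and
the default shard size of 1000 are assumptions for illustration (the real
shard size is recorded in the repository's db/format file).  It presumes
both repositories use a compatible FSFS format and that the recreated
commit matches the lost one closely enough for later revisions to build on
it.

```shell
#!/bin/sh
# copy_rev_file NEW_REPO DAMAGED_COPY REV [SHARD_SIZE]   (hypothetical helper)
# Copies the rev file for REV from the rebuilt repository into a copy of
# the damaged one.  Revision properties (log message, author, date) are
# stored separately under db/revprops and are not touched here.
copy_rev_file() {
    new_repo=$1
    damaged_copy=$2
    rev=$3
    shard_size=${4:-1000}   # assumed default; check db/format
    shard=$((rev / shard_size))
    src="$new_repo/db/revs/$shard/$rev"
    dst="$damaged_copy/db/revs/$shard/$rev"
    if [ ! -f "$src" ]; then
        echo "missing rev file: $src" >&2
        return 1
    fi
    cp "$src" "$dst"
}

# Workflow, with r390 as the broken revision:
#   svnadmin create /svn-new
#   svnadmin load /svn-new < dump_0001_0389
#   ... check out /svn-new, redo the r390 change by hand, commit it ...
#   copy_rev_file /svn-new /svn-copy 390
#   svnadmin dump /svn-copy -r 391:771 --incremental > dump_0391_0771
#   svnadmin load /svn-new < dump_0391_0771
```

The same steps would then be repeated for each remaining bad revision,
such as r772.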

Andreas