Posted to users@subversion.apache.org by Philippe DEMOUSTIER <pd...@groupe-atlantic.com> on 2021/03/03 08:03:08 UTC

Restoring svn database

Hi all,

Following an issue on our servers, we lost approximately 30% of our svn database.
Admin dump fails, so we're trying to restore some data manually.

How can we restore data between SVN and ENDREP tags?

DELTA 30834 15564 155
SVN xxxxxxxxxxxxxxxxxxxxxx ENDREP

Many thanks for your help,
Philippe


Re: Restoring svn database

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Jean-Baptiste DUBOIS wrote on Tue, Mar 09, 2021 at 12:02:51 +0000:
> I have one last question regarding how svn tags are managed internally in fsfs.
> Is an svn tag considered PLAIN data, independent of previous revs, or not?

As far as FSFS is concerned, there's no such thing as a tag.  What FSFS
deals with is a zero-indexed array of interrelated directory trees.

By "tag" people normally refer to a child of a well-known directory;
specifically, to its contents and copyfrom information.  About these:

- The contents of any given file are stored in that file's `Last Changed
  Revision` rev file or in an older rev file.  The latter case is only
  possible when rep-sharing is enabled, which in your case it's not,
  because you use format 2 ("f2").  Watch:
  .
     1	% svn info --show-item=last-changed-revision file://$PWD/iota{0,1}@86
     2	86         file://$PWD/iota0
     3	85         file://$PWD/iota1
     4	% svnlook tree --show-ids -r 86 ./
     5	/ <0.0.r86/428>
     6	 iota0 <2.0.r86/210>
     7	 iota1 <1.0.r85/210>
     8	% xxd -s 210 -l 4096 -p db/revs/85 | xxd -r -p | sed -e '/^$/q' 
     9	id: 1.0.r85/210
    10	type: file
    11	pred: 1.0.r83/111
    12	count: 42
    13	text: 85 0 188 2340 b6cff4dd559998419b5210d8de7c7130
    14	cpath: /iota1
    15	copyroot: 0 /
    16	

  iota1@86's node-rev header is in r85 (line 7).  Its rep is also in r85
  (line 13), but in a rep-sharing situation, it might've been in an
  older rev file.

  Line 8 was written for f2.  For a variant supporting f3, f4, and f6, see
  <https://mail-archives.apache.org/mod_mbox/subversion-dev/201110.mbox/%3C20111002182635.GA11238%40daniel3.local%3E>.

  [For the curious, the bug manifesting in that post is #4129.  It was
  finally reproduced the following March:
  <https://mail-archives.apache.org/mod_mbox/subversion-dev/201203.mbox/%3C20120319122433.GA507%40daniel3.local%3E>.]

- The copyfrom information is stored in the node-rev header, so it'll be
  stored in the rev file of the revision the tag was created in.
  Under f2, which doesn't deltify directory reps (see
  SVN_FS_FS__MIN_DELTIFICATION_FORMAT), the node-rev itself will be
  accessible via the API, as discussed upthread.
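
The node-rev header printed by the `sed -e '/^$/q'` pipeline above can also
be read programmatically.  What follows is a minimal Python sketch, not
Subversion's own parser: the function names are hypothetical, and the names
given to the five `text:` values (revision, offset, on-disk length, expanded
size, MD5) are one reading of the f2 example rather than something taken
from the library.

```python
def parse_noderev_header(rev_file_bytes: bytes, offset: int) -> dict:
    """Parse the 'key: value' lines that start at `offset`.

    The header is assumed to end at the first blank line, as in the
    f2 example (hypothetical helper, not a Subversion API).
    """
    end = rev_file_bytes.index(b"\n\n", offset)
    fields = {}
    text = rev_file_bytes[offset:end].decode("utf-8", errors="replace")
    for line in text.split("\n"):
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields


def parse_text_field(value: str) -> dict:
    """Split a 'text:' (or 'props:') value into its five f2 components."""
    rev, offset, length, expanded, md5 = value.split(" ")
    return {
        "revision": int(rev),       # rev file that holds the rep
        "offset": int(offset),      # byte offset of the rep in that file
        "length": int(length),      # assumed: rep length as stored on disk
        "expanded_size": int(expanded),  # assumed: size of the full text
        "md5": md5,
    }
```

For iota1@86 one would call `parse_noderev_header(data, 210)` on the
contents of db/revs/85, then `parse_text_field(header["text"])` to learn
which rev file holds the rep.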

Finally, note that "data independent of previous revs" and "PLAIN data"
aren't synonyms, since self-deltas are a thing. 

Cheers,

Daniel

RE: Restoring svn database

Posted by Jean-Baptiste DUBOIS <jb...@groupe-atlantic.com>.
Hi Daniel,

Thank you for your explanation.
I wrote a Python data-extraction script that calls 'svnlook changed' and 'svnlook cat' on the valid rev files.
I successfully recovered all data that does not depend on lost previous commits.
Thank you for your help.

I have one last question regarding how svn tags are managed internally in fsfs.
Is an svn tag considered PLAIN data, independent of previous revs, or not?

BR,
Jean-Baptiste.




Re: Restoring svn database

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Fri, 05 Mar 2021 16:33 +00:00:
> Format 2 doesn't support rep-sharing and doesn't deltify directory reps,
> so simply running `svnlook changed -r 86` and then `svnlook cat -r 86`
> against each file printed thereby should extract everything extractable.
> Any given `svn cat` invocation might fail if a DELTA line refers to

s/svn cat/svnlook cat/

> a revision whose rev files has been lost.  ("text:" and "props:" lines
> will always point into the rev file they themselves are in.)
> 
> Use `svnlook propget` in addition to `svn cat` to extract versioned

s/svn cat/svnlook cat/

> properties.  svn_hash_read2() will parse the format.  (It's a public
> API and likely available via the bindings as well, but if needed,

svn_hash_read2() isn't needed in the format-2 case.  It may, however, be
needed in the general case to parse directory reps or property reps.


Re: Restoring svn database

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Jean-Baptiste DUBOIS wrote on Fri, Mar 05, 2021 at 11:04:01 +0000:
> Inside the 'db' folder, the 'revs', 'revprops', and 'transactions' folders contain files.

revs/ is where file content lives.  See subversion/libsvn_fs_fs/structure.
(You can read it in trunk too: it covers all FSFS formats, not only the
newest one.)

> Some files are missing but the company told us that all files
> recovered are fully recovered (ie: file integrity is OK).

That's good.  It's also a very different question than the one Philippe
asked.

> 
> I know that we cannot restore the entire database, but can we extract some 'SVN plain data' from the files in the 'revs' folder?
> 
> Below is a view of those files (ordered by size):
> 
> [revs]

For future reference, text is preferable to images.  Copying the output
of `dir` would have been easier for us to consume.

> Is it possible to detect plain data with no dependency on previous revs inside these 'revs' files and extract it?

Yes.  In your case it'll actually be a four-liner loop in the scripting
language of your choice, but I'll give the full answer.

Let's take an example format-2 revision file:

[[[
% rm -rf r 
% svnadmin create r --compatible-version=1.4 
% cat r/db/format
2
% for i in {1..$(wc -l < ${fn::=~/src/svn/trunk/README})} ; do svnmucc -mm -U file://$(pwd)/r put =(head -$i $fn) iota$((i % 2)) ; done > /dev/null 
% cat_the_youngest_revision_file() { < r/db/revs/*(.om[1]) LC_ALL=C sed -e 's/[^ -~]/X/g' } # translate any octets other than printable ASCII to "X"
% cat_the_youngest_revision_file | nl -ba
     1	DELTA 82 0 331
     2	SVNXXXXX%XX(XXXXXXX&X&      Finally, be sure to see Appendix B in the Subversion Book.  It
     3	      contains a very quick overview of the major differences between
     4	      CVS and Subversion.
     5	
     6	ENDREP
     7	id: 2.0.r86/210
     8	type: file
     9	pred: 2.0.r84/183
    10	count: 42
    11	text: 86 0 188 2341 c09759b8da81bf3e23647f4c517abbe0
    12	cpath: /iota0
    13	copyroot: 0 /
    14	
    15	PLAIN
    16	K 5
    17	iota0
    18	V 16
    19	file 2.0.r86/210
    20	K 5
    21	iota1
    22	V 16
    23	file 1.0.r85/210
    24	END
    25	ENDREP
    26	id: 0.0.r86/428
    27	type: dir
    28	pred: 0.0.r85/428
    29	count: 86
    30	text: 86 347 68 68 d3f8fb2002e019614ec0e47c79e2ac4c
    31	cpath: /
    32	copyroot: 0 /
    33	
    34	2.0.t85-1 modify true false /iota0
    35	
    36	
    37	428 558
% 
]]]

(Aside: in an f8 repository, the svndiff delta would contain only the
last line of README, rather than the last three lines as in this
example.)

The parts you're interested in are:

- "DELTA %ld %ld %ld" lines (e.g., the «82» on line 1)

- "text:" and "props:" lines (e.g., the «86» on line 11)

- "DELTA\n" lines (without numbers)

- "PLAIN\n" lines

In the first two cases, the first number on the line identifies the
revision number in which depended-on data is found.  See the
aforementioned «structure» file for details.  The last two cases
identify data that's present inline.
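
As a rough illustration of the four line types listed above, here is a
minimal Python sketch that scans the raw bytes of one rev file.  The
function names and regular expressions are assumptions made for this
example; in particular, binary svndiff data could in principle contain a
byte sequence that looks like one of these lines, so treat the result as
a hint rather than a guarantee.

```python
import re

# "DELTA <rev> <offset> <length>": a delta against a rep in revision <rev>.
DELTA_WITH_BASE = re.compile(rb"^DELTA (\d+) (\d+) (\d+)$", re.MULTILINE)
# "text:"/"props:" node-rev lines; the first number is a revision.
REP_POINTER = re.compile(
    rb"^(?:text|props): (\d+) (\d+) (\d+) (\d+) ([0-9a-f]{32})$", re.MULTILINE
)
# Bare "DELTA" or "PLAIN": the data is present inline.
INLINE_REP = re.compile(rb"^(?:DELTA|PLAIN)$", re.MULTILINE)


def referenced_revisions(rev_file_bytes: bytes) -> set:
    """Return every revision number named by DELTA/text:/props: lines."""
    revs = {int(m.group(1)) for m in DELTA_WITH_BASE.finditer(rev_file_bytes)}
    revs |= {int(m.group(1)) for m in REP_POINTER.finditer(rev_file_bytes)}
    return revs


def inline_rep_count(rev_file_bytes: bytes) -> int:
    """Count reps that start with a bare DELTA or PLAIN line."""
    return len(INLINE_REP.findall(rev_file_bytes))
```

Run against the r86 file shown above, `referenced_revisions()` would
report 82 (from the DELTA line) and 86 (from the "text:" lines); the
caller can discard the file's own revision number.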

Using this information, you could build a DAG of reachable reps (a "rep"
is the thing between the "DELTA" or "PLAIN" line and the "ENDREP" line)
and extract them.  However, since you're on format 2, there's an easier
way.
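
For completeness, the reachability idea can be sketched at the
granularity of whole rev files.  This is a deliberately coarse
simplification of the per-rep DAG described above: it assumes
dependencies only ever point at older revisions, and it writes off an
entire revision as soon as one of its references is unavailable, even
though individual reps in that file might still be readable.  The
function name and data shapes are hypothetical.

```python
def usable_revisions(dependencies: dict, available: set) -> set:
    """Return the revisions whose rev file and all referenced revs survive.

    dependencies: maps a revision number to the set of revision numbers
                  its rev file refers to (self-references are ignored).
    available:    revision numbers whose rev files were recovered.
    """
    usable = set()
    # Ascending order works because a delta base is assumed to be older
    # than the revision that refers to it.
    for rev in sorted(available):
        needed = dependencies.get(rev, set()) - {rev}
        if needed <= usable:
            usable.add(rev)
    return usable
```

With r3 lost, a revision that deltifies against r3 is reported as
unusable, while revisions that depend only on surviving files are kept.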

Format 2 doesn't support rep-sharing and doesn't deltify directory reps,
so simply running `svnlook changed -r 86` and then `svnlook cat -r 86`
against each file printed thereby should extract everything extractable.
Any given `svnlook cat` invocation might fail if a DELTA line refers to
a revision whose rev file has been lost.  ("text:" and "props:" lines
will always point into the rev file they themselves are in.)
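
The loop described above might look like the following Python sketch.
It is an illustration under assumptions, not a tested recovery tool: the
function names and the output layout (`<outdir>/r<REV>/<path>`) are
invented here, and the parser assumes that `svnlook changed` prints the
status in the first three columns and the path from the fifth character
on, with directories ending in "/".  The `run` parameter exists only so
that the commands can be stubbed out.

```python
import subprocess
from pathlib import Path


def parse_changed(output: str):
    """Yield (status, path) pairs from `svnlook changed` output."""
    for line in output.splitlines():
        if len(line) > 4:
            yield line[:3].strip(), line[4:]


def extract_revision(repo, rev, outdir, run=subprocess.run):
    """Save every file changed in `rev`; return the paths that failed."""
    changed = run(["svnlook", "changed", "-r", str(rev), repo],
                  capture_output=True, check=True)
    failures = []
    listing = changed.stdout.decode("utf-8", errors="replace")
    for status, path in parse_changed(listing):
        # Deleted paths and directories have no content to extract.
        if status.startswith("D") or path.endswith("/"):
            continue
        result = run(["svnlook", "cat", "-r", str(rev), repo, path],
                     capture_output=True)
        if result.returncode != 0:
            # Typically a delta base in a lost rev file.
            failures.append(path)
            continue
        target = Path(outdir) / f"r{rev}" / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(result.stdout)
    return failures
```

Calling `extract_revision("/path/to/repo", 86, "recovered")` for each
surviving revision number, and logging the returned failures, gives a
list of files whose delta chains need closer inspection.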

Use `svnlook propget` in addition to `svnlook cat` to extract versioned
properties.  svn_hash_read2() will parse the format.  (It's a public
API and likely available via the bindings as well, but if needed,
I happen to have posted a pure Python implementation of that last week:
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202102.mbox/%3C20210226173525.GA24828%40tarpaulin.shahaf.local2%3E>.)
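
To show what that hash format looks like in practice, here is a small
Python sketch that parses the "K <len> / key / V <len> / value / END"
layout seen in the PLAIN directory rep of the example.  It is neither
svn_hash_read2() nor the implementation linked above; the function name
is hypothetical, and it handles only K/V pairs followed by a terminator
line, which is all the example requires.

```python
def parse_hash_dump(data: bytes, terminator: bytes = b"END") -> dict:
    """Parse 'K n\\nkey\\nV n\\nvalue\\n' records up to the terminator line."""
    result = {}
    pos = 0
    while True:
        eol = data.index(b"\n", pos)
        line = data[pos:eol]
        pos = eol + 1
        if line == terminator:
            return result
        if not line.startswith(b"K "):
            raise ValueError(f"expected a 'K <len>' line, got {line!r}")
        key_len = int(line[2:])
        key = data[pos:pos + key_len]
        pos += key_len + 1          # skip the key and its newline
        eol = data.index(b"\n", pos)
        line = data[pos:eol]
        pos = eol + 1
        if not line.startswith(b"V "):
            raise ValueError(f"expected a 'V <len>' line, got {line!r}")
        value_len = int(line[2:])
        result[key] = data[pos:pos + value_len]
        pos += value_len + 1        # skip the value and its newline
```

Because the lengths are explicit, keys and values may contain spaces or
newlines; the parser never has to guess where a value ends.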

Note: that's `svnlook cat -r`, not `svn cat -r`.  The difference
matters: -r to svnlook denotes a peg revision, not an operational
revision.  (Also, using svnlook(1) bypasses several layers of API.)

In newer formats, where directory reps may be deltified, it's possible
to get a case such as
.
    r10: mkdir /A
    r20: add /A/foo
    r30: add /A/bar
.
with r20 lost.  In this case, if the rep of /A in r30 happened to depend
on the rep of /A in r20, `svn ls ^/A@30` and `svn cat` of files
thereunder would both fail.  However, if one figured out the location of
/A/bar's node-rev header or rep, one could still read those directly,
using the appropriate internal APIs.

Cheers,

Daniel

> I tried to use and patch the fsfsverify.py script, without success.
> 
> 
> 
> BR,
> 
> Jean-Baptiste.



RE: Restoring svn database

Posted by Jean-Baptiste DUBOIS <jb...@groupe-atlantic.com>.
Hi Daniel, Stefan,

My name is Jean-Baptiste Dubois; I work with Philippe.

First of all, we are grateful for your support with our SVN troubles.

A few words to explain our current situation:

Our company suffered a cyber attack a few months ago, and both the SVN server and the backup (yes, we had a backup!) are now encrypted forever.

One month before the attack, the physical hard drive used by our SVN server was replaced, and the old one was put to another use.

We recently worked closely with a company specializing in hard drive data recovery, and they recovered 70% of the files of the SVN database (~20GB).

We are using svn 1.4 with an 'fsfs' backend, format 2.

Inside the 'db' folder, the 'revs', 'revprops', and 'transactions' folders contain files.

Some files are missing, but the company told us that all recovered files are fully recovered (i.e., file integrity is OK).

I know that we cannot restore the entire database, but can we extract some 'SVN plain data' from the files in the 'revs' folder?

Below is a view of those files (ordered by size):

[revs]

Is it possible to detect plain data with no dependency on previous revs inside these 'revs' files and extract it?

I tried to use and patch the fsfsverify.py script, without success.

BR,

Jean-Baptiste.




Re: Restoring svn database

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Wed, Mar 03, 2021 at 11:15:26 +0100:
> On Wed, Mar 03, 2021 at 08:03:08AM +0000, Philippe DEMOUSTIER wrote:
> > Following an issue on our servers, we lost approximately 30% of our svn database.
> > Admin dump fails so we're trying to restore some data manually.

_Which_ 30% did you lose?

> > How can we restore data between SVN and ENDREP tags?
> > 
> > DELTA 30834 15564 155
> > SVN xxxxxxxxxxxxxxxxxxxxxx ENDREP
> > 
> > Many thanks for your help,
> > Philippe
> 
> Hi Philippe,
> 
> I am afraid the short answer is that you will need to restore your repository
> from backups. If you cannot do that, then you have now learned the hard way
> that an important SVN repository needs to be backed up. The data is gone.
> 
> It is virtually impossible to restore the missing data manually.

Well, it depends.

If the data is part of a directory rep, it may be possible to
reconstruct the directory rep from the node-rev headers in the same rev
file.  Moreover, even if the directory rep isn't reconstructed at all,
so long as the file content reps are intact, extracting those into files
might still be useful, even though that wouldn't restore filenames.  For
filenames, one could consult the changed-paths section of the rev file,
commit notifications, TortoiseSVN log caches (if those cache changed-paths
info?), log messages, git-svn(1) mirrors, and even nearby cpath node-rev
headers.
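
As an illustration of the changed-paths route, the sketch below reads
the last line of an f2-style rev file (two offsets: root node-rev and
changed-paths section) and then parses the changed-paths entries.  The
layout is inferred from the example rev file elsewhere in this thread,
where each entry is a line "<id> <action> <text-mod> <prop-mod> <path>"
followed by a copyfrom line that is empty when there is no copy.  The
function names are hypothetical, and newer formats may differ.

```python
def parse_trailer(data: bytes):
    """Return (root_offset, changed_paths_offset, trailer_start)."""
    body = data.rstrip(b"\n")
    trailer_start = body.rindex(b"\n") + 1
    root_offset, changed_offset = (
        int(n) for n in body[trailer_start:].split(b" ")
    )
    return root_offset, changed_offset, trailer_start


def changed_paths(data: bytes) -> list:
    """List the changed-paths entries recorded in one rev file."""
    _, changed_offset, trailer_start = parse_trailer(data)
    lines = data[changed_offset:trailer_start].split(b"\n")
    entries = []
    # Entries come in pairs of lines: the change, then its copyfrom info.
    for i in range(0, len(lines) - 1, 2):
        if not lines[i]:
            break  # blank line that precedes the trailer
        node_id, action, text_mod, prop_mod, path = lines[i].split(b" ", 4)
        copyfrom = None
        if lines[i + 1]:
            rev, source = lines[i + 1].split(b" ", 1)
            copyfrom = (int(rev), source.decode("utf-8", errors="replace"))
        entries.append({
            "id": node_id.decode("ascii"),
            "action": action.decode("ascii"),
            "path": path.decode("utf-8", errors="replace"),
            "copyfrom": copyfrom,
        })
    return entries
```

This recovers path names even when the directory reps themselves are
unreadable, provided the tail of the rev file is intact.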

If the data is part of a property rep, it may be acceptable to accept
the loss of those properties and restore the remaining data.

Even if the data is part of a content rep, that isn't necessarily game
over insofar as later revisions of the same file are concerned: it's
likely that some later revisions will be able to be reconstructed
despite the loss of one interim rep.  (Because of skip-deltas and,
IIUC, max-linear-deltification)
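
To make the skip-delta point concrete: as far as the example rev file
elsewhere in this thread shows, the delta base of a file's Nth change is
the change whose number is N with its lowest set bit cleared.  There,
/iota0 at r86 has "count: 42" and its rep starts with "DELTA 82"; with
one change per even revision, change 40 is indeed r82.  The Python
sketch below encodes that rule (function names are hypothetical, and
max-linear-deltification in newer formats is ignored).

```python
def skip_delta_base(count: int) -> int:
    """Return the change count used as delta base: clear the lowest set bit."""
    return count & (count - 1)


def delta_chain(count: int) -> list:
    """List the change counts that must be readable to rebuild `count`."""
    chain = []
    while count > 0:
        count = skip_delta_base(count)
        chain.append(count)
    return chain
```

Here `delta_chain(42)` is `[40, 32, 0]`: rebuilding change 42 needs only
three older reps, so the loss of, say, change 41 or change 36 does not
affect it.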

> This data is sometimes full text, sometimes deltified content.

Philippe specified that the data starts with "SVN", so it's unlikely to
be a full text.  It might be a self-delta, but it's most probably just
a regular delta.

> The only way to generate
> equivalent data is to replay all the commits that have occurred throughout
> repository history. The tool 'svnadmin load' can do this, provided you have
> previously saved a dump file of the repository to backup storage when the
> repository was still in a healthy condition.

It's not that simple.  Nothing guarantees that the lost rep and the rep
resulting from the load would be the same length.  One would have to
check the lengths manually and deal with the differences.

> Going forward, you should consider saving repository dump files to backup
> storage, or saving backup copies of your repositories with tools such as
> 'svnadmin hotcopy' or 'svnsync'.  For more information, see this page:
> http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.backup

+1

Cheers,

Daniel

Re: Restoring svn database

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Mar 03, 2021 at 08:03:08AM +0000, Philippe DEMOUSTIER wrote:
> Hi all,
> 
> Following an issue on our servers, we lost approximately 30% of our svn database.
> Admin dump fails so we're trying to restore some data manually.
> 
> How can we restore data between SVN and ENDREP tags?
> 
> DELTA 30834 15564 155
> SVN xxxxxxxxxxxxxxxxxxxxxx ENDREP
> 
> Many thanks for your help,
> Philippe

Hi Philippe,

I am afraid the short answer is that you will need to restore your repository
from backups. If you cannot do that, then you have now learned the hard way
that an important SVN repository needs to be backed up. The data is gone.

It is virtually impossible to restore the missing data manually. This data
is sometimes full text, sometimes deltified content. The only way to generate
equivalent data is to replay all the commits that have occurred throughout
repository history. The tool 'svnadmin load' can do this, provided you have
previously saved a dump file of the repository to backup storage when the
repository was still in a healthy condition.

Going forward, you should consider saving repository dump files to backup
storage, or saving backup copies of your repositories with tools such as
'svnadmin hotcopy' or 'svnsync'.  For more information, see this page:
http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.backup

Regards,
Stefan