Posted to dev@subversion.apache.org by Philip Martin <ph...@codematters.co.uk> on 2014/01/22 22:19:54 UTC

FSFS rep-cache validation

On IRC today we were discussing the validation code in
svn_fs_fs__set_rep_reference.  When inserting a row into the rep-cache
the insert can fail because the row already exists.  The usual way this
would happen is that two parallel commits add content with the
same checksum at different locations, e.g.:

svn import some_file file://`pwd`/repo/f1
svn import some_file file://`pwd`/repo/f2

That's harmless and nothing goes wrong, one of the commits populates the
rep-cache.
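To make that concrete, here is a tiny standalone illustration (the schema is
only an approximation of rep-cache.db, written from memory, but the relevant
point is that the checksum is the primary key, so the losing commit simply
gets SQLITE_CONSTRAINT back and nothing is corrupted):

#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
  sqlite3 *db;
  const char *schema =
    "CREATE TABLE rep_cache ("
    "  hash TEXT NOT NULL PRIMARY KEY,"      /* the representation checksum */
    "  revision INTEGER NOT NULL,"
    "  offset INTEGER NOT NULL,"
    "  size INTEGER NOT NULL,"
    "  expanded_size INTEGER NOT NULL);";
  const char *insert =
    "INSERT INTO rep_cache VALUES ('deadbeef', 1, 0, 10, 10);";

  sqlite3_open(":memory:", &db);
  sqlite3_exec(db, schema, NULL, NULL, NULL);

  /* The first commit populates the cache... */
  printf("first insert:  %d\n", sqlite3_exec(db, insert, NULL, NULL, NULL));
  /* ...the second, parallel commit collides on the primary key and gets
     SQLITE_CONSTRAINT (19).  Harmless: the row it wanted is already there. */
  printf("second insert: %d\n", sqlite3_exec(db, insert, NULL, NULL, NULL));

  sqlite3_close(db);
  return 0;
}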

Another problem is that the only caller of svn_fs_fs__set_rep_reference
always passes reject_dup=FALSE so there is code that is never
exercised.  Perhaps we should remove it?

A more serious problem is if somebody edits a repository and ends up
with a rep-cache that has references to revisions that do not exist.
This could happen if somebody removes revision files and edits 'current'
but doesn't run 'recover'.  Or if somebody uses an old backup with a
newer rep-cache.  We have attempted to catch this by having
svn_fs_fs__get_rep_reference return an error if the rep-cache refers to
future revisions. However the code isn't working:

$ svnadmin create repo --compatible-version 1.8
$ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
$ chmod +w -R repo
$ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
$ echo 0 > repo/db/current
$ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g
$ svnadmin verify repo
* Verifying metadata at revision 0 ...
* Verifying repository metadata ...
svnadmin: E165011: No representation found at offset 276 for item 4 in revision 1
svnadmin: E165011: Repository 'repo' failed to verify

Despite the verify error the new r1 is correctly represented: it doesn't
use the erroneous row from the rep-cache, and "svn cat" will work.
However the commit also means that the erroneous row no longer refers to
a revision in the future and so a subsequent commit may use the row and
be corrupt.

The problem is that svn_fs_fs__get_rep_reference calls
svn_fs_fs__ensure_revision_exists which returns the error
SVN_ERR_FS_NO_SUCH_REVISION while the caller,
transaction.c:get_shared_rep, expects the error SVN_ERR_FS_CORRUPT.
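Simplified from memory (this is not the literal source, and the call shape is
approximate), the caller currently does roughly this, so the unexpected
SVN_ERR_FS_NO_SUCH_REVISION lands in the "clear and carry on" branch and the
bogus row is silently tolerated:

  err = svn_fs_fs__get_rep_reference(old_rep, fs, rep->sha1_checksum, pool);
  if (err == SVN_NO_ERROR
      || err->apr_err == SVN_ERR_FS_CORRUPT)  /* the only error we "expect" */
    {
      SVN_ERR(err);   /* a corrupt rep-cache aborts the commit */
    }
  else
    {
      /* Anything else -- including SVN_ERR_FS_NO_SUCH_REVISION from
         svn_fs_fs__ensure_revision_exists -- is cleared and we just
         proceed without rep-sharing for this representation. */
      svn_error_clear(err);
      *old_rep = NULL;
    }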

We could change the error returned by svn_fs_fs__get_rep_reference or
the error expected by get_shared_rep.  Either of those would cause the
second commit above to fail.  However that doesn't really solve the
problem.  When the erroneous row is present we really need to stop any
commit that would make the erroneous row refer to an existing revision, even if
that commit itself doesn't use the row:

$ svnadmin create repo --compatible-version 1.8
$ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
$ chmod +w -R repo
$ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
$ echo 0 > repo/db/current
$ svn -mm mkdir file://`pwd`/repo/X
$ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g

Perhaps commit should always check for the highest revision in the
rep-cache and fail if it is greater than HEAD?  This might be considered
overkill to help people who have introduced corruption by editing the
repository.  However when such an erroneous row is present the failure
mode is nasty: a commit succeeds but is corrupt and content is lost.
The difficult bit here is that we do not have an SQLite index on
rep_cache.revision.
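The check itself would be tiny.  A standalone sketch, using the raw SQLite C
API directly against rep-cache.db (the function name is invented and real
code would go through our svn_sqlite__ wrappers); the catch is that without
an index the MAX(revision) below is a full table scan on every commit:

#include <sqlite3.h>

/* Return 0 if the rep-cache refers to a revision beyond HEAD, else 1. */
static int
rep_cache_at_or_below_head(const char *rep_cache_path, sqlite3_int64 head)
{
  sqlite3 *db;
  sqlite3_stmt *stmt;
  int ok = 1;

  if (sqlite3_open_v2(rep_cache_path, &db, SQLITE_OPEN_READONLY, NULL)
      != SQLITE_OK)
    return 1;                   /* no cache (or unreadable): nothing to check */

  /* Without an index on rep_cache.revision this scans the whole table. */
  if (sqlite3_prepare_v2(db, "SELECT MAX(revision) FROM rep_cache;",
                         -1, &stmt, NULL) == SQLITE_OK)
    {
      if (sqlite3_step(stmt) == SQLITE_ROW
          && sqlite3_column_type(stmt, 0) != SQLITE_NULL
          && sqlite3_column_int64(stmt, 0) > head)
        ok = 0;                 /* erroneous row beyond HEAD: refuse to commit */
      sqlite3_finalize(stmt);
    }

  sqlite3_close(db);
  return ok;
}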

-- 
Philip

RE: FSFS rep-cache validation

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> Sent: Thursday, 23 January 2014 11:55
> To: Julian Foad
> Cc: Philip Martin; dev@subversion.apache.org
> Subject: Re: FSFS rep-cache validation
> 
> Julian Foad <ju...@btopenworld.com> writes:
> 
> > I get the problem. By "store its own 'head' revision" I meant store
> > the maximum value of any referenced revision number -- in other words,
> > simply a substitute for having an index and querying the maximum value
> > in the index. If we update this value correctly then it would serve
> > the same purpose as an index (but maybe faster or maybe not). Like
> > this:
> >
> > Before adding the rep cache entries for a (recently) committed revision rX:
> >
> >   if (max_referenced_rev >= X):
> >     raise error
> >     # caller should escalate the error or clean up the rep-cache
> >   max_referenced_rev = X
> >
> > Before/during looking up a rep cache entry, when repository head is rY:
> >
> >   if (max_referenced_rev >= Y):
> >     raise error
> >     # caller should escalate the error or clean up the rep-cache
> >
> > In a roll-back to revision Z:
> >
> >   delete where rep_cache.revision > Z
> >   max_referenced_rev = Z
> >
> > I suppose the risk involved in users failing to do the roll-back
> > correctly (in this case, failing to update max_referenced_rev) would
> > still be present which perhaps makes the index approach superior.
> 
> That might work but how do we stop old code updating the cache and
> failing to update max_referenced_rev?  We don't have any version number
> in the rep-cache schema so that is going to be ugly.  Bump the FS
> format?  Rename the rep_cache table?
> 
> With the index we could have the new code create any missing index when
> opening the rep-cache, there would be a one-time delay the first time
> new code was used on a big cache.  Old code would update the index but
> not do the validation check.  The validation check done by the new code
> would include any rows added by old code.

-0.5 on adding an SQLite index for something that shouldn't be supported in the first place.
Adding an index just places the performance penalty on all the users that never use it and should never need it, while I'm not bothered by a very slow commit (from a table scan) for those users that hand-edit their repository. Updating the index on every insert shouldn't really be expensive, as revision numbers are continuously growing, but it still means more disk I/O anyway.

I would say these users should just delete the sqlite database (or update the sqlite file manually) in the case where they are changing the repository structure in unsupported ways.

So +1 on the suggestion of just removing the dead code.

	Bert
> 
> --
> Philip Martin | Subversion Committer
> WANdisco // *Non-Stop Data*


Re: FSFS rep-cache validation

Posted by Philip Martin <ph...@wandisco.com>.
Julian Foad <ju...@btopenworld.com> writes:

> I get the problem. By "store its own 'head' revision" I meant store
> the maximum value of any referenced revision number -- in other words,
> simply a substitute for having an index and querying the maximum value
> in the index. If we update this value correctly then it would serve
> the same purpose as an index (but maybe faster or maybe not). Like
> this:
>
> Before adding the rep cache entries for a (recently) committed revision rX:
>
>   if (max_referenced_rev >= X):
>     raise error
>     # caller should escalate the error or clean up the rep-cache
>   max_referenced_rev = X
>
> Before/during looking up a rep cache entry, when repository head is rY:
>
>   if (max_referenced_rev >= Y):
>     raise error
>     # caller should escalate the error or clean up the rep-cache
>
> In a roll-back to revision Z:
>
>   delete where rep_cache.revision > Z
>   max_referenced_rev = Z
>
> I suppose the risk involved in users failing to do the roll-back
> correctly (in this case, failing to update max_referenced_rev) would
> still be present which perhaps makes the index approach superior.

That might work but how do we stop old code updating the cache and
failing to update max_referenced_rev?  We don't have any version number
in the rep-cache schema so that is going to be ugly.  Bump the FS
format?  Rename the rep_cache table?

With the index we could have the new code create any missing index when
opening the rep-cache, there would be a one-time delay the first time
new code was used on a big cache.  Old code would update the index but
not do the validation check.  The validation check done by the new code
would include any rows added by old code.
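Roughly like this at open time (the index name is invented, and real code
would go through the svn_sqlite__ wrappers):

  /* One-time cost on a big cache, a cheap no-op on every later open.  Old
     binaries don't know the index exists, but SQLite still maintains it for
     them on every insert, so new code can trust it. */
  sqlite3_exec(db,
               "CREATE INDEX IF NOT EXISTS i_rep_cache_revision"
               "  ON rep_cache (revision);",
               NULL, NULL, NULL);

With that in place the MAX(revision) check sketched earlier becomes an index
lookup instead of a scan.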

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: FSFS rep-cache validation

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:

> Julian Foad <ju...@btopenworld.com> writes:
>>  Ugh -- the doc string says if REJECT_DUP is TRUE, "return an error if
>>  there is an existing match for REP->CHECKSUM" but the implementation
>>  does something else: return an error if an existing match points to a
>>  different rep (that is, a different rev/index/size), but return
>>  successfully when an existing match points to the same rep.
>> 
>>  The caller says:
>> 
>>        /* FALSE because we don't care if another parallel commit happened to
>>         * collide with us.  (Non-parallel collisions will not be detected.) */
>>        SVN_ERR(svn_fs_fs__set_rep_reference(fs, rep, FALSE, scratch_pool));
>> 
>>  I'm not sure that comment makes sense either -- maybe I am just not
>>  reading it the right way.
> 
> I'm not really sure what that code is supposed to do, but since it is
> not used I'm inclined to delete it.
> 
>>>  Perhaps commit should always check for the highest revision in the
>>>  rep-cache and fail if it is greater than HEAD?  This might be considered
>>>  overkill to help people who have introduced corruption by editing the
>>>  repository.  However when such an erroneous row is present the failure
>>>  mode is nasty: a commit succeeds but is corrupt and content is lost.
>>>  The difficult bit here is that we do not have an SQLite index on
>>>  rep_cache.revision.
>> 
>>  A minimal way to make the rep cache robust against roll-back might be
>>  to store its own "head" revision in it (in a separate table I
>>  suppose), and check this against the repository head revision before
>>  looking up any reference in it and before adding reps for a new
>>  revision. I don't know if that would be more or less efficient than
>>  using an index on rep_cache.revision.
> 
> I don't think that works.  The problem is an erroneous row that refers
> to a revision beyond HEAD.  We already have code that stops such a row
> being used for commits, but not using the row is not enough.  The reason
> that recover removes rows beyond HEAD is that it is not safe to make any
> commits when such rows are present.  Allowing commits eventually makes
> the row refer to revisions that are present and then using the rows leads
> to data loss.

I get the problem. By "store its own 'head' revision" I meant store the maximum value of any referenced revision number -- in other words, simply a substitute for having an index and querying the maximum value in the index. If we update this value correctly then it would serve the same purpose as an index (but maybe faster or maybe not). Like this:

Before adding the rep cache entries for a (recently) committed revision rX:

  if (max_referenced_rev >= X):
    raise error
    # caller should escalate the error or clean up the rep-cache
  max_referenced_rev = X

Before/during looking up a rep cache entry, when repository head is rY:

  if (max_referenced_rev >= Y):
    raise error
    # caller should escalate the error or clean up the rep-cache

In a roll-back to revision Z:

  delete where rep_cache.revision > Z
  max_referenced_rev = Z

I suppose the risk involved in users failing to do the roll-back correctly (in this case, failing to update max_referenced_rev) would still be present which perhaps makes the index approach superior.
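In SQL terms the bookkeeping could be as small as this (table and column
names are invented for illustration, ?1 is the revision number bound into a
prepared statement, and everything would run inside the SQLite transaction
that writes the rep_cache rows anyway):

  /* Once, when creating or opening the cache: */
  static const char *create_head =
    "CREATE TABLE IF NOT EXISTS rep_cache_head"
    "  (id INTEGER PRIMARY KEY CHECK (id = 0),"
    "   max_referenced_rev INTEGER NOT NULL);"
    "INSERT OR IGNORE INTO rep_cache_head VALUES (0, 0);";

  /* Before adding the rep_cache rows for committed revision rX (?1 = X).
     If this changes no rows then max_referenced_rev >= X, which is the
     error case above. */
  static const char *check_and_bump =
    "UPDATE rep_cache_head SET max_referenced_rev = ?1"
    "  WHERE max_referenced_rev < ?1;";

  /* Roll-back to revision Z (?1 = Z), as two statements: */
  static const char *roll_back_delete =
    "DELETE FROM rep_cache WHERE revision > ?1;";
  static const char *roll_back_reset =
    "UPDATE rep_cache_head SET max_referenced_rev = ?1;";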

- Julian

Re: FSFS rep-cache validation

Posted by Philip Martin <ph...@wandisco.com>.
Julian Foad <ju...@btopenworld.com> writes:

> Ugh -- the doc string says if REJECT_DUP is TRUE, "return an error if
> there is an existing match for REP->CHECKSUM" but the implementation
> does something else: return an error if an existing match points to a
> different rep (that is, a different rev/index/size), but return
> successfully when an existing match points to the same rep.
>
> The caller says:
>
>       /* FALSE because we don't care if another parallel commit happened to
>        * collide with us.  (Non-parallel collisions will not be detected.) */
>       SVN_ERR(svn_fs_fs__set_rep_reference(fs, rep, FALSE, scratch_pool));
>
> I'm not sure that comment makes sense either -- maybe I am just not
> reading it the right way.

I'm not really sure what that code is supposed to do, but since it is
not used I'm inclined to delete it.

>> Perhaps commit should always check for the highest revision in the
>> rep-cache and fail if it is greater than HEAD?  This might be considered
>> overkill to help people who have introduced corruption by editing the
>> repository.  However when such an erroneous row is present the failure
>> mode is nasty: a commit succeeds but is corrupt and content is lost.
>> The difficult bit here is that we do not have an SQLite index on
>> rep_cache.revision.
>
> A minimal way to make the rep cache robust against roll-back might be
> to store its own "head" revision in it (in a separate table I
> suppose), and check this against the repository head revision before
> looking up any reference in it and before adding reps for a new
> revision. I don't know if that would be more or less efficient than
> using an index on rep_cache.revision.

I don't think that works.  The problem is an erroneous row that refers
to a revision beyond HEAD.  We already have code that stops such a row
being used for commits, but not using the row is not enough.  The reason
that recover removes rows beyond HEAD is that it is not safe to make any
commits when such rows are present.  Allowing commits eventually makes
the row refer to revisions that are present and then using the rows leads
to data loss.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: FSFS rep-cache validation

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:

> On IRC today we were discussing the validation code in
> svn_fs_fs__set_rep_reference.  When inserting a row into the rep-cache
> the insert can fail because the row already exists.  The usual way this
> would happen is that two parallel commits add content with the
> same checksum at different locations, e.g.:
> 
> svn import some_file file://`pwd`/repo/f1
> svn import some_file file://`pwd`/repo/f2
> 
> That's harmless and nothing goes wrong, one of the commits populates the
> rep-cache.
> 
> Another problem is that the only caller of svn_fs_fs__set_rep_reference
> always passes reject_dup=FALSE so there is code that is never
> exercised.  Perhaps we should remove it?

Ugh -- the doc string says if REJECT_DUP is TRUE, "return an error if there is an existing match for REP->CHECKSUM" but the implementation does something else: return an error if an existing match points to a different rep (that is, a different rev/index/size), but return successfully when an existing match points to the same rep.

The caller says:

      /* FALSE because we don't care if another parallel commit happened to
       * collide with us.  (Non-parallel collisions will not be detected.) */
      SVN_ERR(svn_fs_fs__set_rep_reference(fs, rep, FALSE, scratch_pool));

I'm not sure that comment makes sense either -- maybe I am just not reading it the right way.

> A more serious problem is if somebody edits a repository and ends up
> with a rep-cache that has references to revisions that do not exist.
> This could happen if somebody removes revision files and edits 'current'
> but doesn't run 'recover'.  Or if somebody uses an old backup with a
> newer rep-cache.  We have attempted to catch this by having
> svn_fs_fs__get_rep_reference return an error if the rep-cache refers to
> future revisions. However the code isn't working:
> 
> $ svnadmin create repo --compatible-version 1.8
> $ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
> $ chmod +w -R repo
> $ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
> $ echo 0 > repo/db/current
> $ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g
> $ svnadmin verify repo
> * Verifying metadata at revision 0 ...
> * Verifying repository metadata ...
> svnadmin: E165011: No representation found at offset 276 for item 4 in revision 1
> svnadmin: E165011: Repository 'repo' failed to verify
> 
> Despite the verify error the new r1 is correctly represented: it doesn't
> use the erroneous row from the rep-cache, and "svn cat" will work.
> However the commit also means that the erroneous row no longer refers to
> a revision in the future and so a subsequent commit may use the row and
> be corrupt.
> 
> The problem is that svn_fs_fs__get_rep_reference calls
> svn_fs_fs__ensure_revision_exists which returns the error
> SVN_ERR_FS_NO_SUCH_REVISION while the caller,
> transaction.c:get_shared_rep, expects the error SVN_ERR_FS_CORRUPT.
> 
> We could change the error returned by svn_fs_fs__get_rep_reference or
> the error expected by get_shared_rep.  Either of those would cause the
> second commit above to fail.  However that doesn't really solve the
> problem.  When the erroneous row is present we really need to stop any
> commit that would make the erroneous row refer to an existing revision, even if
> that commit itself doesn't use the row:
> 
> $ svnadmin create repo --compatible-version 1.8
> $ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
> $ chmod +w -R repo
> $ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
> $ echo 0 > repo/db/current
> $ svn -mm mkdir file://`pwd`/repo/X
> $ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g
> 
> Perhaps commit should always check for the highest revision in the
> rep-cache and fail if it is greater than HEAD?  This might be considered
> overkill to help people who have introduced corruption by editing the
> repository.  However when such an erroneous row is present the failure
> mode is nasty: a commit succeeds but is corrupt and content is lost.
> The difficult bit here is that we do not have an SQLite index on
> rep_cache.revision.

My take on this would be...

In an ideal world, we might like to say that the entire repository is a black box and should only be touched by the correct tools, and then it works as intended.

In the real world, we don't provide sufficient tools for all the situations that occur and people do touch the insides. Therefore we should try to design the insides as nicely behaved components -- separable, editable, robust.

(We don't even provide a "roll back to revision X" tool. You hint above that editing 'current' and then running 'svnadmin recover' achieves this. Perhaps we should add an explicit command.)

A minimal way to make the rep cache robust against roll-back might be to store its own "head" revision in it (in a separate table I suppose), and check this against the repository head revision before looking up any reference in it and before adding reps for a new revision. I don't know if that would be more or less efficient than using an index on rep_cache.revision.

- Julian

RE: FSFS rep-cache validation

Posted by Markus Schaber <m....@codesys.com>.
Hi,

From: Bert Huijben [mailto:bert@qqmail.nl]
> From: Markus Schaber [mailto:m.schaber@codesys.com]
> > From: Philip Martin [mailto:philip.martin@wandisco.com]
> > > Philip Martin <ph...@wandisco.com> writes:
> > > > Stefan Fuhrmann <st...@wandisco.com> writes:
> > > >> But we do write the database strictly in revision order.
> > > >> So we could simply read the latest record once upon opening
> > > >> the DB. If that refers to a future revision, read "current" and compare.
> > > >> If the DB is still "in the future", abort txn, i.e. prevent any
> > > >> future commit until rep-cache.db gets deleted by the admin.
> > > >
> > > > That might work.  The rep-cache for a new revision is not written
> > > > strictly in revision order but is written after updating HEAD.  So
> > > > such a check would not be as strong as "highest revision" but
> > > > would be a useful extra check if we can implement it efficiently
> > > > without a table scan.  Is sqlite3_last_insert_rowid() the function we want?
> > >
> > > Bert pointed out that is the wrong function and there isn't really a suitable
> > > function.  So to do this check we need something like Julian's
> > > suggestion: a one row table containing the most recent revision added
> > > to the rep-cache that gets updated with each write.  It doesn't
> > > catch every possible failure but it does catch some.
> >
> > To increase the backwards compatibility: Could this row be updated by
> > a trigger?
> 
> Keeping backwards compatibility here would require a time machine :-)
> 
> We would have to change the database schema, which requires a format bump...
> (not just for the trigger; also for the new table)

But AFAICS, adding a new table which is maintained by a trigger is backwards compatible: existing (old) code continues to work fine.

When the table is maintained explicitly by the SVN code instead, old svn code won't keep the value up to date, so it will often be stale, increasing the risk of a "miss".

> And a trigger would perform this for any file update in any revision; not
> just once per revision.
> 
> If we just update it after writing all revisions (and right before the
> existing sqlite transaction is committed) there should only be a single db
> page update, so it should only make sqlite a very tiny bit slower. With a
> trigger the intermediate state in the sqlite log would grow more than a bit
> during the transaction.

I'm not sure whether the performance difference is that big. Benchmark anyone? :-)

Best regards

Markus Schaber



RE: FSFS rep-cache validation

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Markus Schaber [mailto:m.schaber@codesys.com]
> Sent: Monday, 27 January 2014 13:36
> To: Subversion Development
> Subject: RE: FSFS rep-cache validation
> 
> Hi,
> 
> From: Philip Martin [mailto:philip.martin@wandisco.com]
> > Philip Martin <ph...@wandisco.com> writes:
> >
> > > Stefan Fuhrmann <st...@wandisco.com> writes:
> > >
> > >> But we do write the database strictly in revision order.
> > >> So we could simply read the latest record once upon opening the
> > >> DB. If that refers to a future revision, read "current" and compare.
> > >> If the DB is still "in the future", abort txn, i.e. prevent any
> > >> future commit until rep-cache.db gets deleted by the admin.
> > >
> > > That might work.  The rep-cache for a new revision is not written
> > > strictly in revision order but is written after updating HEAD.  So
> > > such a check would not be as strong as "highest revision" but would be
> > > a useful extra check if we can implement it efficiently without a
> > > table scan.  Is sqlite3_last_insert_rowid() the function we want?
> >
> > Bert pointed out that is the wrong function and there isn't really a
> > suitable function.  So to do this check we need something like
> > Julian's suggestion: a one row table containing the most recent revision
> > added to the rep-cache that gets updated with each write.  It doesn't catch
> > every possible failure but it does catch some.
> 
> To increase the backwards compatibility: Could this row be updated by a
> trigger?

Keeping backwards compatibility here would require a time machine :-)

We would have to change the database schema, which requires a format bump...
(not just for the trigger; also for the new table)

And a trigger would perform this for any file update in any revision; not
just once per revision.

If we just update it after writing all revisions (and right before the
existing sqlite transaction is committed) there should only be a single db
page update, so it should only make sqlite a very tiny bit slower. With a
trigger the intermediate state in the sqlite log would grow more than a bit
during the transaction.
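In other words, roughly this shape on the write path (purely illustrative,
reusing the hypothetical rep_cache_head table from earlier in the thread):

  sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
  /* ... the usual INSERTs into rep_cache for the new revision ... */
  sqlite3_exec(db,
               "UPDATE rep_cache_head SET max_referenced_rev = 42;" /* new rev */,
               NULL, NULL, NULL);
  sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);

One extra single-row UPDATE and one extra dirty page, versus a trigger firing
on every INSERT.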

	Bert


RE: FSFS rep-cache validation

Posted by Markus Schaber <m....@codesys.com>.
Hi,

From: Philip Martin [mailto:philip.martin@wandisco.com]
> Philip Martin <ph...@wandisco.com> writes:
> 
> > Stefan Fuhrmann <st...@wandisco.com> writes:
> >
> >> But we do write the database strictly in revision order.
> >> So we could simply read the latest record once upon opening the
> >> DB. If that refers to a future revision, read "current" and compare.
> >> If the DB is still "in the future", abort txn, i.e. prevent any
> >> future commit until rep-cache.db gets deleted by the admin.
> >
> > That might work.  The rep-cache for a new revision is not written
> > strictly in revision order but is written after updating HEAD.  So
> > such a check would not be as strong as "highest revision" but would be
> > a useful extra check if we can implement it efficiently without a
> > table scan.  Is sqlite3_last_insert_rowid() the function we want?
> 
> Bert pointed out that is the wrong function and there isn't really a suitable
> function.  So to do this check we need something like
> Julian's suggestion: a one row table containing the most recent revision added
> to the rep-cache that gets updated with each write.  It doesn't catch every
> possible failure but it does catch some.

To increase the backwards compatibility: Could this row be updated by a trigger?
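For illustration, something like this (trigger and table names invented, the
one-row table being the hypothetical rep_cache_head sketched earlier in the
thread).  Once the trigger is part of the schema, SQLite fires it even for
inserts made by old binaries, so the value stays current no matter which
release writes the cache:

  static const char *maintain_by_trigger =
    "CREATE TRIGGER IF NOT EXISTS rep_cache_head_trigger"
    "  AFTER INSERT ON rep_cache"
    "  WHEN NEW.revision > (SELECT max_referenced_rev FROM rep_cache_head)"
    "  BEGIN"
    "    UPDATE rep_cache_head SET max_referenced_rev = NEW.revision;"
    "  END;";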


Best regards

Markus Schaber


Re: FSFS rep-cache validation

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
>> But we do write the database strictly in revision order.
>> So we could simply read the latest record once upon
>> opening the DB. If that refers to a future revision, read
>> "current" and compare. If the DB is still "in the future",
>> abort txn, i.e. prevent any future commit until rep-cache.db
>> gets deleted by the admin.
>
> That might work.  The rep-cache for a new revision is not written
> strictly in revision order but is written after updating HEAD.  So such
> a check would not be as strong as "highest revision" but would be a
> useful extra check if we can implement it efficiently without a table
> scan.  Is sqlite3_last_insert_rowid() the function we want?

Bert pointed out that is the wrong function and there isn't really a
suitable function.  So to do this check we need something like
Julian's suggestion: a one row table containing the most recent revision
added to the rep-cache that gets updated with each write.  It doesn't
catch every possible failure but it does catch some.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: FSFS rep-cache validation

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

> But we do write the database strictly in revision order.
> So, if we could simply read the latest record once upon
> opening the DB. If that refers to a future revision, read
> "current" and compare. If the DB it still "in the future",
> abort txn, i.e. prevent any future commit until rep-cache.db
> gets deleted by the admin.

That might work.  The rep-cache for a new revision is not written
strictly in revision order but is written after updating HEAD.  So such
a check would not be as strong as "highest revision" but would be a
useful extra check if we can implement it efficiently without a table
scan.  Is sqlite3_last_insert_rowid() the function we want?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: FSFS rep-cache validation

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Wed, Jan 22, 2014 at 10:19 PM, Philip Martin <ph...@codematters.co.uk> wrote:

> On IRC today we were discussing the validation code in
> svn_fs_fs__set_rep_reference.  When inserting a row into the rep-cache
> the insert can fail because the row already exists.  The usual way this
> would happen is that two parallel commits add content with the
> same checksum at different locations, e.g.:
>
> svn import some_file file://`pwd`/repo/f1
> svn import some_file file://`pwd`/repo/f2
>
> That's harmless and nothing goes wrong, one of the commits populates the
> rep-cache.
>
> Another problem is that the only caller of svn_fs_fs__set_rep_reference
> always passes reject_dup=FALSE so there is code that is never
> exercised.  Perhaps we should remove it?
>
> A more serious problem is if somebody edits a repository and ends up
> with a rep-cache that has references to revisions that do not exist.
> This could happen if somebody removes revision files and edits 'current'
> but doesn't run 'recover'.  Or if somebody uses an old backup with a
> newer rep-cache.  We have attempted to catch this by having
> svn_fs_fs__get_rep_reference return an error if the rep-cache refers to
> future revisions. However the code isn't working:
>
> $ svnadmin create repo --compatible-version 1.8
> $ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
> $ chmod +w -R repo
> $ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
> $ echo 0 > repo/db/current
> $ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g
> $ svnadmin verify repo
> * Verifying metadata at revision 0 ...
> * Verifying repository metadata ...
> svnadmin: E165011: No representation found at offset 276 for item 4 in revision 1
> svnadmin: E165011: Repository 'repo' failed to verify
>
> Despite the verify error the new r1 is correctly represented: it doesn't
> use the erroneous row from the rep-cache, and "svn cat" will work.
> However the commit also means that the erroneous row no longer refers to
> a revision in the future and so a subsequent commit may use the row and
> be corrupt.
>
> The problem is that svn_fs_fs__get_rep_reference calls
> svn_fs_fs__ensure_revision_exists which returns the error
> SVN_ERR_FS_NO_SUCH_REVISION while the caller,
> transaction.c:get_shared_rep, expects the error SVN_ERR_FS_CORRUPT.
>
> We could change the error returned by svn_fs_fs__get_rep_reference or
> the error expected by get_shared_rep.  Either of those would cause the
> second commit above to fail.  However that doesn't really solve the
> problem.  When the erroneous row is present we really need to stop any
> commit that would make the erroneous row refer to an existing revision, even if
> that commit itself doesn't use the row:
>
> $ svnadmin create repo --compatible-version 1.8
> $ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
> $ chmod +w -R repo
> $ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
> $ echo 0 > repo/db/current
> $ svn -mm mkdir file://`pwd`/repo/X
> $ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g
>
> Perhaps commit should always check for the highest revision in the
> rep-cache and fail if it is greater than HEAD?  This might be considered
> overkill to help people who have introduced corruption by editing the
> repository.  However when such an erroneous row is present the failure
> mode is nasty: a commit succeeds but is corrupt and content is lost.
> The difficult bit here is that we do not have an SQLite index on
> rep_cache.revision.
>

But we do write the database strictly in revision order.
So we could simply read the latest record once upon
opening the DB. If that refers to a future revision, read
"current" and compare. If the DB is still "in the future",
abort txn, i.e. prevent any future commit until rep-cache.db
gets deleted by the admin.
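Something along these lines, perhaps (a sketch only: it assumes rowid order
matches insertion order, which holds for an append-only rowid table but is
not guaranteed in general, and the column name is from memory):

  /* On opening rep-cache.db, peek at the most recently inserted row and
     compare it against HEAD as read from db/current. */
  static int
  rep_cache_in_the_future(sqlite3 *db, sqlite3_int64 head)
  {
    sqlite3_stmt *stmt;
    int in_future = 0;

    if (sqlite3_prepare_v2(db,
                           "SELECT revision FROM rep_cache"
                           "  ORDER BY rowid DESC LIMIT 1;",
                           -1, &stmt, NULL) == SQLITE_OK)
      {
        if (sqlite3_step(stmt) == SQLITE_ROW
            && sqlite3_column_int64(stmt, 0) > head)
          in_future = 1;        /* abort the txn until the admin steps in */
        sqlite3_finalize(stmt);
      }

    return in_future;
  }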

-- Stefan^2.