Posted to dev@subversion.apache.org by Julian Foad <ju...@apache.org> on 2020/01/10 10:27:43 UTC

Re: [#4843] svn wc verify -- pristine files consistency check, and possibly repair

Filed as http://subversion.apache.org/issue/4843 .

We could start by defining more precisely what it needs to do.

Some aims, in order from highest priority:
   * check if any pristine file's content is corrupted (according to its
     filename hash)
     - report and rename/delete corrupted pristines
   * check if any pristines are missing (according to wc.db)
   * fetch missing (or corrupted) pristines from the repository
   * verify wc.db 'pristine' table entries against other tables

Checking for content corruption by recalculating the checksums is going 
to be slow -- there is no getting away from that -- so this most 
important check is probably going to be the last one we run, and we may 
choose to make it optional.  That's fine.
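
To make the cost concrete, here is a minimal sketch of the content check 
for one file.  It assumes the part of the pristine file's name before 
the first dot is the hex SHA-1 of its content; the function name and the 
chunk size are made up for illustration and are not Subversion code.

```python
import hashlib
from pathlib import Path


def pristine_content_matches_name(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if the SHA-1 of the file's content equals the hex
    digest encoded in its file name (assumed layout: '<sha1>.<suffix>')."""
    # Assumption: everything before the first dot is the expected digest.
    expected = path.name.split(".", 1)[0].lower()
    digest = hashlib.sha1()
    with path.open("rb") as f:
        # Read in chunks so large pristines do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Every byte of every pristine has to be read, which is why this check 
cannot be made fast.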

We could check quickly:
   * for each pristine file listed in the DB:
     - file is present
     - file size matches the DB
     - file mod-time matches the DB
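
A rough sketch of that quick pass, under stated assumptions: the 
'PristineRecord' type is hypothetical and stands for whatever the DB 
records about one pristine file (it assumes a size and a mod-time are 
available, as the list above supposes).  It only calls stat() and never 
reads file content.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class PristineRecord:
    # Hypothetical record: what the DB is assumed to say about one pristine.
    path: str
    size: int
    mtime_ns: int


def quick_check(records: List[PristineRecord]) -> List[str]:
    """Return one human-readable problem per record that fails the check."""
    problems = []
    for rec in records:
        try:
            st = os.stat(rec.path)
        except FileNotFoundError:
            problems.append(f"{rec.path}: missing")
            continue
        if st.st_size != rec.size:
            problems.append(f"{rec.path}: size {st.st_size} != recorded {rec.size}")
        elif st.st_mtime_ns != rec.mtime_ns:
            problems.append(f"{rec.path}: mod-time differs from recorded value")
    return problems
```

A file that passes this can still be corrupted; only the full checksum 
pass can rule that out.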

The existing 'cleanup' implementation contains a function 
'pristine_cleanup_wcroot' which has in its doc string:

[[[
   TODO: Ideas for possible extra clean-up operations:

   * Check and correct all the refcounts.  Identify any rows missing
     from the 'pristine' table.  [...]

   * Check the checksums.  (Very expensive to check them all, so find
     a way to not check them all.)

   * Check for pristine files missing from disk but referenced in the
     'pristine' table.

   * Repair any pristine files missing from disk and/or rows missing
     from the 'pristine' table and/or bad checksums.  Generally
     requires contacting the server, so requires support at a higher
     level than this function.

   * Identify any pristine text files on disk that are not referenced
     in the DB, and delete them.
]]]
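
The "missing from disk" and "not referenced in the DB" items in that 
list amount to two set differences.  A minimal sketch, assuming we 
already have the checksums listed in the 'pristine' table and the 
checksums found by listing the pristine directory (how those are 
obtained is left out; the function name is made up):

```python
from typing import Iterable, Set, Tuple


def compare_pristines(
    db_checksums: Iterable[str], disk_checksums: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (missing_from_disk, unreferenced_on_disk)."""
    in_db = set(db_checksums)
    on_disk = set(disk_checksums)
    # Listed in the DB but no file on disk: candidates for re-fetching.
    missing_from_disk = in_db - on_disk
    # File on disk but not listed in the DB: candidates for deletion.
    unreferenced_on_disk = on_disk - in_db
    return missing_from_disk, unreferenced_on_disk
```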

The refcounts are references within the DB from nodes to the 'pristine' 
table.  They are enforced by SQLite with 'REFERENCES' clauses in the 
schema, though I saw one comment somewhere saying this was the case "in 
debug builds", so we might want to double-check.
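
One reason to double-check: SQLite only enforces a 'REFERENCES' clause 
on connections where foreign key support has been switched on with 
"PRAGMA foreign_keys = ON".  The toy example below demonstrates that 
general SQLite behaviour.  Its tables ('toy_pristine', 'toy_node') are 
invented for illustration and are not the wc.db schema, and it says 
nothing about which setting Subversion itself uses.

```python
import sqlite3


def insert_dangling_reference(enforce: bool) -> bool:
    """Return True if SQLite accepted a row that references a
    nonexistent parent row, False if it rejected the row."""
    conn = sqlite3.connect(":memory:")
    # Foreign key enforcement is a per-connection setting in SQLite.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.execute("CREATE TABLE toy_pristine (checksum TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE toy_node ("
        " path TEXT PRIMARY KEY,"
        " checksum TEXT REFERENCES toy_pristine (checksum))"
    )
    try:
        conn.execute("INSERT INTO toy_node VALUES ('a.txt', 'no-such-checksum')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()
```

With enforcement off the dangling row is accepted silently, so a 
'REFERENCES' clause in the schema is not by itself proof that the 
references are checked.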

I am not aware of problems in the consistency of the DB tables, so I 
don't think checking that is a priority.  Though I don't have hard 
evidence, from problems reported over the years I think corrupted and 
missing pristine files on disk are the main concern.

- Julian


Re: [#4843] svn wc verify -- pristine files consistency check, and possibly repair

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Jan 10, 2020 at 11:27 AM Julian Foad <ju...@apache.org> wrote:
> I am not aware of problems in the consistency of the DB tables, so I
> don't think checking that is a priority.  Though I don't have hard
> evidence, from problems reported over the years I think corrupted and
> missing pristine files on disk is the main concern.

Ah, how time flies ... I first started reporting / asking about
missing and corrupted pristines shortly after WC-NG was released in
1.7, because from that point on you could no longer "fix" a broken WC by
deleting the affected subdir. It has happened rarely in recent years,
but back then I had to fix such corruptions regularly for
colleagues who ran into them. I just wanted to say: big +1 for taking
this on.

FWIW, some threads for historians:

https://svn.haxx.se/dev/archive-2011-07/0001.shtml ("wc-ng and
recoverability of corrupt wc's")
Before the final release of wc-ng I started asking about this problem
-- at that time Philip and Daniel already suggested two potential
manual fixes: using "svn up -r0 $corrupted" and using "svn up
--set-depth exclude $corrupted".

https://svn.haxx.se/users/archive-2012-03/0458.shtml ("svn 1.7: how to
recover from a lost pristine file")
Some real-world experience. The wc was also stuck with a non-empty
work_queue ("E155037: Previous operation has not finished; run
'cleanup' if it was interrupted" -- where cleanup reports the same
error). It was resolved by first "manually emptying the work_queue in
wc.db", after which "svn up --set-depth exclude" fixed the wc.

https://svn.haxx.se/dev/archive-2012-06/0185.shtml: some further
feedback to devs after we discussed this issue a bit during the 2012
Hackathon. It also mentions another "fix" strategy, because the other
two didn't work: "in wc.db, set the presence to "not-present" in
NODES, and run 'svn update'".

https://svn.haxx.se/dev/archive-2012-09/0304.shtml ("SmartSVN (by
Wandisco) - repair working copy feature")
The (commercial) product SmartSVN had a feature listed as "Guided
fixing of rare working copy problems" in their professional version
(still listed on their https://www.smartsvn.com/compare-editions/
page). It saved me and my colleagues a couple of times back then.
It did things like correcting checksums, refcounts, recovering missing
/ corrupt pristines, ...

https://svn.haxx.se/dev/archive-2013-04/0426.shtml ("Pristine text
missing - cleanup doesn't work")
Some more real-world feedback by Julian :-).


Obviously it would be much better if svn could detect and fix such
corruptions itself, instead of having to jump through (potentially
dangerous) hoops :-).

-- 
Johan