Posted to commits@subversion.apache.org by ju...@apache.org on 2011/02/17 22:20:33 UTC

svn commit: r1071795 - /subversion/trunk/notes/wc-ng/pristine-store

Author: julianfoad
Date: Thu Feb 17 21:20:33 2011
New Revision: 1071795

URL: http://svn.apache.org/viewvc?rev=1071795&view=rev
Log:
* notes/wc-ng/pristine-store
  Update with initial feedback from danielsh.

Modified:
    subversion/trunk/notes/wc-ng/pristine-store

Modified: subversion/trunk/notes/wc-ng/pristine-store
URL: http://svn.apache.org/viewvc/subversion/trunk/notes/wc-ng/pristine-store?rev=1071795&r1=1071794&r2=1071795&view=diff
==============================================================================
--- subversion/trunk/notes/wc-ng/pristine-store (original)
+++ subversion/trunk/notes/wc-ng/pristine-store Thu Feb 17 21:20:33 2011
@@ -1,22 +1,36 @@
+This spec defines how the Pristine Store works (part A) and how the WC
+uses it (part B).
+
+
 A. THE PRISTINE STORE
 =====================
 
 === A-1. Introduction ===
 
-The Pristine Store is the part of the Working Copy metadata that holds
-a local copy of the full text of the base version of each WC file.
-
-Texts in the Pristine Store are addressed only by their SHA-1 checksum.
-The Pristine Store does not track which text relates to which repository
-and revision and path.  The Pristine Store does not hold pristine copies
-of directories, nor of properties.
+The Pristine Store is inherently just a blob store.  Texts in the Pristine
+Store are addressed by their SHA-1 checksum.
 
 The Pristine Store data is held in
  * the 'PRISTINE' table in the SQLite Data Base (SDB), and
  * the files in the 'pristine' directory.
 
-This specification uses SDB transactions to ensure the consistency of
-writes and reads.
+Currently the texts are stored verbatim; in future they could be stored
+compressed.
+
+The Working Copy library uses the Pristine Store to hold a local copy of
+the "base" or "pristine" version of each file.  The WC library uses it
+only for the content of files, not for directory listings nor symbolic
+links nor properties.  This usage could change in future.
+
+The Pristine Store itself does not track which text relates to which
+repository and revision and path; that information is stored in the NODES
+table and managed by a higher layer of logic within libsvn_wc.
+
+
+This specification defines how the store operates so as to ensure
+  * consistency between disk and DB;
+  * atomicity of, and arbitration between, add and delete and read
+    operations.
 
 ==== A-2. Invariants ====
 
@@ -25,8 +39,16 @@ These invariants apply at all times exce
 below.
 
 (a) Each row in the PRISTINE table has an associated pristine text file
-  that is not open for writing and is available for reading and whose
-  content matches the columns 'size', 'checksum', 'md5_checksum'.
+    that is not open for writing and is available for reading and whose
+    content matches the columns 'size', 'checksum', 'md5_checksum'.
+
+(b) Once written, a pristine text file in the store never changes.
+
+Note that although there is a file matching each row, there is not
+necessarily a row matching each file that exists in the 'pristine'
+files directory.  If Subversion crashes while adding or removing a
+pristine text, it can leave such a file, which is known as an "orphan"
+file.
 
 ==== A-3. Operating Procedures ====
 
@@ -49,21 +71,27 @@ rationale.)
        consistency error but, in a release build, return success.)
 
 (c) To query a pristine's existence or SDB metadata, the reader must:
-    1. Ensure no pristine-remove txn (as defined just above) is in
-       progress while querying it.  The obvious way to ensure it's not in
-       progress would be to do this query in an SDB txn with a suitable
-       lock (and I think the default lock type that you get in a txn would
-       be sufficient).
+    1. Simply query the 'PRISTINE' table row.  If that row exists, the
+       pristine text is in the store and its metadata is in the row; if
+       not, the pristine text is not currently in the store.
 
 (d) To read a pristine text, the reader must:
-    1. Ensure no pristine-remove txn is in progress while querying and
-       opening it.
+    1. Query the SDB and open the file within the same SDB txn (to ensure
+       that no pristine-remove txn (A-3(b)) is in progress at the same
+       time).
     2. Ensure the pristine text remains in the store continuously from
        opening it for the duration of the read. (Perhaps by ensuring
        refcount remains >= 1 and/or by cooperating with the clean-up
        code.  Under spec B, the clean-up code is controlled by rule
        B-3(c), so holding any WC lock would prevent a pristine from being
-       deleted.)
+       deleted.  An alternative is to use the operating system's ability
+       to keep the file available for reading as long as the file handle
+       is open, even if the file's directory entry is removed.)
+
+(e) To clean up "orphan" pristine files:
+    1. 
+
+###?
 
 ==== A-4. Rationale ====
 
@@ -74,8 +102,7 @@ rationale.)
      * Within the txn, the table row could be added after creating the
        file; it makes no difference as it will not become externally
        visible until commit.  But then we would have to take out a lock
-       explicitly before adding the file.  Adding the row takes out a
-       lock implicitly, so doing it first avoids an extra step.
+       explicitly before adding the file: see rationale (c).
      * Leaving an existing file in place is less likely to interfere with
        processes that are currently reading from the file.  Replacing it
        might also be acceptable, but that would need further
@@ -85,26 +112,28 @@ rationale.)
      * We can't remove the file *after* the SDB txn that updates the
        table, because that would leave a gap in which another process
        might re-add this same pristine file and then we would delete it.
-     * Within the txn, the table row could be removed after creating the
-       file, but see the rationale for adding a pristine.
+     * Within the txn, the table row could be removed after removing the
+       file; it makes no difference as it will not become externally
+       visible until commit.  But then we would have to take out a lock
+       explicitly before removing the file: see rationale (c).
      * In a typical use case for removing a pristine text, the caller
        would check the refcount before starting this txn, but
        nevertheless it may have changed and so must be checked again
        inside the txn.
 
-(c) In the add and remove txns, we need to acquire an SDB 'RESERVED'
-  lock before adding or removing the file.  This can be done by starting
-  the txn with 'BEGIN IMMEDIATE' and/or by performing an SDB write (such
-  as the table row update).  ### Would a 'SHARED' lock be sufficient,
-  and if so would it be noticeably better?
+(c) In both the 'add' (a) and 'remove' (b) txns, we need to acquire a lock
+    that blocks both readers and writers (an SQLite 'RESERVED' lock)
+    before adding or removing the file on disk.  We could acquire this
+    explicitly (e.g. by starting the txn with 'BEGIN IMMEDIATE');
+    alternatively SQLite will upgrade the default 'SHARED' lock to
+    'RESERVED' the first time we write to a table.
 
 ==== A-5. Notes ====
 
 (a) This procedure can leave orphaned pristine files (files without a
     corresponding SDB row) if Subversion crashes.  The Pristine Store
-    will still operate correctly.  It should be easy to teach "svn
-    cleanup" to safely delete these.  ### Do we need to define the
-    clean-up procedure here?
+    will still operate correctly.  We should ensure that "svn
+    cleanup" deletes these.
 
 (b) This specification is conceptually simple, but requires completing disk
     operations within SDB transactions, which may make it too inefficient
@@ -130,9 +159,10 @@ addition and removal of the pristine tex
 
 One requirement is to allow a pristine text to be stored some
 time before the reference to it is written into the NODES table.  The
-'commit' code path, for example, needs to store a file's new pristine
-text somewhere (and the pristine store is an obvious option) and then,
-when the commit succeeds, update the WC to reference it.
+'commit' operation, for example, the way it is implemented in Subversion,
+needs to store a file's new pristine text somewhere (and the pristine
+store is an obvious option) and then, when the commit succeeds, update the
+WC to reference it.
 
 Store-then-reference could be achieved by:
 
@@ -157,7 +187,7 @@ equal to the number of in-SDB references
 txns.  It requires an interlock between adding/deleting references and
 purging unreferenced pristines - e.g. guard each of these operations by
 a WC lock.
-  * Add a pristine & reference it => any WC lock
+  * Add a pristine, then later reference it => need to hold a WC lock.
     (To prevent purging it while adding.)
   * Unreference a pristine => no lock needed.
   * Unreference a pristine & purge-if-0 => Same as doing these separately.
@@ -179,6 +209,8 @@ We choose method (b).
 
 === B-2. Invariants in a Valid WC DB State ===
 
+### TODO: This section needs work - it is not accurate.
+
   (a) No pristine text, even if refcount == 0, will be deleted from the
       store as long as any process holds any WC lock in this WC.
 
@@ -197,7 +229,8 @@ The following conditions are always true
 The following conditions are always true
     outside of a SQL txn,
     when the Work Queue is empty:
-    (### ?) when no WC locks are held by any process:
+### [JAF] What's this about the Work Queue here? Not sure that's intended.
+    when no WC locks are held by any process:
 
   (d) The 'refcount' column in a PRISTINE table row equals the number of
       NODES table rows whose 'checksum' column references that pristine
@@ -214,8 +247,8 @@ The numbered steps should be carried out
 
 (a) To add a pristine text reference to the WC, obtain the text and its
     checksum, and then do this while holding a WC lock:
-    (1) Add the pristine text to the Pristine Store, setting the desired
-        refcount >= 1.
+    (1) Add the pristine text to the Pristine Store (procedure A-3(a)),
+        setting the desired refcount >= 1.
     (2) Add the reference(s) in the NODES table.
 
 (b) To remove a pristine text reference from the WC, do this while holding
@@ -223,7 +256,19 @@ The numbered steps should be carried out
     (1) Remove the reference(s) in the NODES table.
     (2) Decrement the pristine text's 'refcount' column.
 
-(c) To purge an unreferenced pristine text, do this with an *exclusive*
-    WC lock:
+(c) To purge an unreferenced pristine text, do this with an exclusive
+    WC lock (see note (a)):
     (1) Check refcount == 0; skip if not.
-    (2) Remove it from the pristine store.
+    (2) Remove it from the pristine store (procedure A-3(b)).
+
+==== B-4. Notes ====
+
+(a) An exclusive WC lock is obtained by acquiring a recursive lock on the
+    WC root.
+
+(b) Invariant B-2(b) is enforced by constraints defined in
+    wc-metadata.sql.
+
+(c) Invariant B-2(c) is currently assisted by triggers defined in
+    wc-metadata.sql, but not enforced.
+



Re: svn commit: r1071795 - /subversion/trunk/notes/wc-ng/pristine-store

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Wed, Feb 23, 2011 at 16:02:49 +0200:
> julianfoad@apache.org wrote on Thu, Feb 17, 2011 at 21:20:33 -0000:
> > Author: julianfoad
> > Date: Thu Feb 17 21:20:33 2011
> > New Revision: 1071795
> > 
> > URL: http://svn.apache.org/viewvc?rev=1071795&view=rev
> > Log:
> > * notes/wc-ng/pristine-store
> >   Update with initial feedback from danielsh.
> > 
> > Modified:
> >     subversion/trunk/notes/wc-ng/pristine-store
> > 
> > Modified: subversion/trunk/notes/wc-ng/pristine-store
> > URL: http://svn.apache.org/viewvc/subversion/trunk/notes/wc-ng/pristine-store?rev=1071795&r1=1071794&r2=1071795&view=diff
> > ==============================================================================
> > --- subversion/trunk/notes/wc-ng/pristine-store (original)
> > +++ subversion/trunk/notes/wc-ng/pristine-store Thu Feb 17 21:20:33 2011
> > @@ -1,22 +1,36 @@
> >  A. THE PRISTINE STORE
> >  =====================
> >  
> >  === A-1. Introduction ===
> >  
> 
> +1 on your changes here, and (having read through) virtually everywhere
> else in this commit.
> 
> I made a minor clarification in r1073749, and I have one question:
> 
> >  (d) To read a pristine text, the reader must:
> > -    1. Ensure no pristine-remove txn is in progress while querying and
> > -       opening it.
> > +    1. Query the SDB and open the file within the same SDB txn (to ensure
> > +       that no pristine-remove txn (A-3(b)) is in progress at the same
> > +       time).
> >      2. Ensure the pristine text remains in the store continuously from
> >         opening it for the duration of the read. (Perhaps by ensuring
> >         refcount remains >= 1 and/or by cooperating with the clean-up
> >         code.  Under spec B, the clean-up code is controlled by rule
> >         B-3(c), so holding any WC lock would prevent a pristine from being
> > -       deleted.)
> > +       deleted.  An alternative is to use the operating system's ability
> > +       to keep the file available for reading as long as the file handle
> > +       is open, even if the file's directory entry is removed.)
> > +
> > +(e) To clean up "orphan" pristine files:
> > +    1. 
> > +
> > +###?
> 
> I guess it would be:
> 
> (e) To clean up an "orphan" pristine file:
>     0. Acquire a 'RESERVED' lock.
>     1. Add a table row with refcount=0.
>     2. Follow the procedure A-3(b) for removing a pristine.
> 

You may have noticed that I glossed over the "How to determine which
pristines are orphaned" detail.

Coming back to that detail, it seems that iterating all on-disk
pristines is going to be expensive --- that's O(# files in wc)
pristines, and I haven't mentioned the DB queries.
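
To make the cost concrete, the scan could look something like the sketch
below.  It assumes a simplified, hypothetical layout (a 'pristine' table
with a 'checksum' column, and one file per pristine named after its
checksum in a single flat directory); find_orphans() is an illustrative
name, not an existing function.

```python
import os
import sqlite3


def find_orphans(conn, pristine_dir):
    """Return the sorted names of files that have no PRISTINE row.

    Costs one full directory listing plus one full table scan, i.e.
    work proportional to the number of pristines in the WC.
    """
    # Every checksum the SDB knows about.
    known = {row[0] for row in conn.execute("SELECT checksum FROM pristine")}
    # Every file actually present on disk.
    with os.scandir(pristine_dir) as entries:
        on_disk = [entry.name for entry in entries if entry.is_file()]
    return sorted(name for name in on_disk if name not in known)
```

The result is only a snapshot: unless the caller holds a lock that keeps
pristine-add txns out, a file reported here may have gained a row by the
time it is acted on, so each candidate has to be re-checked under the
lock before it is deleted.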

Would it make sense to route the unlink() calls via the work queue, in
order to avoid having to do that expensive scan for orphans?


> And then we can also add an invariant A-2(c):
> 
> A-2(c) Pristines will only be added to or removed from the store by an
>        entity that holds an SDB lock.
> 
> Thoughts?
> 
> (I think it's useful to document the high level "What locks are needed
> for what operations" --- I was looking for such an invariant but didn't
> find any in (A).)

Re: svn commit: r1071795 - /subversion/trunk/notes/wc-ng/pristine-store

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
julianfoad@apache.org wrote on Thu, Feb 17, 2011 at 21:20:33 -0000:
> Author: julianfoad
> Date: Thu Feb 17 21:20:33 2011
> New Revision: 1071795
> 
> URL: http://svn.apache.org/viewvc?rev=1071795&view=rev
> Log:
> * notes/wc-ng/pristine-store
>   Update with initial feedback from danielsh.
> 
> Modified:
>     subversion/trunk/notes/wc-ng/pristine-store
> 
> Modified: subversion/trunk/notes/wc-ng/pristine-store
> URL: http://svn.apache.org/viewvc/subversion/trunk/notes/wc-ng/pristine-store?rev=1071795&r1=1071794&r2=1071795&view=diff
> ==============================================================================
> --- subversion/trunk/notes/wc-ng/pristine-store (original)
> +++ subversion/trunk/notes/wc-ng/pristine-store Thu Feb 17 21:20:33 2011
> @@ -1,22 +1,36 @@
>  A. THE PRISTINE STORE
>  =====================
>  
>  === A-1. Introduction ===
>  

+1 on your changes here, and (having read through) virtually everywhere
else in this commit.

I made a minor clarification in r1073749, and I have one question:

>  (d) To read a pristine text, the reader must:
> -    1. Ensure no pristine-remove txn is in progress while querying and
> -       opening it.
> +    1. Query the SDB and open the file within the same SDB txn (to ensure
> +       that no pristine-remove txn (A-3(b)) is in progress at the same
> +       time).
>      2. Ensure the pristine text remains in the store continuously from
>         opening it for the duration of the read. (Perhaps by ensuring
>         refcount remains >= 1 and/or by cooperating with the clean-up
>         code.  Under spec B, the clean-up code is controlled by rule
>         B-3(c), so holding any WC lock would prevent a pristine from being
> -       deleted.)
> +       deleted.  An alternative is to use the operating system's ability
> +       to keep the file available for reading as long as the file handle
> +       is open, even if the file's directory entry is removed.)
> +
> +(e) To clean up "orphan" pristine files:
> +    1. 
> +
> +###?

I guess it would be:

(e) To clean up an "orphan" pristine file:
    0. Acquire a 'RESERVED' lock.
    1. Add a table row with refcount=0.
    2. Follow the procedure A-3(b) for removing a pristine.
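
Steps 0 to 2 above can be sketched as follows.  This is a minimal
illustration, not a proposed patch: the two-column 'pristine' table, the
flat directory layout and the name clean_up_orphan() are assumptions, and
the extra "does a row already exist?" check is an addition that is not
part of the three steps as written.

```python
import os
import sqlite3


def clean_up_orphan(conn, pristine_dir, checksum):
    """Delete one orphan pristine file; return True if it was deleted.

    `conn` must be opened with isolation_level=None so that the explicit
    BEGIN/COMMIT below are the only transaction boundaries.
    """
    path = os.path.join(pristine_dir, checksum)
    conn.execute("BEGIN IMMEDIATE")  # step 0: acquire the RESERVED lock
    try:
        exists = conn.execute(
            "SELECT 1 FROM pristine WHERE checksum = ?", (checksum,)
        ).fetchone()
        if exists is not None:
            # Added assumption: a row exists, so this is not an orphan.
            conn.execute("ROLLBACK")
            return False
        # Step 1: adopt the file by adding a row with refcount = 0.
        conn.execute(
            "INSERT INTO pristine (checksum, refcount) VALUES (?, 0)",
            (checksum,),
        )
        # Step 2: remove it like any other unreferenced pristine
        # (row and file, both inside the same txn).
        conn.execute(
            "DELETE FROM pristine WHERE checksum = ? AND refcount = 0",
            (checksum,),
        )
        if os.path.exists(path):
            os.remove(path)
        conn.execute("COMMIT")
        return True
    except BaseException:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Adding the row first means the orphan goes through exactly the same
removal path as a normal pristine, so no second deletion procedure has to
be specified.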

And then we can also add an invariant A-2(c):

A-2(c) Pristines will only be added to or removed from the store by an
       entity that holds an SDB lock.

Thoughts?

(I think it's useful to document the high level "What locks are needed
for what operations" --- I was looking for such an invariant but didn't
find any in (A).)