You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2009/10/08 00:34:59 UTC

obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

fwiw, I'd be supportive of developing obliterate functionality within
trunk, as long as it is not visible at the cmdline (of any tools) for
non-developers. e.g. wrap with #ifdef SVN_DEBUG. (or maybe an
SVN__HAS_OBLITERATE for easier location, later)

For the APIs, I think they could go into things like include/private/
until they're ready to be public.

Cheers,
-g

On Fri, Oct 2, 2009 at 03:26, Julian Foad <ju...@btopenworld.com> wrote:
> Author: julianfoad
> Date: Fri Oct  2 00:26:11 2009
> New Revision: 39745
>
> Log:
> * subversion/svn/cl.h
>  Revert an unwanted part of r39744.
>
> Modified:
>   trunk/subversion/svn/cl.h
>
> Modified: trunk/subversion/svn/cl.h
> URL: http://svn.collab.net/viewvc/svn/trunk/subversion/svn/cl.h?pathrev=39745&r1=39744&r2=39745
> ==============================================================================
> --- trunk/subversion/svn/cl.h   Fri Oct  2 00:24:52 2009        (r39744)
> +++ trunk/subversion/svn/cl.h   Fri Oct  2 00:26:11 2009        (r39745)
> @@ -254,7 +254,6 @@ svn_opt_subcommand_t
>   svn_cl__mergeinfo,
>   svn_cl__mkdir,
>   svn_cl__move,
> -  svn_cl__obliterate,
>   svn_cl__patch,
>   svn_cl__propdel,
>   svn_cl__propedit,
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=495&dsMessageId=2402769
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2404749

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Julian Foad <ju...@btopenworld.com>.
David Glasser wrote:
> For simplicity perhaps just have a separate obliterate write lock and
> not allow concurrent obliterates.

Yes: if there is any extra difficulty in managing concurrent obliterates
then I would just disallow them. (However, it may turn out there is no
extra difficulty in managing concurrent obliterates compared to managing
obliterate with ordinary commits going on concurrently.)

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407190

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by David Glasser <gl...@davidglasser.net>.
On Mon, Oct 12, 2009 at 5:59 PM, Greg Stein <gs...@gmail.com> wrote:
> On Mon, Oct 12, 2009 at 19:28, Branko Čibej <br...@xbc.nu> wrote:
>> David Glasser wrote:
>>> For what it's worth, the current schema for the rep-sharing DB is
>>> completely incompatible with "remove data from server" obliterate
>>
>> You mean in the sense that you can't reasonably in real-time remove
>> entries from the datbase, right? I can imagine having separate, offline
>> process that would scan the whole repository for a particular
>> representation hash and do the actual on-disk data removal and rep-cache
>> cleanup. But it would require a write lock on the repository.
>
> Isn't there some notion of the "total size" of the repository?
> Couldn't you record that value, scan that amount of the repository,
> then repeat if the repository grew during the scan. When it appears
> stable, take a write lock and examine the size again; if changed,
> release the lock and go back to scanning the incremental data. If the
> size is the same, then perform the obliterate surgery.
>
> That would place all the scanning outside of the write lock.
>
> I'm not sure what parts of the repository are mutable and would need
> further consideration, but it's probably doable.
>
> Also note that *another* obliterate could be wiping out stuff while
> the scanning is taking place. But I believe there will be some
> obliterate "counter" or somesuch, and if that changes during the
> scanning process, then I believe you just restart the process from
> scratch. And keep restarting until that counter is unchanged after the
> write lock is acquired.

Yes, something like that should work.  "youngest rev" would be a
reasonable "total size" here.

For simplicity perhaps just have a separate obliterate write lock and
not allow concurrent obliterates.

--dave

-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407183

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Greg Stein <gs...@gmail.com>.
On Mon, Oct 12, 2009 at 19:28, Branko Čibej <br...@xbc.nu> wrote:
> David Glasser wrote:
>> For what it's worth, the current schema for the rep-sharing DB is
>> completely incompatible with "remove data from server" obliterate
>
> You mean in the sense that you can't reasonably in real-time remove
> entries from the datbase, right? I can imagine having separate, offline
> process that would scan the whole repository for a particular
> representation hash and do the actual on-disk data removal and rep-cache
> cleanup. But it would require a write lock on the repository.

Isn't there some notion of the "total size" of the repository?
Couldn't you record that value, scan that amount of the repository,
then repeat if the repository grew during the scan. When it appears
stable, take a write lock and examine the size again; if changed,
release the lock and go back to scanning the incremental data. If the
size is the same, then perform the obliterate surgery.

That would place all the scanning outside of the write lock.

I'm not sure what parts of the repository are mutable and would need
further consideration, but it's probably doable.

Also note that *another* obliterate could be wiping out stuff while
the scanning is taking place. But I believe there will be some
obliterate "counter" or somesuch, and if that changes during the
scanning process, then I believe you just restart the process from
scratch. And keep restarting until that counter is unchanged after the
write lock is acquired.

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406867

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by David Glasser <gl...@davidglasser.net>.
On Mon, Oct 12, 2009 at 4:28 PM, Branko Čibej <br...@xbc.nu> wrote:
> David Glasser wrote:
>> For what it's worth, the current schema for the rep-sharing DB is
>> completely incompatible with "remove data from server" obliterate
>>
>
> You mean in the sense that you can't reasonably in real-time remove
> entries from the datbase, right? I can imagine having separate, offline
> process that would scan the whole repository for a particular
> representation hash and do the actual on-disk data removal and rep-cache
> cleanup. But it would require a write lock on the repository.

OK, sure, if you're OK with "disable commits while we walk the entire
repository, every single node" then it would work (though note that a
few days ago we decided that the rep-sharing DB should be written
after the write lock is dropped, which could hypothetically be minorly
problematic here).  Of course, that's going to be kind of slow.  But I
guess we do even have some repo-walking logic in there already (in the
svnadmin verify code, IIRC).

--dave

-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406846

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Branko Cibej <br...@xbc.nu>.
David Glasser wrote:
> For what it's worth, the current schema for the rep-sharing DB is
> completely incompatible with "remove data from server" obliterate
>   

You mean in the sense that you can't reasonably in real-time remove
entries from the datbase, right? I can imagine having separate, offline
process that would scan the whole repository for a particular
representation hash and do the actual on-disk data removal and rep-cache
cleanup. But it would require a write lock on the repository.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406842

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by David Glasser <gl...@davidglasser.net>.
For what it's worth, the current schema for the rep-sharing DB is
completely incompatible with "remove data from server" obliterate
(though it's certainly compatible with "mask data from clients"
obliterate).

--dave

On Mon, Oct 12, 2009 at 4:24 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Daniel Shahaf wrote:
>> [ couldn't sleep ]
>
> Heh. That's sometimes when the clearest thoughts come.
>
>> Daniel Shahaf wrote on Sat, 10 Oct 2009 at 14:31 +0200:
>> > Julian Foad wrote on Sat, 10 Oct 2009 at 01:55 +0100:
>> > > Add an initial skeleton implementation for "svn obliterate", with the UI
>> > > and public API changes hidden behind #ifdef SVN_WITH_EXPERIMENTAL_OBLITERATE.
>> > >
>> > > ...
>> > >
>> > > Index: subversion/libsvn_fs_fs/fs_fs.c
>> > > ===================================================================
>> > > --- subversion/libsvn_fs_fs/fs_fs.c       (revision 39914)
>> > > +++ subversion/libsvn_fs_fs/fs_fs.c       (working copy)
>> > > @@ -5885,6 +6025,45 @@ svn_fs_fs__commit(svn_revnum_t *new_rev_
>> > >  }
>> > >
>> > >  svn_error_t *
>> > > +svn_fs_fs__commit_obliteration(svn_revnum_t rev,
>> > > +                               svn_fs_t *fs,
>> > > +                               svn_fs_txn_t *txn,
>> > > +                               apr_pool_t *pool)
>> > > +{
>> > > +  struct commit_baton cb;
>> > > +  fs_fs_data_t *ffd = fs->fsap_data;
>> > > +
>> > > +  /* Analogous to svn_fs_fs__commit(). */
>> > > +  cb.new_rev_p = &rev;
>> > > +  cb.fs = fs;
>> > > +  cb.txn = txn;
>> > > +
>> > > +  if (ffd->rep_sharing_allowed)
>> > > +    {
>> > > +      cb.reps_to_cache = apr_array_make(pool, 5, sizeof(representation_t *));
>> > > +      cb.reps_pool = pool;
>> > > +    }
>> > > +  else
>> > > +    {
>> > > +      cb.reps_to_cache = NULL;
>> > > +      cb.reps_pool = NULL;
>> > > +    }
>> > > +
>> > > +  SVN_ERR(svn_fs_fs__with_write_lock(fs, commit_obliteration_body, &cb, pool));
>> > > +
>> > > +  if (ffd->rep_sharing_allowed)
>> > > +    {
>> > > +      /* ### TODO: ignore errors opening the DB (issue #3506) * */
>> > > +      SVN_ERR(svn_fs_fs__open_rep_cache(fs, pool));
>> > > +      SVN_ERR(svn_sqlite__with_transaction(ffd->rep_cache_db,
>> > > +                                           commit_sqlite_txn_callback,
>> > > +                                           &cb, pool));
>> > > +    }
>> > > +
>> >
>> > [ I think you know this already, but I'll say it anyway. ]
>
> Thanks for saying this: I don't yet have this level of awareness. I'm
> learning the whole repository and FSFS side as I go along. I'm very
> grateful for your eyes on this.
>
>> > Half of the lines of this function deal with storing new reps in the
>> > rep-cache DB.  Some day it should also remove the old DB rows (and
>> > somehow cause references in other revisions into the revision-being-
>> > obliterated to still work).
>
> When you say "cache", is it a cache in the sense that functionally it
> doesn't matter whether it contains any or all of these things? From what
> you ssay below, it sounds like it is a "rep store". Please let me
> understand that clearly before I go on.
>
>> Actually, after r39897, it's possible that DB rows referencing the
>> revision-being-obliterated[1] will be added at an arbitrary time after
>> that revision has been committed: it's (theoretically, depending on
>> the order SQLite hands out write locks) possible that the sequence
>>
>>     # make a few large (multi-file) commits
>>     # commit rBO
>>     # obliterate rBO
>>     # wait 3 seconds
>>
>> will result in rep-cache rows referring the obliterated version of rBO.
>>
>>
>> Not sure how to solve that.  We want to ensure that the obliterate doesn't
>> get the SQLite write lock before "its" commit gets the same lock. [2]
>>
>>
>> Possible damage?  If the DB rows relating to the original rBO are added
>> after its obliteration its complete, then reps created in the future may
>> rely on these rows and avoid writing themselves out explicitly --- even
>> though the reps may no longer be in the rBO rev file.
>
>
> Thanks for the other comments, which I believe I have addressed in what
> I committed in r39949.
>
> - Julian
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406584
>



-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406720

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Branko Cibej <br...@xbc.nu>.
Julian Foad wrote:
> Daniel Shahaf wrote:
>   
>>> Half of the lines of this function deal with storing new reps in the
>>> rep-cache DB.  Some day it should also remove the old DB rows (and
>>> somehow cause references in other revisions into the revision-being-
>>> obliterated to still work).
>>>       
>
> When you say "cache", is it a cache in the sense that functionally it
> doesn't matter whether it contains any or all of these things? From what
> you ssay below, it sounds like it is a "rep store". Please let me
> understand that clearly before I go on.
>   

It's not a cache; that was an unfortunate choice of name. The filesystem
doesn't break if that database goes away, but afterwards representation
sharing doesn't ... well, share, at least not existing representations.
So you lose functionality in the sense that you can get duplicate
representations.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406624

Re: FSFS "rev cache" operation [was: obliterate in trunk]

Posted by Greg Stein <gs...@gmail.com>.
On Tue, Oct 13, 2009 at 15:15, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>...
> Unrelated question: Do we keep a history of obliterations?  (I see it
> may be undesired for users, but it might make the implementation
> easier...)

I'd think, "we probably HAVE to keep the history."

Consider that a client could have a working copy that was checked out
at any point in time. It may have the obliterated content in it. We
have to tell the client, "that was obliterated. remove it from your
working copy." Not in the sense of data protection, but simply to keep
all the metadata synchronized. "DIR has 5 children [not 6]."

>...

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407365

Re: FSFS "rev cache" operation [was: obliterate in trunk]

Posted by Julian Foad <ju...@btopenworld.com>.
Daniel Shahaf wrote:
> Julian Foad wrote on Tue, 13 Oct 2009 at 18:06 +0100:
> > You raise the issue of deleting stale references from the cache, and
> > ensuring stale entries do not end up being used in new committed revs.
> > We can invalidate such entries in the cache synchronously with
> > obliteration, but you point out that the reference may have already been
> > read from the cache and stored in a pending transaction before we
> > invalidate it.
> 
> Good point, this is possible, but it isn't what I said.  The scenario I 
> described below (search for 'rBO') is a race condition around the sqlite 
> write lock that (theoretically) may cause cache entries for a given 
> revision to be first inserted into the DB only after the obliterate of 
> that revision has finished.
> 
> (I'm thinking of the case where the 'obliterate' gets the sqlite write
> lock before the commit-it-obliterates gets the sqlite write lock.)

OK, I see.

> > Therefore it seems we will need to validate all such
> > references in a new revision txn at commit finalization time. It might
> > be acceptable to abort a commit if it contains any stale reference, and
> > the user could re-try. The more correct action would be to ensure we
> > have a full copy of the rep available until we convert it into a
> > reference at finalization time.
> 
> IOW, when we take the write lock to finalize a commit, at that point we
> should make sure that all the reps we refer to are still alive and well.

Yes.

> Unrelated question: Do we keep a history of obliterations?  (I see it
> may be undesired for users, but it might make the implementation
> easier...)

Not sure yet. We will not keep a "full" history, because that would
effectively mean meta-versioning the repository, and that is not what
this feature is about. We might want to keep a trace of metadata about
what was obliterated, but my intuition says we should not strictly need
any such info for functional correctness. We will want to record such a
trace as a server log file for external consumption.

For performance, we will need at least a sequence number that enables us
to detect whether an obliteration has occurred since we recorded some
previous sequence number.


> > A way to describe this is that the conceptual "reference count" of a rep
> > needs to include references in pending transactions: if any pending
> > transaction refers to a given rep, then we can't (yet) delete that rep.
> > When we abort a transaction, then we can look through its references and
> > delete any reps for which it was the only reference. That sounds like it
> > would be inefficient to implement in the obvious way but I'm sure we can
> > find a good equivalent.
> > 
> 
> IOW: if we have other references for the rep, we should obliterate the
> file without physically deleting the rep?

Yes.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407352

Re: FSFS "rev cache" operation [was: obliterate in trunk]

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Tue, 13 Oct 2009 at 18:06 +0100:
> Daniel,
> 
> OK, I think I get it now. The "rep cache" is a cache of rep locations,
> keyed by rep checksum. When we want to add a new rep, we look in the
> cache to see if we have a copy of it stored already. If a location is
> not found in the cache, we might nonetheless already have a copy of the
> rep and we could search the rev files exhaustively, but we prefer not to
> spend the time doing so, so we just store (another copy of) the new rep
> in full.
> 

Yes.

> You raise the issue of deleting stale references from the cache, and
> ensuring stale entries do not end up being used in new committed revs.
> We can invalidate such entries in the cache synchronously with
> obliteration, but you point out that the reference may have already been
> read from the cache and stored in a pending transaction before we
> invalidate it.

Good point, this is possible, but it isn't what I said.  The scenario I 
described below (search for 'rBO') is a race condition around the sqlite 
write lock that (theoretically) may cause cache entries for a given 
revision to be first inserted into the DB only after the obliterate of 
that revision has finished.

(I'm thinking of the case where the 'obliterate' gets the sqlite write
lock before the commit-it-obliterates gets the sqlite write lock.)

> Therefore it seems we will need to validate all such
> references in a new revision txn at commit finalization time. It might
> be acceptable to abort a commit if it contains any stale reference, and
> the user could re-try. The more correct action would be to ensure we
> have a full copy of the rep available until we convert it into a
> reference at finalization time.
> 

IOW, when we take the write lock to finalize a commit, at that point we
should make sure that all the reps we refer to are still alive and well.

Unrelated question: Do we keep a history of obliterations?  (I see it
may be undesired for users, but it might make the implementation
easier...)

> A way to describe this is that the conceptual "reference count" of a rep
> needs to include references in pending transactions: if any pending
> transaction refers to a given rep, then we can't (yet) delete that rep.
> When we abort a transaction, then we can look through its references and
> delete any reps for which it was the only reference. That sounds like it
> would be inefficient to implement in the obvious way but I'm sure we can
> find a good equivalent.
> 

IOW: if we have other references for the rep, we should obliterate the
file without physically deleting the rep?

> - Julian
> 

Daniel
(a bit cloudy)

> 
> Daniel Shahaf wrote:
> > Representations are stored in the rev files.  The DB only stores the 
> > coordinates (offsets into rev files) of representations, keyed by the 
> > sha1 of the file generated by the representation.
> > 
> > Commits that want to use a representation write out its coordinates in 
> > full in the revision file they create.  (In particular, they *do not* 
> > refer to the DB; this is why the latter can be removed at any point.)
> > 
> > For example, in a Greek tree (r1) with 'svn ps foo bar iota' (r2) and
> > '/bin/cp iota iota2; svn add iota2' (r3):
> > 
> >     % sha1sum < wc1/trunk/iota
> >     2c0aa9014a0cd07f01795a333d82485ef6d083e2  -
> > 
> >     % sqlite3 r/db/rep-cache.db "select * from rep_cache where hash = '2c0aa9014a0cd07f01795a333d82485ef6d083e2';"
> >     2c0aa9014a0cd07f01795a333d82485ef6d083e2|1|547|37|25
> > 
> >     ### [1]
> >     % grep -ar 2c0aa9014a0cd07f01795a333d82485ef6d083e2 r/db/revs
> >     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
> >     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
> >     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 2-2/_3
> > 
> >     ### [2]
> >     # that's the representation
> >     % xxd -s 347 -l 50 r/db/revs/0/1
> >     0000223: 4445 4c54 410a 5356 4e01 0000 1902 1a01  DELTA.SVN.......
> >     0000233: 9919 5468 6973 2069 7320 7468 6520 6669  ..This is the fi
> >     0000243: 6c65 2027 696f 7461 272e 0a45 4e44 5245  le 'iota'..ENDRE
> >     0000253: 500a                                     P.
> > 
> > Daniel
> > (The docs I got this info from are the 'Revision file format' section of
> > the FSFS 'structure' file.)
> > 
> > 
> > [1] There is a pack file because I build with PACK_AFTER_EVERY_COMMIT and
> >     with SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR=4.  A normal build would
> >     see matches in revs/0/{1,2,3}.
> >     
> > [2] 50 == 37 + strlen("DELTA\n") + strlen("ENDREP\n")
> > 
> > 
> > > > Actually, after r39897, it's possible that DB rows referencing the 
> > > > revision-being-obliterated[1] will be added at an arbitrary time after
> > > > that revision has been committed: it's (theoretically, depending on
> > > > the order SQLite hands out write locks) possible that the sequence
> > > > 
> > > >     # make a few large (multi-file) commits
> > > >     # commit rBO
> > > >     # obliterate rBO
> > > >     # wait 3 seconds
> > > > 
> > > > will result in rep-cache rows referring the obliterated version of rBO.
> > > > 
> > > > 
> > > > Not sure how to solve that.  We want to ensure that the obliterate doesn't
> > > > get the SQLite write lock before "its" commit gets the same lock. [2]
> > > > 
> > > > 
> > > > Possible damage?  If the DB rows relating to the original rBO are added
> > > > after its obliteration its complete, then reps created in the future may
> > > > rely on these rows and avoid writing themselves out explicitly --- even
> > > > though the reps may no longer be in the rBO rev file.
> 
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407219
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407282

Re: FSFS "rev cache" operation [was: obliterate in trunk]

Posted by Branko Cibej <br...@xbc.nu>.
David Glasser wrote:
> Well, the actual problem here is that the rep cache doesn't have a ref count.
>
> So let's say you want to obliterate the node /foo/bar@1234.  Its text
> may be in the rep r1234/9876.  If you actually want to remove that
> data from the backend (and not just mask it from clients), you need to
> know if any other nodes (which may not be related to this node via
> ancestry relations) use that rep.  Since the DB doesn't even have a
> refcount, you can't even know if it's safe to wipe the rep text (let
> alone where the other uses are).
>   

You could possibly solve this by adding another flag column o the
rep-cache table ... (lots of hand-waving now):

    * svn_fs_obliterate would just say, "this rep may have been
      obliterated". New commits that see this hint keep using the same rep.
    * A separate (periodic?) scanner would look at each such rep-cache
      entry, mark it as "now obliterating" and proceed to scan the
      repository to determine if any references remain.
    * Wave hands, prevent races with new commits that need this
      representation. It could be a simple as letting new commits just
      change that flag to "not obliterated" and let the scanner notice
      that before it actually tries to remove data.

> As Greg said, you can basically get around this by doing a bunch of
> slow repository walks to try to find other places that use the same
> rep.  (Of course half the reason that we want an obliterate feature is
> that people find dump/load to be too slow :) )
>   

These scans should obviously be offline, or rather, they shouldn't block
normal repository operations. This would require some more admin work to
set up a repository, but only when you use rep-sharing *and* need the
space-saving obliterate. And with proper design, the repository admin
could enable such periodic scans at any time after the repository was
created.

Better yet, when a poor admin notices that his lusers tend to obliterate
every second commit, she could turn off rep-sharing to make obliterate
more efficient. :D

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407408

Re: FSFS "rev cache" operation [was: obliterate in trunk]

Posted by David Glasser <gl...@davidglasser.net>.
Well, the actual problem here is that the rep cache doesn't have a ref count.

So let's say you want to obliterate the node /foo/bar@1234.  Its text
may be in the rep r1234/9876.  If you actually want to remove that
data from the backend (and not just mask it from clients), you need to
know if any other nodes (which may not be related to this node via
ancestry relations) use that rep.  Since the DB doesn't even have a
refcount, you can't even know if it's safe to wipe the rep text (let
alone where the other uses are).

As Greg said, you can basically get around this by doing a bunch of
slow repository walks to try to find other places that use the same
rep.  (Of course half the reason that we want an obliterate feature is
that people find dump/load to be too slow :) )

Note that Hyrum's original implementation had ref counts but since we
were combining two different attempts at ACID-style semantics (svn
FSFS and sqlite) in an unsound way, there were many ways that the ref
count could become incorrect, so we removed it rather than try to make
it perfect.

--dave

On Tue, Oct 13, 2009 at 10:06 AM, Julian Foad
<ju...@btopenworld.com> wrote:
> Daniel,
>
> OK, I think I get it now. The "rep cache" is a cache of rep locations,
> keyed by rep checksum. When we want to add a new rep, we look in the
> cache to see if we have a copy of it stored already. If a location is
> not found in the cache, we might nonetheless already have a copy of the
> rep and we could search the rev files exhaustively, but we prefer not to
> spend the time doing so, so we just store (another copy of) the new rep
> in full.
>
> You raise the issue of deleting stale references from the cache, and
> ensuring stale entries do not end up being used in new committed revs.
> We can invalidate such entries in the cache synchronously with
> obliteration, but you point out that the reference may have already been
> read from the cache and stored in a pending transaction before we
> invalidate it. Therefore it seems we will need to validate all such
> references in a new revision txn at commit finalization time. It might
> be acceptable to abort a commit if it contains any stale reference, and
> the user could re-try. The more correct action would be to ensure we
> have a full copy of the rep available until we convert it into a
> reference at finalization time.
>
> A way to describe this is that the conceptual "reference count" of a rep
> needs to include references in pending transactions: if any pending
> transaction refers to a given rep, then we can't (yet) delete that rep.
> When we abort a transaction, then we can look through its references and
> delete any reps for which it was the only reference. That sounds like it
> would be inefficient to implement in the obvious way but I'm sure we can
> find a good equivalent.
>
> - Julian
>
>
> Daniel Shahaf wrote:
>> Representations are stored in the rev files.  The DB only stores the
>> coordinates (offsets into rev files) of representations, keyed by the
>> sha1 of the file generated by the representation.
>>
>> Commits that want to use a representation write out its coordinates in
>> full in the revision file they create.  (In particular, they *do not*
>> refer to the DB; this is why the latter can be removed at any point.)
>>
>> For example, in a Greek tree (r1) with 'svn ps foo bar iota' (r2) and
>> '/bin/cp iota iota2; svn add iota2' (r3):
>>
>>     % sha1sum < wc1/trunk/iota
>>     2c0aa9014a0cd07f01795a333d82485ef6d083e2  -
>>
>>     % sqlite3 r/db/rep-cache.db "select * from rep_cache where hash = '2c0aa9014a0cd07f01795a333d82485ef6d083e2';"
>>     2c0aa9014a0cd07f01795a333d82485ef6d083e2|1|547|37|25
>>
>>     ### [1]
>>     % grep -ar 2c0aa9014a0cd07f01795a333d82485ef6d083e2 r/db/revs
>>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
>>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
>>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 2-2/_3
>>
>>     ### [2]
>>     # that's the representation
>>     % xxd -s 347 -l 50 r/db/revs/0/1
>>     0000223: 4445 4c54 410a 5356 4e01 0000 1902 1a01  DELTA.SVN.......
>>     0000233: 9919 5468 6973 2069 7320 7468 6520 6669  ..This is the fi
>>     0000243: 6c65 2027 696f 7461 272e 0a45 4e44 5245  le 'iota'..ENDRE
>>     0000253: 500a                                     P.
>>
>> Daniel
>> (The docs I got this info from are the 'Revision file format' section of
>> the FSFS 'structure' file.)
>>
>>
>> [1] There is a pack file because I build with PACK_AFTER_EVERY_COMMIT and
>>     with SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR=4.  A normal build would
>>     see matches in revs/0/{1,2,3}.
>>
>> [2] 50 == 37 + strlen("DELTA\n") + strlen("ENDREP\n")
>>
>>
>> > > Actually, after r39897, it's possible that DB rows referencing the
>> > > revision-being-obliterated[1] will be added at an arbitrary time after
>> > > that revision has been committed: it's (theoretically, depending on
>> > > the order SQLite hands out write locks) possible that the sequence
>> > >
>> > >     # make a few large (multi-file) commits
>> > >     # commit rBO
>> > >     # obliterate rBO
>> > >     # wait 3 seconds
>> > >
>> > > will result in rep-cache rows referring the obliterated version of rBO.
>> > >
>> > >
>> > > Not sure how to solve that.  We want to ensure that the obliterate doesn't
>> > > get the SQLite write lock before "its" commit gets the same lock. [2]
>> > >
>> > >
>> > > Possible damage?  If the DB rows relating to the original rBO are added
>> > > after its obliteration its complete, then reps created in the future may
>> > > rely on these rows and avoid writing themselves out explicitly --- even
>> > > though the reps may no longer be in the rBO rev file.
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407219
>



-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407239

FSFS "rev cache" operation [was: obliterate in trunk]

Posted by Julian Foad <ju...@btopenworld.com>.
Daniel,

OK, I think I get it now. The "rep cache" is a cache of rep locations,
keyed by rep checksum. When we want to add a new rep, we look in the
cache to see if we have a copy of it stored already. If a location is
not found in the cache, we might nonetheless already have a copy of the
rep and we could search the rev files exhaustively, but we prefer not to
spend the time doing so, so we just store (another copy of) the new rep
in full.

You raise the issue of deleting stale references from the cache, and
ensuring stale entries do not end up being used in new committed revs.
We can invalidate such entries in the cache synchronously with
obliteration, but you point out that the reference may have already been
read from the cache and stored in a pending transaction before we
invalidate it. Therefore it seems we will need to validate all such
references in a new revision txn at commit finalization time. It might
be acceptable to abort a commit if it contains any stale reference, and
the user could re-try. The more correct action would be to ensure we
have a full copy of the rep available until we convert it into a
reference at finalization time.

A way to describe this is that the conceptual "reference count" of a rep
needs to include references in pending transactions: if any pending
transaction refers to a given rep, then we can't (yet) delete that rep.
When we abort a transaction, then we can look through its references and
delete any reps for which it was the only reference. That sounds like it
would be inefficient to implement in the obvious way but I'm sure we can
find a good equivalent.

- Julian


Daniel Shahaf wrote:
> Representations are stored in the rev files.  The DB only stores the 
> coordinates (offsets into rev files) of representations, keyed by the 
> sha1 of the file generated by the representation.
> 
> Commits that want to use a representation write out its coordinates in 
> full in the revision file they create.  (In particular, they *do not* 
> refer to the DB; this is why the latter can be removed at any point.)
> 
> For example, in a Greek tree (r1) with 'svn ps foo bar iota' (r2) and
> '/bin/cp iota iota2; svn add iota2' (r3):
> 
>     % sha1sum < wc1/trunk/iota
>     2c0aa9014a0cd07f01795a333d82485ef6d083e2  -
> 
>     % sqlite3 r/db/rep-cache.db "select * from rep_cache where hash = '2c0aa9014a0cd07f01795a333d82485ef6d083e2';"
>     2c0aa9014a0cd07f01795a333d82485ef6d083e2|1|547|37|25
> 
>     ### [1]
>     % grep -ar 2c0aa9014a0cd07f01795a333d82485ef6d083e2 r/db/revs
>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
>     r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 2-2/_3
> 
>     ### [2]
>     # that's the representation
>     % xxd -s 347 -l 50 r/db/revs/0/1
>     0000223: 4445 4c54 410a 5356 4e01 0000 1902 1a01  DELTA.SVN.......
>     0000233: 9919 5468 6973 2069 7320 7468 6520 6669  ..This is the fi
>     0000243: 6c65 2027 696f 7461 272e 0a45 4e44 5245  le 'iota'..ENDRE
>     0000253: 500a                                     P.
> 
> Daniel
> (The docs I got this info from are the 'Revision file format' section of
> the FSFS 'structure' file.)
> 
> 
> [1] There is a pack file because I build with PACK_AFTER_EVERY_COMMIT and
>     with SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR=4.  A normal build would
>     see matches in revs/0/{1,2,3}.
>     
> [2] 50 == 37 + strlen("DELTA\n") + strlen("ENDREP\n")
> 
> 
> > > Actually, after r39897, it's possible that DB rows referencing the 
> > > revision-being-obliterated[1] will be added at an arbitrary time after
> > > that revision has been committed: it's (theoretically, depending on
> > > the order SQLite hands out write locks) possible that the sequence
> > > 
> > >     # make a few large (multi-file) commits
> > >     # commit rBO
> > >     # obliterate rBO
> > >     # wait 3 seconds
> > > 
> > > will result in rep-cache rows referring the obliterated version of rBO.
> > > 
> > > 
> > > Not sure how to solve that.  We want to ensure that the obliterate doesn't
> > > get the SQLite write lock before "its" commit gets the same lock. [2]
> > > 
> > > 
> > > Possible damage?  If the DB rows relating to the original rBO are added
> > > after its obliteration its complete, then reps created in the future may
> > > rely on these rows and avoid writing themselves out explicitly --- even
> > > though the reps may no longer be in the rBO rev file.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407219

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Mon, 12 Oct 2009 at 12:24 +0100:
> > Daniel Shahaf wrote on Sat, 10 Oct 2009 at 14:31 +0200:
> > > Julian Foad wrote on Sat, 10 Oct 2009 at 01:55 +0100:
> > > > Index: subversion/libsvn_fs_fs/fs_fs.c
> > > > ===================================================================
> > > > --- subversion/libsvn_fs_fs/fs_fs.c	(revision 39914)
> > > > +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> > > > @@ -5885,6 +6025,45 @@ svn_fs_fs__commit(svn_revnum_t *new_rev_
> > > >  }
> > > >  
> > > >  svn_error_t *
> > > > +svn_fs_fs__commit_obliteration(svn_revnum_t rev,
> > > > +                               svn_fs_t *fs,
> > > > +                               svn_fs_txn_t *txn,
> > > > +                               apr_pool_t *pool)
> > > > +{
...
> > > > +  if (ffd->rep_sharing_allowed)
> > > > +    {
...
> > > > +    }
> > > > +
> > > > +  SVN_ERR(svn_fs_fs__with_write_lock(fs, commit_obliteration_body, &cb, pool));
> > > > +
> > > > +  if (ffd->rep_sharing_allowed)
> > > > +    {
...
> > > > +    }
> > > > +
> > > 
> > > [ I think you know this already, but I'll say it anyway. ]
> 
> Thanks for saying this: I don't yet have this level of awareness. I'm
> learning the whole repository and FSFS side as I go along. I'm very
> grateful for your eyes on this.
> 

Thanks for explaining.  However, I'm not much more familiar with FSFS
than you are --- I read a bit, and fixed a few bugs, but that only
scratches the surface.

> > > Half of the lines of this function deal with storing new reps in the
> > > rep-cache DB.  Some day it should also remove the old DB rows (and
> > > somehow cause references in other revisions into the revision-being-
> > > obliterated to still work).
> 
> When you say "cache", is it a cache in the sense that functionally it
> doesn't matter whether it contains any or all of these things? From what
> you ssay below, it sounds like it is a "rep store". Please let me
> understand that clearly before I go on.

Representations are stored in the rev files.  The DB only stores the 
coordinates (offsets into rev files) of representations, keyed by the 
sha1 of the file generated by the representation.

Commits that want to use a representation write out its coordinates in 
full in the revision file they create.  (In particular, they *do not* 
refer to the DB; this is why the latter can be removed at any point.)

For example, in a Greek tree (r1) with 'svn ps foo bar iota' (r2) and
'/bin/cp iota iota2; svn add iota2' (r3):

    % sha1sum < wc1/trunk/iota
    2c0aa9014a0cd07f01795a333d82485ef6d083e2  -

    % sqlite3 r/db/rep-cache.db "select * from rep_cache where hash = '2c0aa9014a0cd07f01795a333d82485ef6d083e2';"
    2c0aa9014a0cd07f01795a333d82485ef6d083e2|1|547|37|25

    ### [1]
    % grep -ar 2c0aa9014a0cd07f01795a333d82485ef6d083e2 r/db/revs
    r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
    r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 0-0/_16
    r/db/revs/0.pack/pack:text: 1 547 37 25 2d18c5e57e84c5b8a5e9a6e13fa394dc 2c0aa9014a0cd07f01795a333d82485ef6d083e2 2-2/_3

    ### [2]
    # that's the representation
    % xxd -s 347 -l 50 r/db/revs/0/1
    0000223: 4445 4c54 410a 5356 4e01 0000 1902 1a01  DELTA.SVN.......
    0000233: 9919 5468 6973 2069 7320 7468 6520 6669  ..This is the fi
    0000243: 6c65 2027 696f 7461 272e 0a45 4e44 5245  le 'iota'..ENDRE
    0000253: 500a                                     P.

Daniel
(The docs I got this info from are the 'Revision file format' section of
the FSFS 'structure' file.)


[1] There is a pack file because I build with PACK_AFTER_EVERY_COMMIT and
    with SVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR=4.  A normal build would
    see matches in revs/0/{1,2,3}.
    
[2] 50 == 37 + strlen("DELTA\n") + strlen("ENDREP\n")


> > Actually, after r39897, it's possible that DB rows referencing the 
> > revision-being-obliterated[1] will be added at an arbitrary time after
> > that revision has been committed: it's (theoretically, depending on
> > the order SQLite hands out write locks) possible that the sequence
> > 
> >     # make a few large (multi-file) commits
> >     # commit rBO
> >     # obliterate rBO
> >     # wait 3 seconds
> > 
> > will result in rep-cache rows referring the obliterated version of rBO.
> > 
> > 
> > Not sure how to solve that.  We want to ensure that the obliterate doesn't
> > get the SQLite write lock before "its" commit gets the same lock. [2]
> > 
> > 
> > Possible damage?  If the DB rows relating to the original rBO are added
> > after its obliteration its complete, then reps created in the future may
> > rely on these rows and avoid writing themselves out explicitly --- even
> > though the reps may no longer be in the rBO rev file.
> 
> 
> Thanks for the other comments, which I believe I have addressed in what
> I committed in r39949.
> 
> - Julian
> 
> 
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406817

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Julian Foad <ju...@btopenworld.com>.
Daniel Shahaf wrote:
> [ couldn't sleep ]

Heh. That's sometimes when the clearest thoughts come.

> Daniel Shahaf wrote on Sat, 10 Oct 2009 at 14:31 +0200:
> > Julian Foad wrote on Sat, 10 Oct 2009 at 01:55 +0100:
> > > Add an initial skeleton implementation for "svn obliterate", with the UI
> > > and public API changes hidden behind #ifdef SVN_WITH_EXPERIMENTAL_OBLITERATE.
> > > 
> > > ... 
> > > 
> > > Index: subversion/libsvn_fs_fs/fs_fs.c
> > > ===================================================================
> > > --- subversion/libsvn_fs_fs/fs_fs.c	(revision 39914)
> > > +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> > > @@ -5885,6 +6025,45 @@ svn_fs_fs__commit(svn_revnum_t *new_rev_
> > >  }
> > >  
> > >  svn_error_t *
> > > +svn_fs_fs__commit_obliteration(svn_revnum_t rev,
> > > +                               svn_fs_t *fs,
> > > +                               svn_fs_txn_t *txn,
> > > +                               apr_pool_t *pool)
> > > +{
> > > +  struct commit_baton cb;
> > > +  fs_fs_data_t *ffd = fs->fsap_data;
> > > +
> > > +  /* Analogous to svn_fs_fs__commit(). */
> > > +  cb.new_rev_p = &rev;
> > > +  cb.fs = fs;
> > > +  cb.txn = txn;
> > > +
> > > +  if (ffd->rep_sharing_allowed)
> > > +    {
> > > +      cb.reps_to_cache = apr_array_make(pool, 5, sizeof(representation_t *));
> > > +      cb.reps_pool = pool;
> > > +    }
> > > +  else
> > > +    {
> > > +      cb.reps_to_cache = NULL;
> > > +      cb.reps_pool = NULL;
> > > +    }
> > > +
> > > +  SVN_ERR(svn_fs_fs__with_write_lock(fs, commit_obliteration_body, &cb, pool));
> > > +
> > > +  if (ffd->rep_sharing_allowed)
> > > +    {
> > > +      /* ### TODO: ignore errors opening the DB (issue #3506) * */
> > > +      SVN_ERR(svn_fs_fs__open_rep_cache(fs, pool));
> > > +      SVN_ERR(svn_sqlite__with_transaction(ffd->rep_cache_db,
> > > +                                           commit_sqlite_txn_callback,
> > > +                                           &cb, pool));
> > > +    }
> > > +
> > 
> > [ I think you know this already, but I'll say it anyway. ]

Thanks for saying this: I don't yet have this level of awareness. I'm
learning the whole repository and FSFS side as I go along. I'm very
grateful for your eyes on this.

> > Half of the lines of this function deal with storing new reps in the
> > rep-cache DB.  Some day it should also remove the old DB rows (and
> > somehow cause references in other revisions into the revision-being-
> > obliterated to still work).

When you say "cache", is it a cache in the sense that functionally it
doesn't matter whether it contains any or all of these things? From what
you ssay below, it sounds like it is a "rep store". Please let me
understand that clearly before I go on.

> Actually, after r39897, it's possible that DB rows referencing the 
> revision-being-obliterated[1] will be added at an arbitrary time after
> that revision has been committed: it's (theoretically, depending on
> the order SQLite hands out write locks) possible that the sequence
> 
>     # make a few large (multi-file) commits
>     # commit rBO
>     # obliterate rBO
>     # wait 3 seconds
> 
> will result in rep-cache rows referring the obliterated version of rBO.
> 
> 
> Not sure how to solve that.  We want to ensure that the obliterate doesn't
> get the SQLite write lock before "its" commit gets the same lock. [2]
> 
> 
> Possible damage?  If the DB rows relating to the original rBO are added
> after its obliteration its complete, then reps created in the future may
> rely on these rows and avoid writing themselves out explicitly --- even
> though the reps may no longer be in the rBO rev file.


Thanks for the other comments, which I believe I have addressed in what
I committed in r39949.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406584

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
[ couldn't sleep ]

Daniel Shahaf wrote on Sat, 10 Oct 2009 at 14:31 +0200:
> Julian Foad wrote on Sat, 10 Oct 2009 at 01:55 +0100:
> > Add an initial skeleton implementation for "svn obliterate", with the UI
> > and public API changes hidden behind #ifdef SVN_WITH_EXPERIMENTAL_OBLITERATE.
> > 
> > ... 
> > 
> > Index: subversion/libsvn_fs_fs/fs_fs.c
> > ===================================================================
> > --- subversion/libsvn_fs_fs/fs_fs.c	(revision 39914)
> > +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> > @@ -5885,6 +6025,45 @@ svn_fs_fs__commit(svn_revnum_t *new_rev_
> >  }
> >  
> >  svn_error_t *
> > +svn_fs_fs__commit_obliteration(svn_revnum_t rev,
> > +                               svn_fs_t *fs,
> > +                               svn_fs_txn_t *txn,
> > +                               apr_pool_t *pool)
> > +{
> > +  struct commit_baton cb;
> > +  fs_fs_data_t *ffd = fs->fsap_data;
> > +
> > +  /* Analogous to svn_fs_fs__commit(). */
> > +  cb.new_rev_p = &rev;
> > +  cb.fs = fs;
> > +  cb.txn = txn;
> > +
> > +  if (ffd->rep_sharing_allowed)
> > +    {
> > +      cb.reps_to_cache = apr_array_make(pool, 5, sizeof(representation_t *));
> > +      cb.reps_pool = pool;
> > +    }
> > +  else
> > +    {
> > +      cb.reps_to_cache = NULL;
> > +      cb.reps_pool = NULL;
> > +    }
> > +
> > +  SVN_ERR(svn_fs_fs__with_write_lock(fs, commit_obliteration_body, &cb, pool));
> > +
> > +  if (ffd->rep_sharing_allowed)
> > +    {
> > +      /* ### TODO: ignore errors opening the DB (issue #3506) * */
> > +      SVN_ERR(svn_fs_fs__open_rep_cache(fs, pool));
> > +      SVN_ERR(svn_sqlite__with_transaction(ffd->rep_cache_db,
> > +                                           commit_sqlite_txn_callback,
> > +                                           &cb, pool));
> > +    }
> > +
> 
> [ I think you know this already, but I'll say it anyway. ]
> 
> Half of the lines of this function deal with storing new reps in the
> rep-cache DB.  Some day it should also remove the old DB rows (and
> somehow cause references in other revisions into the revision-being-
> obliterated to still work).
> 

Actually, after r39897, it's possible that DB rows referencing the 
revision-being-obliterated[1] will be added at an arbitrary time after
that revision has been committed: it's (theoretically, depending on
the order SQLite hands out write locks) possible that the sequence

    # make a few large (multi-file) commits
    # commit rBO
    # obliterate rBO
    # wait 3 seconds

will result in rep-cache rows referring the obliterated version of rBO.


Not sure how to solve that.  We want to ensure that the obliterate doesn't
get the SQLite write lock before "its" commit gets the same lock. [2]


Possible damage?  If the DB rows relating to the original rBO are added
after its obliteration its complete, then reps created in the future may
rely on these rows and avoid writing themselves out explicitly --- even
though the reps may no longer be in the rBO rev file.


Daniel



[1] this needs an acronym, it's too long to type every time.
[2] theoretically I'm used to assuming different processes have arbitrary
    relative speeds, which makes the situation even worse (it forbids
    assuming that the commit process has even requested the SQLite write
    lock).
[3] http://www.sqlite.org/lockingv3.html

> > +  return SVN_NO_ERROR; +}

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406286

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Sat, 10 Oct 2009 at 01:55 +0100:
> Julian Foad wrote:
> > Here is the patch I'm planning to commit initially. I'll fill in the log
> > message first.
> 
> Thanks for the comments on IRC. In the revised patch attached, printf's
> are gone, fs_fs code updated with today's changes, file licence header
> added. Inserting the "obliterate" entry somewhere in the middle of the
> repos/fs vtables should be fine because those are private vtables so no
> API versioning needed and all code using them is built at once.

*nod*.

> By the way I think it's rotten to post a patch without a proper log
> message but I'm doing it again. My log message will basically say "New
> function" or "New entry in vtable" or "New test" for every change. There
> is not a single modification or deletion.
> 

I don't see a problem.  Seeing a thousand "New function" lines in the
log message would teach me nothing --- I would skip reading it.

> Add an initial skeleton implementation for "svn obliterate", with the UI
> and public API changes hidden behind #ifdef SVN_WITH_EXPERIMENTAL_OBLITERATE.
> 
> ... 
> 
> Index: subversion/libsvn_fs_fs/fs_fs.c
> ===================================================================
> --- subversion/libsvn_fs_fs/fs_fs.c	(revision 39914)
> +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> @@ -5885,6 +6025,45 @@ svn_fs_fs__commit(svn_revnum_t *new_rev_
>  }
>  
>  svn_error_t *
> +svn_fs_fs__commit_obliteration(svn_revnum_t rev,
> +                               svn_fs_t *fs,
> +                               svn_fs_txn_t *txn,
> +                               apr_pool_t *pool)
> +{
> +  struct commit_baton cb;
> +  fs_fs_data_t *ffd = fs->fsap_data;
> +
> +  /* Analogous to svn_fs_fs__commit(). */
> +  cb.new_rev_p = &rev;
> +  cb.fs = fs;
> +  cb.txn = txn;
> +
> +  if (ffd->rep_sharing_allowed)
> +    {
> +      cb.reps_to_cache = apr_array_make(pool, 5, sizeof(representation_t *));
> +      cb.reps_pool = pool;
> +    }
> +  else
> +    {
> +      cb.reps_to_cache = NULL;
> +      cb.reps_pool = NULL;
> +    }
> +
> +  SVN_ERR(svn_fs_fs__with_write_lock(fs, commit_obliteration_body, &cb, pool));
> +
> +  if (ffd->rep_sharing_allowed)
> +    {
> +      /* ### TODO: ignore errors opening the DB (issue #3506) * */
> +      SVN_ERR(svn_fs_fs__open_rep_cache(fs, pool));
> +      SVN_ERR(svn_sqlite__with_transaction(ffd->rep_cache_db,
> +                                           commit_sqlite_txn_callback,
> +                                           &cb, pool));
> +    }
> +

[ I think you know this already, but I'll say it anyway. ]

Half of the lines of this function deal with storing new reps in the
rep-cache DB.  Some day it should also remove the old DB rows (and
somehow cause references in other revisions into the revision-being-
obliterated to still work).

> +  return SVN_NO_ERROR;
> +}

> Index: subversion/libsvn_fs/fs-loader.c
> ===================================================================
> --- subversion/libsvn_fs/fs-loader.c	(revision 39914)
> +++ subversion/libsvn_fs/fs-loader.c	(working copy)
> @@ -678,6 +688,31 @@ svn_fs_commit_txn(const char **conflict_
>  }
>  
>  svn_error_t *
> +svn_fs__commit_obliteration_txn(svn_revnum_t rev, svn_fs_txn_t *txn,
> +                                apr_pool_t *pool)
> +{
> +#ifdef PACK_AFTER_EVERY_COMMIT
> +  svn_fs_root_t *txn_root;
> +  svn_fs_t *fs;
> +  const char *fs_path;
> +
> +  SVN_ERR(svn_fs_txn_root(&txn_root, txn, pool));
> +  fs = svn_fs_root_fs(txn_root);
> +  fs_path = svn_fs_path(fs, pool);
> +#endif
> +
> +  SVN_ERR(txn->vtable->commit_obliteration(rev, txn, pool));
> +
> +#ifdef PACK_AFTER_EVERY_COMMIT
> +  /* ### Can't do the normal packing: this isn't a normal commit.
> +   * SVN_ERR(svn_fs_pack(fs_path, NULL, NULL, NULL, NULL, pool)); */
> +  return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL, NULL);
> +#endif
> +

There is no reason to run svn_fs_pack() here because obliteration
doesn't create a new youngest revision.  You can probably kill
the #ifdef..#endif parts of this function.

(The reason PACK_AFTER_EVERY_COMMIT exists is to make it discover
concurrency issues in the packing code; see, for example,
update_min_unpacked_rev() in <fs_fs.c>.  It is not part of the API.)

> +  return SVN_NO_ERROR;
> +}

> Index: subversion/include/private/svn_fs_private.h
> ===================================================================
> --- subversion/include/private/svn_fs_private.h	(revision 39914)
> +++ subversion/include/private/svn_fs_private.h	(working copy)
> @@ -58,6 +58,37 @@ extern "C" {
>  apr_hash_t *
>  svn_fs__access_get_lock_tokens(svn_fs_access_t *access_ctx);
>  
> +
> +/**
> + * Same as svn_fs_begin_txn2(), except it begins an obliteration-txn
> + * that can be used to replace revision @a rev. @a rev must be a valid
> + * revision number at the time of this call. This transaction cannot be
> + * committed with a normal commit but only with an "obliterate commit" - see
> + * svn_fs_commit_obliterate_txn().

s/svn_fs_commit_obliterate_txn/svn_fs__commit_obliterate_txn/

> Index: subversion/libsvn_client/obliterate.c
> ===================================================================
> --- subversion/libsvn_client/obliterate.c	(revision 0)
> +++ subversion/libsvn_client/obliterate.c	(revision 0)
> +  if (ctx->notify_func2)
> +    {
> +      svn_wc_notify_t *notify
> +        = svn_wc_create_notify_url(url, svn_wc_notify_delete, pool);

svn_wc_notify_obliterate?

> Index: subversion/tests/libsvn_fs/fs-test.c
> ===================================================================
> --- subversion/tests/libsvn_fs/fs-test.c	(revision 39914)
> +++ subversion/tests/libsvn_fs/fs-test.c	(working copy)
> @@ -31,6 +31,7 @@
>  #include "svn_time.h"
>  #include "svn_string.h"
>  #include "svn_fs.h"
> +#include "private/svn_fs_private.h"

Usually all private includes come after all public includes, not mixed with
them.

>  #include "svn_checksum.h"
>  #include "svn_mergeinfo.h"
>  #include "svn_props.h"

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2405848

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:
> Here is the patch I'm planning to commit initially. I'll fill in the log
> message first.

Thanks for the comments on IRC. In the revised patch attached, printf's
are gone, fs_fs code updated with today's changes, file licence header
added. Inserting the "obliterate" entry somewhere in the middle of the
repos/fs vtables should be fine because those are private vtables so no
API versioning needed and all code using them is built at once.

By the way I think it's rotten to post a patch without a proper log
message but I'm doing it again. My log message will basically say "New
function" or "New entry in vtable" or "New test" for every change. There
is not a single modification or deletion.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2405736

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:
> I'm preparing a patch.

Here is the patch I'm planning to commit initially. I'll fill in the log
message first.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2405604

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Julian Foad <ju...@btopenworld.com>.
On Wed, 2009-10-07, Greg Stein wrote:
> fwiw, I'd be supportive of developing obliterate functionality within
> trunk, as long as it is not visible at the cmdline (of any tools) for
> non-developers. e.g. wrap with #ifdef SVN_DEBUG. (or maybe an
> SVN__HAS_OBLITERATE for easier location, later)
> 
> For the APIs, I think they could go into things like include/private/
> until they're ready to be public.

Yes, I think that would be a great way to get the development done where
most eyes are on it, and yet avoid any difficulty if we need to do a
release before this is ready.

I'm preparing a patch.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2405552