You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by David Glasser <gl...@davidglasser.net> on 2009/10/12 16:21:50 UTC

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

For what it's worth, the current schema for the rep-sharing DB is
completely incompatible with "remove data from server" obliterate
(though it's certainly compatible with "mask data from clients"
obliterate).

--dave

On Mon, Oct 12, 2009 at 4:24 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Daniel Shahaf wrote:
>> [ couldn't sleep ]
>
> Heh. That's sometimes when the clearest thoughts come.
>
>> Daniel Shahaf wrote on Sat, 10 Oct 2009 at 14:31 +0200:
>> > Julian Foad wrote on Sat, 10 Oct 2009 at 01:55 +0100:
>> > > Add an initial skeleton implementation for "svn obliterate", with the UI
>> > > and public API changes hidden behind #ifdef SVN_WITH_EXPERIMENTAL_OBLITERATE.
>> > >
>> > > ...
>> > >
>> > > Index: subversion/libsvn_fs_fs/fs_fs.c
>> > > ===================================================================
>> > > --- subversion/libsvn_fs_fs/fs_fs.c       (revision 39914)
>> > > +++ subversion/libsvn_fs_fs/fs_fs.c       (working copy)
>> > > @@ -5885,6 +6025,45 @@ svn_fs_fs__commit(svn_revnum_t *new_rev_
>> > >  }
>> > >
>> > >  svn_error_t *
>> > > +svn_fs_fs__commit_obliteration(svn_revnum_t rev,
>> > > +                               svn_fs_t *fs,
>> > > +                               svn_fs_txn_t *txn,
>> > > +                               apr_pool_t *pool)
>> > > +{
>> > > +  struct commit_baton cb;
>> > > +  fs_fs_data_t *ffd = fs->fsap_data;
>> > > +
>> > > +  /* Analogous to svn_fs_fs__commit(). */
>> > > +  cb.new_rev_p = &rev;
>> > > +  cb.fs = fs;
>> > > +  cb.txn = txn;
>> > > +
>> > > +  if (ffd->rep_sharing_allowed)
>> > > +    {
>> > > +      cb.reps_to_cache = apr_array_make(pool, 5, sizeof(representation_t *));
>> > > +      cb.reps_pool = pool;
>> > > +    }
>> > > +  else
>> > > +    {
>> > > +      cb.reps_to_cache = NULL;
>> > > +      cb.reps_pool = NULL;
>> > > +    }
>> > > +
>> > > +  SVN_ERR(svn_fs_fs__with_write_lock(fs, commit_obliteration_body, &cb, pool));
>> > > +
>> > > +  if (ffd->rep_sharing_allowed)
>> > > +    {
>> > > +      /* ### TODO: ignore errors opening the DB (issue #3506) * */
>> > > +      SVN_ERR(svn_fs_fs__open_rep_cache(fs, pool));
>> > > +      SVN_ERR(svn_sqlite__with_transaction(ffd->rep_cache_db,
>> > > +                                           commit_sqlite_txn_callback,
>> > > +                                           &cb, pool));
>> > > +    }
>> > > +
>> >
>> > [ I think you know this already, but I'll say it anyway. ]
>
> Thanks for saying this: I don't yet have this level of awareness. I'm
> learning the whole repository and FSFS side as I go along. I'm very
> grateful for your eyes on this.
>
>> > Half of the lines of this function deal with storing new reps in the
>> > rep-cache DB.  Some day it should also remove the old DB rows (and
>> > somehow cause references in other revisions into the revision-being-
>> > obliterated to still work).
>
> When you say "cache", is it a cache in the sense that functionally it
> doesn't matter whether it contains any or all of these things? From what
> you ssay below, it sounds like it is a "rep store". Please let me
> understand that clearly before I go on.
>
>> Actually, after r39897, it's possible that DB rows referencing the
>> revision-being-obliterated[1] will be added at an arbitrary time after
>> that revision has been committed: it's (theoretically, depending on
>> the order SQLite hands out write locks) possible that the sequence
>>
>>     # make a few large (multi-file) commits
>>     # commit rBO
>>     # obliterate rBO
>>     # wait 3 seconds
>>
>> will result in rep-cache rows referring the obliterated version of rBO.
>>
>>
>> Not sure how to solve that.  We want to ensure that the obliterate doesn't
>> get the SQLite write lock before "its" commit gets the same lock. [2]
>>
>>
>> Possible damage?  If the DB rows relating to the original rBO are added
>> after its obliteration its complete, then reps created in the future may
>> rely on these rows and avoid writing themselves out explicitly --- even
>> though the reps may no longer be in the rBO rev file.
>
>
> Thanks for the other comments, which I believe I have addressed in what
> I committed in r39949.
>
> - Julian
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406584
>



-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406720

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Julian Foad <ju...@btopenworld.com>.

David Glasser wrote:
> For simplicity perhaps just have a separate obliterate write lock and
> not allow concurrent obliterates.

Yes: if there is any extra difficulty in managing concurrent obliterates
then I would just disallow them. (However, it may turn out there is no
extra difficulty in managing concurrent obliterates compared to managing
obliterate with ordinary commits going on concurrently.)

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407190

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by David Glasser <gl...@davidglasser.net>.

On Mon, Oct 12, 2009 at 5:59 PM, Greg Stein <gs...@gmail.com> wrote:
> On Mon, Oct 12, 2009 at 19:28, Branko Čibej <br...@xbc.nu> wrote:
>> David Glasser wrote:
>>> For what it's worth, the current schema for the rep-sharing DB is
>>> completely incompatible with "remove data from server" obliterate
>>
>> You mean in the sense that you can't reasonably in real-time remove
>> entries from the datbase, right? I can imagine having separate, offline
>> process that would scan the whole repository for a particular
>> representation hash and do the actual on-disk data removal and rep-cache
>> cleanup. But it would require a write lock on the repository.
>
> Isn't there some notion of the "total size" of the repository?
> Couldn't you record that value, scan that amount of the repository,
> then repeat if the repository grew during the scan. When it appears
> stable, take a write lock and examine the size again; if changed,
> release the lock and go back to scanning the incremental data. If the
> size is the same, then perform the obliterate surgery.
>
> That would place all the scanning outside of the write lock.
>
> I'm not sure what parts of the repository are mutable and would need
> further consideration, but it's probably doable.
>
> Also note that *another* obliterate could be wiping out stuff while
> the scanning is taking place. But I believe there will be some
> obliterate "counter" or somesuch, and if that changes during the
> scanning process, then I believe you just restart the process from
> scratch. And keep restarting until that counter is unchanged after the
> write lock is acquired.

Yes, something like that should work.  "youngest rev" would be a
reasonable "total size" here.

For simplicity perhaps just have a separate obliterate write lock and
not allow concurrent obliterates.

--dave

-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407183

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Greg Stein <gs...@gmail.com>.

On Mon, Oct 12, 2009 at 19:28, Branko Čibej <br...@xbc.nu> wrote:
> David Glasser wrote:
>> For what it's worth, the current schema for the rep-sharing DB is
>> completely incompatible with "remove data from server" obliterate
>
> You mean in the sense that you can't reasonably in real-time remove
> entries from the datbase, right? I can imagine having separate, offline
> process that would scan the whole repository for a particular
> representation hash and do the actual on-disk data removal and rep-cache
> cleanup. But it would require a write lock on the repository.

Isn't there some notion of the "total size" of the repository?
Couldn't you record that value, scan that amount of the repository,
then repeat if the repository grew during the scan. When it appears
stable, take a write lock and examine the size again; if changed,
release the lock and go back to scanning the incremental data. If the
size is the same, then perform the obliterate surgery.

That would place all the scanning outside of the write lock.

I'm not sure what parts of the repository are mutable and would need
further consideration, but it's probably doable.

Also note that *another* obliterate could be wiping out stuff while
the scanning is taking place. But I believe there will be some
obliterate "counter" or somesuch, and if that changes during the
scanning process, then I believe you just restart the process from
scratch. And keep restarting until that counter is unchanged after the
write lock is acquired.

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406867

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by David Glasser <gl...@davidglasser.net>.

On Mon, Oct 12, 2009 at 4:28 PM, Branko Čibej <br...@xbc.nu> wrote:
> David Glasser wrote:
>> For what it's worth, the current schema for the rep-sharing DB is
>> completely incompatible with "remove data from server" obliterate
>>
>
> You mean in the sense that you can't reasonably in real-time remove
> entries from the datbase, right? I can imagine having separate, offline
> process that would scan the whole repository for a particular
> representation hash and do the actual on-disk data removal and rep-cache
> cleanup. But it would require a write lock on the repository.

OK, sure, if you're OK with "disable commits while we walk the entire
repository, every single node" then it would work (though note that a
few days ago we decided that the rep-sharing DB should be written
after the write lock is dropped, which could hypothetically be minorly
problematic here).  Of course, that's going to be kind of slow.  But I
guess we do even have some repo-walking logic in there already (in the
svnadmin verify code, IIRC).

--dave

-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406846

Re: obliterate in trunk (was: svn commit: r39745 - trunk/subversion/svn)

Posted by Branko Cibej <br...@xbc.nu>.

David Glasser wrote:
> For what it's worth, the current schema for the rep-sharing DB is
> completely incompatible with "remove data from server" obliterate
>   

You mean in the sense that you can't reasonably in real-time remove
entries from the datbase, right? I can imagine having separate, offline
process that would scan the whole repository for a particular
representation hash and do the actual on-disk data removal and rep-cache
cleanup. But it would require a write lock on the repository.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2406842