Posted to dev@subversion.apache.org by Julian Foad <ju...@wandisco.com> on 2009/10/01 14:48:29 UTC

Two approaches to data-hiding (for obliterate)

I'm thinking about the data-hiding part of "obliterate".

Example: obliterate (hide) /path/to/foo in revision 50.


* Approach 1:

  Edit the FS in the database: replace the old revision 50 with a new
transaction that's mostly the same except doesn't have an entry for
"foo" in its "/path/to" directory node. "Commit" the new transaction in
place of revision 50, adjusting all links that pointed to the old
revision 50 to point to the new one instead.

  This is an FS-layer change, plus support needed in the DB layers to be
able to replace an existing revision and edit other metadata.
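
A minimal sketch of Approach 1, under loud assumptions: the repository
is modelled as a plain dict from revision number to a directory tree,
directories are dicts and files are strings. The names `without_entry`
and `obliterate` are hypothetical and are not Subversion APIs; the
sketch also ignores the "adjusting all links" part, which the real FS
back-ends would have to handle.

```python
def without_entry(root, path):
    """Return a new tree equal to `root` but lacking `path`.

    Only the directories on the way to `path` are copied; every
    untouched subtree is shared with the original tree.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("cannot remove the root directory")

    def rebuild(node, remaining):
        name = remaining[0]
        if not isinstance(node, dict) or name not in node:
            raise KeyError(path)
        new_node = dict(node)  # shallow copy of this directory only
        if len(remaining) == 1:
            del new_node[name]  # drop the entry from its parent
        else:
            new_node[name] = rebuild(node[name], remaining[1:])
        return new_node

    return rebuild(root, parts)


def obliterate(revisions, rev, path):
    """Replace revision `rev` in place with a tree that lacks `path`."""
    revisions[rev] = without_entry(revisions[rev], path)
```

With `revisions[50]` holding a tree that contains /path/to/foo, calling
`obliterate(revisions, 50, "/path/to/foo")` leaves revision 50 pointing
at a tree whose "/path/to" directory no longer lists "foo".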


* Approach 2:

  Extend Subversion's path-based authz semantics to enable us to block
access to /path/to/foo@50.


These both sound feasible to me. Approach 1 gets us much closer to being
able to recover the disk space. This is the approach I'm looking at so
far. Approach 2 sounds like it might be quicker (easier) to reach the
data-hiding stage. Indeed, it is already a usable solution for hiding an
unwanted file if the file's path will never need to be visible in any
revision. But Greg cautions me that it will be hard.

Can anyone comment?

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402554

Re: Two approaches to data-hiding (for obliterate)

Posted by Branko Cibej <br...@xbc.nu>.
Julian Foad wrote:
> I think what you're getting at is that data-hiding at the repos
> layer, with data-destruction going on underneath it at the FS layer, is
> not a clean design because the two layers would have to synchronize
> their knowledge of what's hidden. If they didn't, the repos layer might
> expose paths that have been destroyed at the FS layer. Or that sort of
> thing: a spurious coupling between the layers.
>   

Yes, that's a major problem to be avoided; it always makes maintenance
harder than it should be.

Of course it may turn out to be easier to do the repos-level thing
first, then just ditch it as you dig deeper into the space-saving.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402677

Re: Two approaches to data-hiding (for obliterate)

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Oct 1, 2009 at 17:00, Mark Phippard <ma...@gmail.com> wrote:
> On Thu, Oct 1, 2009 at 4:54 PM, Greg Stein <gs...@gmail.com> wrote:
>> On Thu, Oct 1, 2009 at 16:07, Julian Foad <ju...@wandisco.com> wrote:
>>>...
>>>>     * Authz is (currently, brokenly :) a RA-layer-specific optional
>>>>       feature; obliterate cannot be.
>>>
>>> That's not a conceptual difference. Authz being RA-layer-specific is just a
>>> de-facto outcome of how authz was developed. A comparable situation with
>>> obliterate is that it could be FS-type-specific - implemented for FSFS
>>> but not BDB or vice versa, and at least it will necessarily have
>>> different supporting implementations for the two FS types.
>>>
>>> Of course users will want obliterate to be available on both FSFS and
>>> BDB, and as developers we will want the design and implementation to be
>>> as FS-agnostic as possible, but it will certainly be dependent on if and
>>> when we implement equivalent things in two back-ends.
>>
>> I would not be opposed to telling people "you must use FSFS for the
>> obliterate feature". I'm sure others may feel differently, but there's
>> my vote/opinion.
>
> I seem to recall Julian mentioned it would be easier to implement this
> with BDB.  Would you agree that we would probably not be OK with only
> supporting it fully in BDB?  IOW, supporting it in FSFS is probably a
> must, and BDB is only nice.

Yah, I think that's true. Consider my opinion duly amended :-D

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402721

Re: Two approaches to data-hiding (for obliterate)

Posted by "Ph. Marek" <ph...@emerion.com>.
Hello everybody!

>>> I would not be opposed to telling people "you must use FSFS for the
>>> obliterate feature". I'm sure others may feel differently, but there's
>>> my vote/opinion.
>> I seem to recall Julian mentioned it would be easier to implement this
>> with BDB.
> Yes, it would be *much* easier with BDB, since it wasn't conceived and
> designed as a write-once store.
From my practical tests BDB is *much* more reliable for commits with
an awful lot of files, i.e. in the multiple-thousand case.

By that I mean that the FSFS behaviour of putting 2 files for every
to-be-committed file in the transaction directory (without any
sharding) makes it highly sensitive to the file system used; and while
there are lots of hints available ("use dirindex", etc.), it's
nonetheless one point that makes FSFS a bit less useful (for big
commits!) than BDB.

So, because of this small point, I'd be happy with BDB obliteration  
support ;-)


Regards,

Phil

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402811

Re: Two approaches to data-hiding (for obliterate)

Posted by Branko Cibej <br...@xbc.nu>.
Mark Phippard wrote:
> On Thu, Oct 1, 2009 at 4:54 PM, Greg Stein <gs...@gmail.com> wrote:
>   
>> I would not be opposed to telling people "you must use FSFS for the
>> obliterate feature". I'm sure others may feel differently, but there's
>> my vote/opinion.
>>     
>
> I seem to recall Julian mentioned it would be easier to implement this
> with BDB.

Yes, it would be *much* easier with BDB, since it wasn't conceived and
designed as a write-once store.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402737

Re: Two approaches to data-hiding (for obliterate)

Posted by Mark Phippard <ma...@gmail.com>.
On Thu, Oct 1, 2009 at 4:54 PM, Greg Stein <gs...@gmail.com> wrote:
> On Thu, Oct 1, 2009 at 16:07, Julian Foad <ju...@wandisco.com> wrote:
>>...
>>>     * Authz is (currently, brokenly :) a RA-layer-specific optional
>>>       feature; obliterate cannot be.
>>
>> That's not a conceptual difference. Authz being RA-layer-specific is just a
>> de-facto outcome of how authz was developed. A comparable situation with
>> obliterate is that it could be FS-type-specific - implemented for FSFS
>> but not BDB or vice versa, and at least it will necessarily have
>> different supporting implementations for the two FS types.
>>
>> Of course users will want obliterate to be available on both FSFS and
>> BDB, and as developers we will want the design and implementation to be
>> as FS-agnostic as possible, but it will certainly be dependent on if and
>> when we implement equivalent things in two back-ends.
>
> I would not be opposed to telling people "you must use FSFS for the
> obliterate feature". I'm sure others may feel differently, but there's
> my vote/opinion.

I seem to recall Julian mentioned it would be easier to implement this
with BDB.  Would you agree that we would probably not be OK with only
supporting it fully in BDB?  IOW, supporting it in FSFS is probably a
must, and BDB is only nice.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402689

Re: Two approaches to data-hiding (for obliterate)

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Oct 1, 2009 at 16:07, Julian Foad <ju...@wandisco.com> wrote:
>...
>>     * Authz is (currently, brokenly :) a RA-layer-specific optional
>>       feature; obliterate cannot be.
>
> That's not a conceptual difference. Authz being RA-layer-specific is just a
> de-facto outcome of how authz was developed. A comparable situation with
> obliterate is that it could be FS-type-specific - implemented for FSFS
> but not BDB or vice versa, and at least it will necessarily have
> different supporting implementations for the two FS types.
>
> Of course users will want obliterate to be available on both FSFS and
> BDB, and as developers we will want the design and implementation to be
> as FS-agnostic as possible, but it will certainly be dependent on if and
> when we implement equivalent things in two back-ends.

I would not be opposed to telling people "you must use FSFS for the
obliterate feature". I'm sure others may feel differently, but there's
my vote/opinion.

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402688

Re: Two approaches to data-hiding (for obliterate)

Posted by John Szakmeister <jo...@szakmeister.net>.
On Sat, Oct 3, 2009 at 6:19 AM, Branko Čibej <br...@xbc.nu> wrote:
[snip]
> That has always been the intention. Julian just decided to take two
> steps to get there instead of one; which IMHO is a good idea.

My apologies.  I agree.  It's a big task. :-)

-John

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2405371

Re: Two approaches to data-hiding (for obliterate)

Posted by Branko Cibej <br...@xbc.nu>.
John Szakmeister wrote:
> On Thu, Oct 1, 2009 at 4:07 PM, Julian Foad <ju...@wandisco.com> wrote:
>   
>> Branko Cibej wrote:
>>     
>>> Julian Foad wrote:
>>>       
>>>> On Thu, 2009-10-01 at 19:16 +0200, Branko Cibej wrote:
>>>>         
>>>>> Seriously: don't try to overload path-based authz [...]
>>>>>           
>>>> Thanks... but please read the rest of the thread first :-) I'm not.
>>>>         
>>> Well yes; I did read it. Part of the difference is the "repos-level" vs.
>>> "filesystem-level", and with obliterate, you're deep in the latter.
>>> Sooner or later.
>>>       
>> I really appreciate your feedback. And yes you're right that I'll be
>> deep in the FS in the end, as my impression is that the space-saving
>> aspect of obliterate is the more important aspect for the majority of
>> users (users meaning administrators).
>>     
>
> The other big reason I've heard for having obliterate is the "Whups!
> I committed something that wasn't meant to be shared with these folks,
> and I want to remove all traces of it" factor.  Perhaps because of an
> IP violation, a password, etc.  It's good to be able to go back and
> say: "It's gone.  No, I really mean, it's *gone*."  That screams
> needing this at the FS level too.
>   

That has always been the intention. Julian just decided to take two
steps to get there instead of one; which IMHO is a good idea.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2403183

Re: Two approaches to data-hiding (for obliterate)

Posted by John Szakmeister <jo...@szakmeister.net>.
On Thu, Oct 1, 2009 at 4:07 PM, Julian Foad <ju...@wandisco.com> wrote:
> Branko Cibej wrote:
>> Julian Foad wrote:
>> > On Thu, 2009-10-01 at 19:16 +0200, Branko Cibej wrote:
>> >> Seriously: don't try to overload path-based authz [...]
>> >
>> > Thanks... but please read the rest of the thread first :-) I'm not.
>
>> Well yes; I did read it. Part of the difference is the "repos-level" vs.
>> "filesystem-level", and with obliterate, you're deep in the latter.
>> Sooner or later.
>
> I really appreciate your feedback. And yes you're right that I'll be
> deep in the FS in the end, as my impression is that the space-saving
> aspect of obliterate is the more important aspect for the majority of
> users (users meaning administrators).

The other big reason I've heard for having obliterate is the "Whups!
I committed something that wasn't meant to be shared with these folks,
and I want to remove all traces of it" factor.  Perhaps because of an
IP violation, a password, etc.  It's good to be able to go back and
say: "It's gone.  No, I really mean, it's *gone*."  That screams
needing this at the FS level too.

-John

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2403180

Re: Two approaches to data-hiding (for obliterate)

Posted by Julian Foad <ju...@wandisco.com>.
Branko Cibej wrote:
> Julian Foad wrote:
> > On Thu, 2009-10-01 at 19:16 +0200, Branko Cibej wrote:
> >> Seriously: don't try to overload path-based authz [...]
> >
> > Thanks... but please read the rest of the thread first :-) I'm not.

> Well yes; I did read it. Part of the difference is the "repos-level" vs.
> "filesystem-level", and with obliterate, you're deep in the latter.
> Sooner or later.

I really appreciate your feedback. And yes you're right that I'll be
deep in the FS in the end, as my impression is that the space-saving
aspect of obliterate is the more important aspect for the majority of
users (users meaning administrators).

> > But even after reading what I'm thinking about, you may think it's not
> > something I should even think about. Why am I thinking like this?
> > Because I see a similarity between authz and the data-hiding requirement
> > which is a necessary semantic part of "obliterate". I think it is worth
> > examining this similarity to learn something from it, even though the
> > obvious (and almost certainly best) implementation is to actually remove
> > stuff from the FS rather than introduce a special "hiding" layer.
> 
> There are similarities ... I can see two:
> 
>     * The concept of hiding something from clients.
>     * The pain that happens to clients' existing working copies when you
>       hide or unhide something.
> 
> There are also differences:
> 
>     * Authz is (currently) time-irrelevant; your proposed obliterate
>       works on specific revisions.
>     * Obliterate doesn't care who's looking at the tree; authz does.

Yup, two clear differences.

>     * Authz is (currently, brokenly :) a RA-layer-specific optional
>       feature; obliterate cannot be.

That's not a conceptual difference. Authz being RA-layer-specific is just a
de-facto outcome of how authz was developed. A comparable situation with
obliterate is that it could be FS-type-specific - implemented for FSFS
but not BDB or vice versa, and at least it will necessarily have
different supporting implementations for the two FS types.

Of course users will want obliterate to be available on both FSFS and
BDB, and as developers we will want the design and implementation to be
as FS-agnostic as possible, but it will certainly be dependent on if and
when we implement equivalent things in two back-ends.


> Last but not least: Data hiding is only one aspect of obliterate. Wait
> one moment while I move these sheep's entrails to grab my crystal ball
> ... ah. It says here that once you start on the space-saving and/or
> data-erasing aspect, you'll want the FS-level data hiding anyway as
> twiddling txns is unavoidable.

Maybe. I think what you're getting at is that data-hiding at the repos
layer, with data-destruction going on underneath it at the FS layer, is
not a clean design because the two layers would have to synchronize
their knowledge of what's hidden. If they didn't, the repos layer might
expose paths that have been destroyed at the FS layer. Or that sort of
thing: a spurious coupling between the layers.

> Also, as far as I can remember, our authz isn't really very good at
> hiding paths. ISTR issues about that, e.g., that when you have no access
> to A/X, you can still see X in A, though not its contents.

Sure. That doesn't particularly bother me for obliterate purposes.

> So it appears that if you piggy-back "hiding obliterate" onto the authz
> layer, you may first have to fix the authz layer, and then later do all
> the FS-related work anyway. Low-hanging fruit may on occasion turn out
> to be overripe durian, but de gustibus non disputandum est.

Don't know about having to "fix the authz layer". Is it the case that
the RA-layer-specific nature of authz is ingrained so that the authz
callbacks would not even be invoked in RA-local if RA-local provided
implementations of them?

Thanks again. And yes I basically agree that this isn't the way to go,
but still think it's important to understand it.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402671

Re: Two approaches to data-hiding (for obliterate)

Posted by Branko Cibej <br...@xbc.nu>.
Julian Foad wrote:
> On Thu, 2009-10-01 at 19:16 +0200, Branko Cibej wrote:
>   
>> Julian Foad wrote:
>>     
>>> I'm thinking about the data-hiding part of "obliterate".
>>>
>>> Example: obliterate (hide) /path/to/foo in revision 50.
>>>
>>>
>>> * Approach 1:
>>>
>>>   Edit the FS in the database: replace the old revision 50 with a new
>>> transaction that's mostly the same except doesn't have an entry for
>>> "foo" in its "/path/to" directory node. "Commit" the new transaction in
>>> place of revision 50, adjusting all links that pointed to the old
>>> revision 50 to point to the new one instead.
>>>
>>>   This is an FS-layer change, plus support needed in the DB layers to be
>>> able to replace an existing revision and edit other metadata.
>>>
>>>
>>> * Approach 2:
>>>
>>>   Extend Subversion's path-based authz semantics to enable us to block
>>> access to /path/to/foo@50.
>>>   
>>>       
>> I was just wondering why I saw nails everywhere I looked, until I
>> noticed the hammer in my hand ... :)
>>
>> Seriously: don't try to overload path-based authz [...]
>>     
>
> Thanks... but please read the rest of the thread first :-) I'm not.
>   

Well yes; I did read it. Part of the difference is the "repos-level" vs.
"filesystem-level", and with obliterate, you're deep in the latter.
Sooner or later.

> But even after reading what I'm thinking about, you may think it's not
> something I should even think about. Why am I thinking like this?
> Because I see a similarity between authz and the data-hiding requirement
> which is a necessary semantic part of "obliterate". I think it is worth
> examining this similarity to learn something from it, even though the
> obvious (and almost certainly best) implementation is to actually remove
> stuff from the FS rather than introduce a special "hiding" layer.
>   

There are similarities ... I can see two:

    * The concept of hiding something from clients.
    * The pain that happens to clients' existing working copies when you
      hide or unhide something.

There are also differences:

    * Authz is (currently) time-irrelevant; your proposed obliterate
      works on specific revisions.
    * Obliterate doesn't care who's looking at the tree; authz does.
    * Authz is (currently, brokenly :) a RA-layer-specific optional
      feature; obliterate cannot be.

Last but not least: Data hiding is only one aspect of obliterate. Wait
one moment while I move these sheep's entrails to grab my crystal ball
... ah. It says here that once you start on the space-saving and/or
data-erasing aspect, you'll want the FS-level data hiding anyway as
twiddling txns is unavoidable.

Also, as far as I can remember, our authz isn't really very good at
hiding paths. ISTR issues about that, e.g., that when you have no access
to A/X, you can still see X in A, though not its contents.
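
To make the distinction concrete, here is a small sketch contrasting
the behaviour recalled above with real path hiding. It assumes a
directory is just a dict of child name to contents and that `can_read`
is a caller-supplied predicate; both function names are hypothetical
and do not correspond to anything in Subversion's authz code.

```python
def list_dir_leaky(entries, can_read):
    """Withhold unreadable contents but still reveal every child name."""
    return {name: (data if can_read(name) else None)
            for name, data in entries.items()}


def list_dir_hidden(entries, can_read):
    """Omit unreadable children entirely, so their names are not exposed."""
    return {name: data for name, data in entries.items() if can_read(name)}
```

For obliterate-style hiding, the second behaviour is the one a client
would presumably need to see.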

So it appears that if you piggy-back "hiding obliterate" onto the authz
layer, you may first have to fix the authz layer, and then later do all
the FS-related work anyway. Low-hanging fruit may on occasion turn out
to be overripe durian, but de gustibus non disputandum est.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402654

Re: Two approaches to data-hiding (for obliterate)

Posted by Julian Foad <ju...@wandisco.com>.
On Thu, 2009-10-01 at 19:16 +0200, Branko Cibej wrote:
> Julian Foad wrote:
> > I'm thinking about the data-hiding part of "obliterate".
> >
> > Example: obliterate (hide) /path/to/foo in revision 50.
> >
> >
> > * Approach 1:
> >
> >   Edit the FS in the database: replace the old revision 50 with a new
> > transaction that's mostly the same except doesn't have an entry for
> > "foo" in its "/path/to" directory node. "Commit" the new transaction in
> > place of revision 50, adjusting all links that pointed to the old
> > revision 50 to point to the new one instead.
> >
> >   This is an FS-layer change, plus support needed in the DB layers to be
> > able to replace an existing revision and edit other metadata.
> >
> >
> > * Approach 2:
> >
> >   Extend Subversion's path-based authz semantics to enable us to block
> > access to /path/to/foo@50.
> >   
> 
> I was just wondering why I saw nails everywhere I looked, until I
> noticed the hammer in my hand ... :)
> 
> Seriously: don't try to overload path-based authz [...]

Thanks... but please read the rest of the thread first :-) I'm not.

But even after reading what I'm thinking about, you may think it's not
something I should even think about. Why am I thinking like this?
Because I see a similarity between authz and the data-hiding requirement
which is a necessary semantic part of "obliterate". I think it is worth
examining this similarity to learn something from it, even though the
obvious (and almost certainly best) implementation is to actually remove
stuff from the FS rather than introduce a special "hiding" layer.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402626

Re: Two approaches to data-hiding (for obliterate)

Posted by Branko Cibej <br...@xbc.nu>.
Julian Foad wrote:
> I'm thinking about the data-hiding part of "obliterate".
>
> Example: obliterate (hide) /path/to/foo in revision 50.
>
>
> * Approach 1:
>
>   Edit the FS in the database: replace the old revision 50 with a new
> transaction that's mostly the same except doesn't have an entry for
> "foo" in its "/path/to" directory node. "Commit" the new transaction in
> place of revision 50, adjusting all links that pointed to the old
> revision 50 to point to the new one instead.
>
>   This is an FS-layer change, plus support needed in the DB layers to be
> able to replace an existing revision and edit other metadata.
>
>
> * Approach 2:
>
>   Extend Subversion's path-based authz semantics to enable us to block
> access to /path/to/foo@50.
>   

I was just wondering why I saw nails everywhere I looked, until I
noticed the hammer in my hand ... :)

Seriously: don't try to overload path-based authz that is not even part
of the FS proper (and therefore doesn't work for local access, amongst
other things) for solving a problem that isn't really related to
authorization. It's liable to turn into a can of worms.

Dealing with FS issues and switching txns is possibly more complex up
front, but is certainly the right approach to obliterate. IIRC doing the
revision replacement dance shouldn't be too hard, since a revision is
basically just a numbered pointer to a transaction.
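
The "numbered pointer" idea can be sketched roughly as follows. This
assumes two in-memory tables, `revisions` (revision number to
transaction id) and `transactions` (transaction id to tree root);
these tables and the function `replace_revision` are illustrative
stand-ins, not the actual BDB schema or any Subversion API.

```python
def replace_revision(revisions, transactions, rev, new_txn_id, new_root):
    """Repoint revision `rev` at a new transaction; return the old txn id.

    The old transaction is left in `transactions`, so the caller decides
    when (or whether) to purge it and recover the space.
    """
    if rev not in revisions:
        raise KeyError(rev)
    if new_txn_id in transactions:
        raise ValueError("transaction id already in use: %r" % (new_txn_id,))
    old_txn_id = revisions[rev]
    transactions[new_txn_id] = new_root
    revisions[rev] = new_txn_id  # the revision is only a pointer
    return old_txn_id
```

Keeping the old transaction around until an explicit purge is a design
choice of this sketch; it separates the data-hiding step from the
space-recovery step.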

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402616

Re: Two approaches to data-hiding (for obliterate)

Posted by "C. Michael Pilato" <cm...@collab.net>.
Julian Foad wrote:
> I am not thinking of extending mod_dav_svn, but rather of extending the
> Subversion-internal side of this interaction. Extending the idea of
> Subversion's question (step 1) to include "at revision R" and extending
> if necessary the way Subversion handles its three different actions
> (step 4). Then implementing the "obliterated node-revs" database and the
> obliteration-authz look-up function centrally within Subversion. Like
> this:
> 
> 
>   1. Subversion wants to know if it can read or write something;
> 
>   2. mod_dav_svn (or other plug-in authz module) is asked (by
>      Subversion or by Apache, doesn't matter here) about path P
>      for user U;
> 
>   3. mod_dav_svn (or other) consults its authz rules or database
>      and replies "writable" or "read-only" or "no access";
> 
> + 4. Subversion asks its "obliterated path-revs database" whether
>      path P at rev R is accessible; if not, then change the authz
>      result to "no access";
> 
>   5. Subversion takes one of three different actions depending on
>      whether the "authz result" is "writable" or "read-only" or
>      "no access".
> 
> The set up and administration of the "obliterated path-revs database"
> used in step 4 can be completely new, not tied to mod_authz_svn or to
> any optional component, and can have whatever administrative interface
> we want.

If you go this route, you'll be happy to learn that our path-based
authorization subsystem already passes around FS roots and paths, so you get
the 'rev R' bit for free.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2404223

Re: Two approaches to data-hiding (for obliterate)

Posted by Julian Foad <ju...@wandisco.com>.
On Thu, 2009-10-01 at 10:53 -0400, Mark Phippard wrote:
> On Thu, Oct 1, 2009 at 10:48 AM, Julian Foad <ju...@wandisco.com> wrote:
> > I'm thinking about the data-hiding part of "obliterate".
> >
> > Example: obliterate (hide) /path/to/foo in revision 50.
> >
> >
> > * Approach 1:
> >
> >  Edit the FS in the database: replace the old revision 50 with a new
> > transaction that's mostly the same except doesn't have an entry for
> > "foo" in its "/path/to" directory node. "Commit" the new transaction in
> > place of revision 50, adjusting all links that pointed to the old
> > revision 50 to point to the new one instead.
> >
> >  This is an FS-layer change, plus support needed in the DB layers to be
> > able to replace an existing revision and edit other metadata.
> >
> >
> > * Approach 2:
> >
> >  Extend Subversion's path-based authz semantics to enable us to block
> > access to /path/to/foo@50.
> 
> I do not think this would be a great way to do it.  mod_authz_svn is
> an optional module and [...]

Thanks for the comments. This is just the sort of thing I need to
understand and think about.

I didn't explain what I meant. For the purposes of explaining this line
of thought, the way I imagine path-based authz works now is (please
excuse the crudeness of my understanding):

  1. Subversion wants to know if it can read or write something;

  2. mod_dav_svn (or other plug-in authz module) is asked (by
     Subversion or by Apache, doesn't matter here) about path P
     for user U;

  3. mod_dav_svn (or other) consults its authz rules or database
     and replies "writable" or "read-only" or "no access";

  4. Subversion takes one of three different actions depending on
     whether the "authz result" is "writable" or "read-only" or
     "no access".


I am not thinking of extending mod_dav_svn, but rather of extending the
Subversion-internal side of this interaction. Extending the idea of
Subversion's question (step 1) to include "at revision R" and extending
if necessary the way Subversion handles its three different actions
(step 4). Then implementing the "obliterated node-revs" database and the
obliteration-authz look-up function centrally within Subversion. Like
this:


  1. Subversion wants to know if it can read or write something;

  2. mod_dav_svn (or other plug-in authz module) is asked (by
     Subversion or by Apache, doesn't matter here) about path P
     for user U;

  3. mod_dav_svn (or other) consults its authz rules or database
     and replies "writable" or "read-only" or "no access";

+ 4. Subversion asks its "obliterated path-revs database" whether
     path P at rev R is accessible; if not, then change the authz
     result to "no access";

  5. Subversion takes one of three different actions depending on
     whether the "authz result" is "writable" or "read-only" or
     "no access".

The set up and administration of the "obliterated path-revs database"
used in step 4 can be completely new, not tied to mod_authz_svn or to
any optional component, and can have whatever administrative interface
we want.
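
The five steps above can be sketched as follows. Everything here is
an assumption made for illustration: the `Access` enum, the
`authz_lookup` callback standing in for the plug-in authz module, and
the "obliterated path-revs database" modelled as a set of
(path, revision) pairs. Treating an obliterated directory as hiding
its children is a further assumption of the sketch.

```python
from enum import Enum


class Access(Enum):
    NO_ACCESS = "no access"
    READ_ONLY = "read-only"
    WRITABLE = "writable"


def is_obliterated(obliterated, path, rev):
    """True if `path`, or any ancestor directory, is obliterated at `rev`."""
    parts = path.strip("/").split("/")
    for i in range(1, len(parts) + 1):
        if ("/" + "/".join(parts[:i]), rev) in obliterated:
            return True
    return False


def effective_access(authz_lookup, obliterated, user, path, rev):
    """Steps 2-4: ask the authz module, then apply the obliteration override."""
    result = authz_lookup(user, path)           # steps 2 and 3
    if is_obliterated(obliterated, path, rev):  # step 4
        return Access.NO_ACCESS
    return result                               # step 5 acts on this value
```

Because the override runs after the authz module has answered, the
obliteration table needs no knowledge of users or of which authz
module is installed.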

The last step (step 5) is interesting to me, as that is where Subversion
knows how to deal with a path that it can no longer access (or rather is
no longer allowed to send to the client). This is a crucial piece of the
"obliterate" puzzle. The feasibility of this approach depends on the
degree to which Subversion is tied to path-based rather than path-rev
based authz in this operation.


> Besides, a very common use case today for obliterate is to simply
> block that a path ever existed in the repository (all revisions).
> This can be done today using authz but I have never heard anyone
> suggest they considered that a decent option.  So there is no reason
> to think taking this idea further would help.

That doesn't sound quite right. I strongly expect that people are
already using path-based authz to hide paths that unexpectedly became
sensitive. Admittedly the method of its administration and the fact that
it's specific to HTTP and svnserve access methods make it not the ideal
interface for obliteration, so maybe people do find it "not a decent
option". But I'm not thinking of building on the existing
implementations of path-based authz so I think this is moot.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402573

Re: Two approaches to data-hiding (for obliterate)

Posted by Mark Phippard <ma...@gmail.com>.
On Thu, Oct 1, 2009 at 10:48 AM, Julian Foad <ju...@wandisco.com> wrote:
> I'm thinking about the data-hiding part of "obliterate".
>
> Example: obliterate (hide) /path/to/foo in revision 50.
>
>
> * Approach 1:
>
>  Edit the FS in the database: replace the old revision 50 with a new
> transaction that's mostly the same except doesn't have an entry for
> "foo" in its "/path/to" directory node. "Commit" the new transaction in
> place of revision 50, adjusting all links that pointed to the old
> revision 50 to point to the new one instead.
>
>  This is an FS-layer change, plus support needed in the DB layers to be
> able to replace an existing revision and edit other metadata.
>
>
> * Approach 2:
>
>  Extend Subversion's path-based authz semantics to enable us to block
> access to /path/to/foo@50.

I do not think this would be a great way to do it.  mod_authz_svn is
an optional module and it is possible to put other authz
implementations in place.  Of course it also does not apply to file://
access which means that tools like ViewVC that access the repository
directly would have to be enhanced to apply whatever authz rules we
are doing in mod_authz.

Besides, a very common use case today for obliterate is to simply
block that a path ever existed in the repository (all revisions).
This can be done today using authz but I have never heard anyone
suggest they considered that a decent option.  So there is no reason
to think taking this idea further would help.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402559