Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2009/02/04 22:59:27 UTC
Re: Comment on obliterate functional specification
Magnus,
As I saw no other response, I'll just speak up and mention that your
proposal sounds extremely sensible. I haven't followed the previous
history or proposals of the feature, though.
Of course, the test of whether your proposal really does simplify the
whole thing is in the details of what OBLITERATION SETS are needed, and
how they can be constructed, to satisfy the end-user goals.
Can I encourage you to submit a patch to the document, that incorporates
your proposal and at least makes a start on describing how it is used to
solve the goals? Or write a second proposal that we can check in beside
the existing one.
- Julian
On Tue, 2009-01-20 at 20:14 -0800, Magnus wrote:
> I have been going through the discussion of the obliterate feature, which,
> although it has tended to start and stop, has now found a home in the
> functional spec.
> (trunk/notes/obliterate/obliterate-functional-spec.txt)
>
> The design of this behavior is not trivial, but although I believe that
> the svnsync approach suggested by Karl Fogel (and others, off-list, if I
> understand correctly) results in a much more feasible design, I do not
> completely agree with his comments (on dev) that there are many possible
> ways in which it could behave. More specifically, I believe that different
> obliteration use cases all need to be built around a core obliteration
> functionality, and that there really is only one good option for
> implementing that core in a way which does not lead into a
> quagmire of ill-defined outcomes.
>
> Using the language of the specification, (along with a new concept,
> that of an OBLITERATION SET) this core consists of:
>
>
> 1: SELECT multiple modifications. These modifications comprise the
> OBLITERATION SET in the form of multiple PATH@REV pairs.
>
> 2: OBLITERATE selected modifications.
>
>
> Short and sweet :-)
>
> Three observations result from this way of viewing the matter. The first
> is crucial in my view; the others are "convenience observations".
>
> A: The data of a PATH@REV that does NOT intersect with the OBLITERATION SET
> is UNCHANGED by an obliteration. Always. History data may change when an
> ancestor of the PATH@REV has been obliterated, but:
> svn co REPO/PATH@REV LOCALPATH
> results in EXACTLY the same working copy when REPO is the
> post-obliterate repository as when it is the original repository.
>
> B: There is no "obliteration of files" that is independent of the
> obliteration of modifications. To "obliterate a file" (or directory),
> one simply has to obliterate every single modification to that file.
> Thus, if a file needs to be completely obliterated, this can be done
> by specifying a PATH@REV, finding all ancestors, direct descendants
> (and optionally copied descendants), and including each of them
> in the OBLITERATION SET.
>
> C: There is no "obliteration of revisions" that is independent of the
> obliteration of modifications. To "obliterate a revision", one simply
> has to obliterate the modifications in the OBLITERATION SET implied by
> "/@REV".
>
> If point A is agreed on, I believe that the functional specification could
> be simplified quite a bit, with a main section on how to implement
> functionality consistent with A, and additional sections on how to
> implement specific use cases through the construction of the
> OBLITERATION SET in different ways.
>
> I would appreciate any comments on this, and if others concur with this
> view, I might contribute a patch to the functional-spec with some edits
> reflecting this approach.
>
> Best,
> Magnus
>
> ps. I posted this earlier today but it seems to have disappeared. I'm terribly sorry if I am double-posting.
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1040326
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1104601
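Observations B and C in Magnus's message reduce "obliterate a file" and "obliterate a revision" to building a set of PATH@REV pairs. A minimal sketch of that construction follows. It assumes a hypothetical in-memory change log, `changes`, mapping each revision number to the paths modified in it; this is not a Subversion API, and copy ancestry is ignored for brevity.

```python
from typing import Dict, Set, Tuple

PathRev = Tuple[str, int]  # one PATH@REV element of an OBLITERATION SET


def is_under(path: str, root: str) -> bool:
    """True if `path` is `root` itself or lies somewhere below it."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def set_for_path(changes: Dict[int, Set[str]], root: str) -> Set[PathRev]:
    """Observation B: every recorded modification to `root` or its contents."""
    return {
        (path, rev)
        for rev, paths in changes.items()
        for path in paths
        if is_under(path, root)
    }


def set_for_revision(changes: Dict[int, Set[str]], rev: int) -> Set[PathRev]:
    """Observation C: the set implied by "/@REV"."""
    return {(path, rev) for path in changes.get(rev, set())}


# Hypothetical change log: revision -> paths modified in that revision.
changes = {1: {"/a", "/a/f.txt"}, 2: {"/a/f.txt", "/b"}, 3: {"/b"}}
file_set = set_for_path(changes, "/a/f.txt")
revision_set = set_for_revision(changes, 2)
```

Because `set_for_path` always includes everything below the given path, a set built this way for a directory also contains the changes to its contents in the same revisions.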
Re: Comment on obliterate functional specification
Posted by "Magnus ..." <zu...@gmail.com>.
No, I don't believe this has been archived anywhere
outside the dev list. But not to worry, I'm still on
the case. I submitted this text as a patch to the
functional specification. See:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1135837
And I will make sure that any insights from this discussion
get either committed or explicitly rejected before my
involvement ends.
In fact, I have two more installments ready in this little
obliteration series, but I've been hesitant to compete for
bandwidth with the effort to get 1.6 out the door. I'll
probably post them in the near future and repost the patch
I submitted as well.
Best,
Magnus
Daniel Shahaf wrote:
>
> Have we archived this somewhere? On issue #516, or in the notes/
> directory, etc.?
>
> Daniel
>
>> -------------------------------------------------
>>
>> DEFINITION OF THE OBLITERATION OPERATION
>>
>> ...
>>
>>
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1205597
Re: Comment on obliterate functional specification
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Magnus wrote on Thu, 5 Feb 2009 at 08:30 -0800:
> Thanks for the encouragement, Julian. As a matter of fact, I
> had written up more on the definition, but had intended to hold
> off until after release 1.6, assuming that things would ease up
> after that. However, I will send what I have prepared now,
> and would welcome any comments.
>
> The following text would belong somewhere early in a revised
> functional specification for obliteration:
>
Have we archived this somewhere? On issue #516, or in the notes/
directory, etc.?
Daniel
> -------------------------------------------------
>
> DEFINITION OF THE OBLITERATION OPERATION
>
> An OBLITERATION SET is defined by a list of PATH@REVISION elements
> (that is, each element is a pair, consisting of a PATH and REVISION).
> The same PATH can be paired with multiple REVISIONS to form
> multiple elements and vice versa.
>
> Note: The set is restricted so that if, for a given REVISION,
> PATH@REVISION is part of the OBLITERATION SET, any element of
> the form [PATH/RELATIVEPATH]@REVISION is also part of
> the set. (This simply means that if a directory change is
> obliterated in a revision, all changes to its contents must
> also be obliterated in the same revision).
> [Note on the note. Perhaps this restriction can be lifted.
> However, it seems that doing so would greatly complicate
> both the behavior and implementation of the operation,
> without much benefit.]
>
> An ORIGINAL repository is a repository to which an OBLITERATION
> operation could be applied, but has not (this includes any
> subversion repository without obliterations).
>
> A MODIFIED repository is a repository which is identical to the
> ORIGINAL but for which an OBLITERATION SET has been defined and
> an OBLITERATION operation has been applied.
>
> The OBLITERATION operation is defined by the following three properties:
>
> 1. If a PATH@REVISION is checked out of the MODIFIED repository,
> and the PATH@REVISION is NOT in the OBLITERATION SET, the
> checkout data is identical to what would have been returned
> if PATH@REVISION had been checked out of the ORIGINAL.
>
> 2. If a PATH@REVISION is checked out of the MODIFIED repository,
> and the PATH@REVISION IS in the OBLITERATION SET, the
> checkout data is identical to what would have been returned
> if PATH@REVPRIOR had been checked out of the ORIGINAL, where
> REVPRIOR is the last revision prior to REVISION for which
> PATH@REVPRIOR is not in the OBLITERATION SET.
>
> 3. Any other mechanism through which a user can interact with
> the repository (diff/merge/copy/commit/etc) should work
> consistently. That is, assume that a REFERENCE repository
> existed from which nothing had been obliterated, but for
> which any checkout operation yielded the same data as for the
> MODIFIED repository. Then every remote interaction with
> MODIFIED must yield a result indistinguishable from what
> would happen if the same operation were applied to the
> REFERENCE repository.
>
> Note: Here, data refers to the reported existence of the path,
> the versioned properties that apply to the path, and for files,
> the actual contents of the file.
>
> Note: This definition does not state what happens to
> revision properties (several options are available), and it
> does not state what happens to the reported history of
> the path (again, several options are available).
>
> Note: Implicit in the above is the fact that the core
> OBLITERATION functionality would not drop empty revisions.
> This is intentional, and dropping empty revisions should be
> done through a separate mechanism.
>
> -------------------------------------------------
>
> The above definition fulfills several desirable criteria:
> * It is in my view parsimonious
> * It is relatively short
> * It has clearly defined behavioral implications
>
> However, the make-or-break criteria are of course two:
> * Can obliteration, as defined above, be feasibly implemented?
> * Would such an implementation address all required use-cases?
>
> I believe the answer to both of the above questions to be yes,
> and I would be happy to elaborate on why I believe this to
> be the case, through discussions on the mailing list and through
> patches to the functional specification.
>
> Best regards,
> Magnus
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1108134
>
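Properties 1 and 2 of the definition quoted above can be read as a rule for what a checkout returns. The sketch below is only a toy model of that rule. It assumes the ORIGINAL repository is available as a full snapshot per revision (`Repo`, a made-up type, not a Subversion structure) and treats a path whose revisions are all obliterated as nonexistent, a case the definition leaves open.

```python
from typing import Dict, Optional, Set, Tuple

Snapshot = Dict[str, str]      # path -> file contents at one revision
Repo = Dict[int, Snapshot]     # revision number -> full snapshot (toy model)
PathRev = Tuple[str, int]


def checkout_modified(
    original: Repo, oset: Set[PathRev], path: str, rev: int
) -> Optional[str]:
    """Data returned for PATH@REV by the MODIFIED repository.

    Property 1: if PATH@REV is not in the set, return the ORIGINAL data.
    Property 2: otherwise fall back to REVPRIOR, the last earlier revision
    for which PATH@REVPRIOR is not in the set.
    """
    r = rev
    while r >= 0 and (path, r) in oset:
        r -= 1
    if r < 0:
        # Assumption: no usable REVPRIOR means the path does not exist.
        return None
    return original.get(r, {}).get(path)


original = {
    0: {},
    1: {"/f": "v1"},
    2: {"/f": "v2 secret"},
    3: {"/f": "v3"},
}
oset = {("/f", 2)}
at_r1 = checkout_modified(original, oset, "/f", 1)
at_r2 = checkout_modified(original, oset, "/f", 2)
at_r3 = checkout_modified(original, oset, "/f", 3)
```

In this model `/f@3` is untouched because it is not in the set, which is exactly why the "security" use case needs later revisions added to the set as well.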
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1200601
Re: Comment on obliterate functional specification
Posted by ac...@zulutime.net.
Agreed. As I mentioned, this is intended to go "somewhere early in a
revised functional specification"; it is not supposed to be the
complete specification.
The two issues you mention do indeed need to be agreed on.
However, I would like to note here that in my opinion:
1. Given a MODIFIED repository, property 3 of the definition
guarantees that space can be reclaimed and data completely
removed from the repository through a svnsync->replace cycle.
(Since svnsync must by definition see the repository as
it would have been had the obliterated revisions never
been committed).
2. The sync->replace cycle approach to obliteration was
originally suggested by Karl Fogel (although I believe
he envisioned the obliteration logic as residing in the
sync-type program itself, rather than using svnsync as-is).
For an utter and complete removal from the repository, I
believe that such a sync->replace cycle (perhaps made
transparent by encapsulating it in a single operation)
is the best that can be achieved without vastly
complicating the operation. The reason:
If the repository is traversed from revision 0 up to HEAD, it
is close to impossible to know if a (much) later revision
includes paths copied from the obliterated sets (in which
case the data would be lost). Thus, the operation would first
need to compile a list of every single copy operation, check
every obliteration against this list, and store the data
somewhere before removing it from the repository.
If the repository is traversed from revision HEAD down to 0,
it will, upon encountering an obliterated modification at
revision N, merge that modification with whatever happened
at revision N+1. If it then encounters a modification of the
same path at revision N-1, it will have to go back to
N+1 to merge the N-1 mod with the joint N and N+1 mod. This
means that in doing the obliteration, a revision can not
be finalized until the obliteration is over. Furthermore,
whenever a copy is encountered, logic must be applied to
figure out where it should originate AFTER obliterations
are performed, and it must be rewritten. (This might require it to
be rewritten in a way that is not consistent with the
earlier revision as it occurs BEFORE modification.)
Thus to sum up: Any "true" obliteration mechanism
that does not have access to the complete data of the
original repository during the whole obliteration operation
will become hopelessly nonlinear in any scenario, and close
to impossible to implement on a live repository. My functional
specification will therefore not require such an approach.
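To illustrate why a replay with full access to the original data stays simple, here is a toy forward replay in the spirit of the sync->replace cycle. It is a sketch under stated assumptions and not how svnsync works: the ORIGINAL repository is modelled as a list of full snapshots (index = revision number), and the function names are invented for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Snapshot = Dict[str, str]   # path -> contents at one revision (toy model)
PathRev = Tuple[str, int]


def target_snapshot(original: List[Snapshot], oset: Set[PathRev], rev: int) -> Snapshot:
    """What a checkout of revision `rev` should contain after obliteration."""
    paths = set()
    for snap in original[: rev + 1]:
        paths.update(snap)
    result: Snapshot = {}
    for path in paths:
        r = rev
        # Fall back to the last earlier revision not in the obliteration set.
        while r >= 0 and (path, r) in oset:
            r -= 1
        if r >= 0 and path in original[r]:
            result[path] = original[r][path]
    return result


def replay(original: List[Snapshot], oset: Set[PathRev]) -> List[Dict[str, Optional[str]]]:
    """Rebuild one commit per original revision for a fresh repository.

    Each commit maps path -> new contents, or None for a deletion.
    Empty commits are kept, so revision numbers do not shift.
    """
    commits: List[Dict[str, Optional[str]]] = []
    previous: Snapshot = {}
    for rev in range(len(original)):
        current = target_snapshot(original, oset, rev)
        commit: Dict[str, Optional[str]] = {}
        for path in current.keys() | previous.keys():
            if current.get(path) != previous.get(path):
                commit[path] = current.get(path)
        commits.append(commit)
        previous = current
    return commits


original = [
    {},
    {"/f": "v1"},
    {"/f": "v2 secret"},
    {"/f": "v2 secret", "/g": "g1"},
    {"/f": "v4", "/g": "g1"},
]
commits = replay(original, {("/f", 2), ("/f", 3)})
```

Each rebuilt commit is derived from the complete original history, so no revision has to be revisited once written. Revision 2 becomes an empty commit rather than being dropped, matching the note that the core operation does not drop empty revisions.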
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1110077
Re: Comment on obliterate functional specification
Posted by Philipp Marek <ph...@emerion.com>.
Hello David,
On Freitag, 6. Februar 2009, David Glasser wrote:
> That's a good write-up, but it doesn't handle the other big design
> decisions for obliterate: whether it's acceptable for the data to be
> reconstructible by somebody with direct access to the repository, and
> whether it's acceptable for space to not be reclaimed after
> obliterate.
>
> (For FSFS in particular, the answer to these questions hugely
> constrains implementation alternatives, since node IDs include the
> offset in a rev file.)
Of course, just changing a few pointers so that the obliterated data becomes
unreachable is fine as a fast operation that can be done while running
normally. (Apart from destroying some users' working copies, if they currently
have some to-be-obliterated data checked out, and where the next update would
[probably] result in wrong data.)
But I think there should be at least some way (eg. by using "svnadmin pack")
to reclaim the space.
Wiping the data (with a single pass of zeroes) might work for some people,
too, but as there's no easy way to punch holes in files (ie. make them sparse
in the middle) the space would be lost.
About the node IDs: How about some kind of "svnadmin pack -rX", that keeps all
offsets intact (to avoid having to change *a lot* of revisions), but skips
blocks where possible to make the file sparse? Sounds like an easy way for me.
Regards,
Phil
Re: Comment on obliterate functional specification
Posted by David Glasser <gl...@davidglasser.net>.
That's a good write-up, but it doesn't handle the other big design
decisions for obliterate: whether it's acceptable for the data to be
reconstructible by somebody with direct access to the repository, and
whether it's acceptable for space to not be reclaimed after
obliterate.
(For FSFS in particular, the answer to these questions hugely
constrains implementation alternatives, since node IDs include the
offset in a rev file.)
--dave
On Thu, Feb 5, 2009 at 8:30 AM, Magnus <ac...@zulutime.net> wrote:
> Thanks for the encouragement, Julian. As a matter of fact, I
> had written up more on the definition, but had intended to hold
> off until after release 1.6, assuming that things would ease up
> after that. However, I will send what I have prepared now,
> and would welcome any comments.
>
> The following text would belong somewhere early in a revised
> functional specification for obliteration:
>
> -------------------------------------------------
>
> DEFINITION OF THE OBLITERATION OPERATION
>
> An OBLITERATION SET is defined by a list of PATH@REVISION elements
> (that is, each element is a pair, consisting of a PATH and REVISION).
> The same PATH can be paired with multiple REVISIONS to form
> multiple elements and vice versa.
>
> Note: The set is restricted so that if, for a given REVISION,
> PATH@REVISION is part of the OBLITERATION SET, any element of
> the form [PATH/RELATIVEPATH]@REVISION is also part of
> the set. (This simply means that if a directory change is
> obliterated in a revision, all changes to its contents must
> also be obliterated in the same revision).
> [Note on the note. Perhaps this restriction can be lifted.
> However, it seems that doing so would greatly complicate
> both the behavior and implementation of the operation,
> without much benefit.]
>
> An ORIGINAL repository is a repository to which an OBLITERATION
> operation could be applied, but has not (this includes any
> subversion repository without obliterations).
>
> A MODIFIED repository is a repository which is identical to the
> ORIGINAL but for which an OBLITERATION SET has been defined and
> an OBLITERATION operation has been applied.
>
> The OBLITERATION operation is defined by the following three properties:
>
> 1. If a PATH@REVISION is checked out of the MODIFIED repository,
> and the PATH@REVISION is NOT in the OBLITERATION SET, the
> checkout data is identical to what would have been returned
> if PATH@REVISION had been checked out of the ORIGINAL.
>
> 2. If a PATH@REVISION is checked out of the MODIFIED repository,
> and the PATH@REVISION IS in the OBLITERATION SET, the
> checkout data is identical to what would have been returned
> if PATH@REVPRIOR had been checked out of the ORIGINAL, where
> REVPRIOR is the last revision prior to REVISION for which
> PATH@REVPRIOR is not in the OBLITERATION SET.
>
> 3. Any other mechanism through which a user can interact with
> the repository (diff/merge/copy/commit/etc) should work
> consistently. That is, assume that a REFERENCE repository
> existed from which nothing had been obliterated, but for
> which any checkout operation yielded the same data as for the
> MODIFIED repository. Then every remote interaction with
> MODIFIED must yield a result indistinguishable from what
> would happen if the same operation were applied to the
> REFERENCE repository.
>
> Note: Here, data refers to the reported existence of the path,
> the versioned properties that apply to the path, and for files,
> the actual contents of the file.
>
> Note: This definition does not state what happens to
> revision properties (several options are available), and it
> does not state what happens to the reported history of
> the path (again, several options are available).
>
> Note: Implicit in the above is the fact that the core
> OBLITERATION functionality would not drop empty revisions.
> This is intentional, and dropping empty revisions should be
> done through a separate mechanism.
>
> -------------------------------------------------
>
> The above definition fulfills several desirable criteria:
> * It is in my view parsimonious
> * It is relatively short
> * It has clearly defined behavioral implications
>
> However, the make-or-break criteria are of course two:
> * Can obliteration, as defined above, be feasibly implemented?
> * Would such an implementation address all required use-cases?
>
> I believe the answer to both of the above questions to be yes,
> and I would be happy to elaborate on why I believe this to
> be the case, through discussions on the mailing list and through
> patches to the functional specification.
>
> Best regards,
> Magnus
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1108134
>
--
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1109763
Re: Comment on obliterate functional specification
Posted by Magnus Torfason <zu...@gmail.com>.
Branko Cibej wrote:
> Consider your example of a "bad" comment in the code -- you do want to
> find all the versions of the file in HEAD (all branches and tags, too)
> that contain the offending text, an automated relatedness search will
> help there. But then you have to fix all those variants (perhaps by
> applying the same patch to all) and likely *not* obliterate the fixed
> versions, only the ones from your original list of relatives. An
> automated obliterate-by-bloodline would happily kill off your latest
> fixed versions in HEAD too. :)
I agree with your general analysis, as well as your comments in
following email on the great fuzziness of allowing the system to
retroactively edit a file's contents throughout its history. On that
note, I would like to point out that the functional specification
already contains the following text (from before I started messing
around with it):
"The lowest level of modification we should consider is the change
to a file or directory committed in a specific revision.
(Read: no need to support obliterating a single line in a document)"
And I think we absolutely should not allow the "modifying" history
(as contrasted with "erasing" history) use-case to enter into the
specification. (Read: no retroactive applying of patches to
non-head revisions)
> (Which raises another interesting question: what happens to object
> relatedness if you obliterate key links in the revision tree?)
Yes, this is very interesting and important.
If the obliteration does not affect the existence of a source
path@rev, my view is that a copy from path@rev should continue to
originate from the same path@rev. The delta needs to change, but as
subversion already allows copy+modify to occur in a single commit,
this does not seem like a problem to me.
If the existence of path@rev changes with obliteration (i.e. path@rev
disappears), then the simplest thing is to just let the copy get
converted to an add. This is what svnsync does currently in all cases,
as a former post of mine demonstrates:
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1234159
That behavior could be improved in my opinion, in the following manner:
If the copy from path@rev becomes obsolete, but a copy from an earlier
path@rev in the object's history is possible (typically because it has
been copied before), then the copy should be rewritten to come from
the latest previous path@rev that exists in the repo.
If the copy source disappears in the obliteration, and all its prior
history is being obliterated as well, I think there is no real option
other than to convert the copy to a plain add. I've toyed around with
ideas where the copy direction would get switched so that when there
was originally an A->B copy, but then the initial part of A's history
gets obliterated so that B comes into existence first, the addition of
A would be recorded as a B->A copy. But that is just too ugly to
consider seriously (IMHO).
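The copy-source rules described above (keep the source if it survives, otherwise fall back to the latest earlier point in the object's history, otherwise record a plain add) can be sketched as a small lookup. This is an illustration only: `history` and `exists` are hypothetical inputs, not Subversion APIs.

```python
from typing import Callable, List, Optional, Tuple

PathRev = Tuple[str, int]


def resolve_copy_source(
    history: List[PathRev], exists: Callable[[str, int], bool]
) -> Optional[PathRev]:
    """Pick the copy source to record after obliteration.

    `history` lists the original copy source first, followed by
    progressively older points in the same object's history (possibly
    under other paths, if the object had itself been copied before).
    `exists(path, rev)` reports whether path@rev still exists in the
    post-obliteration repository.

    Returns the newest surviving entry, or None when nothing survives
    and the copy has to become a plain add.
    """
    for path, rev in history:
        if exists(path, rev):
            return (path, rev)
    return None


# Hypothetical history of a copy source, newest first.
history = [("/branches/b", 10), ("/branches/b", 7), ("/trunk", 5)]
surviving = {("/trunk", 5)}
source = resolve_copy_source(history, lambda p, r: (p, r) in surviving)
```

Here the copy would be rewritten to come from `/trunk@5`, the latest earlier point that still exists.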
I do like this discussion, I feel that a lot of ambiguity about the
obliteration functionality is getting cleared up here. I realize that
we are still in the middle of a big release, but I do hope
that a level of agreement can be reached, which can then be codified
into the functional specification (I volunteer to do the
codification), and (if we are lucky) into implementation notes.
Best,
Magnus
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1263069
Re: Comment on obliterate functional specification
Posted by Branko Cibej <br...@xbc.nu>.
Magnus Torfason wrote:
> However, even in the disk-space story, data *is* destroyed
> (imagine a file that was deleted because it was not useful at the
> time, and "hey, it's all in subversion, so I can just keep my
> directory clean without having to worry"). Someone naively running
> "svn archive" and then wanting to restore an old file might be in
> for a nasty surprise.
>
I'd expect "svn archive" to do exactly that -- split old stuff out of
the repo, but archive it in a way that keeps it marginally accessible,
so that ancient archived data can be reconstructed.
-- Brane
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1278537
Re: Comment on obliterate functional specification
Posted by Magnus Torfason <zu...@gmail.com>.
On 3/4/2009 6:04 PM, Jack Repenning wrote:
> [In the disc-space story,] we want to remove the space no longer
> in use for any path/revs that should remain available post-
> obliteration, but the space that makes up some ancient delta
> which is still in use, post-obliteration, we should not remove.
> That is: a post-obliterate checkout of path@HEAD should show the
> same result as it did before obliteration, even if the post-
> obliteration checkout includes some text which was introduced
> into the repository during some now-removed revision.
I agree 100%.
> [...] the "security" story, [...] wants to remove *information* even
> from current versions.
>
> ...
>
> - "security" wants to remove information, requires absolute removal
> throughout all revisions, and is willing to sacrifice working copy
> continuity.
This is very true. I'll admit that the "security" story is a bit
further from my day-to-day reality than the "disk-space" story.
However, I've been working on a writeup of a use-case that I
envision, along with the work flow to resolve it using what's
in the functional specification.
I hope to post that in the relatively near future.
> It's remarkably hard for me to think of these two things as the
> same operation! I would call the "disc space story" something
> else, "archive," because as a practical matter all our customers
> keep asking us for this function, and they always call it
> "archive." I would leave the name "obliterate" for the "security
> story," because though relatively few of our customers ever
> mention this, when it comes up, that's the sort of term they
> use for it.
I get where you're coming from. I think the idea is that since the
two both involve changing old revs in the repo, they belong together
in implementation, even if that would not rule out differing
user interfaces.
However, even in the disk-space story, data *is* destroyed
(imagine a file that was deleted because it was not useful at the
time, and "hey, it's all in subversion, so I can just keep my
directory clean without having to worry"). Someone naively running
"svn archive" and then wanting to restore an old file might be in
for a nasty surprise.
Best,
Magnus
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1273982
Re: Comment on obliterate functional specification
Posted by Jack Repenning <jr...@collab.net>.
On Mar 3, 2009, at 8:36 AM, Magnus Torfason wrote:
> I have to think about svn blame. Are you saying that "svn blame"
> should continue to return the same output as before the obliteration?
Sorry, no, not particularly. I was using "svn blame" as a short hand
for "infinite knowledge about the ancestry of every byte in every path/
rev," and specifically using it to describe our knowledge of the repo
just *before* the obliteration. Guess I over-simplified my wording a
mite. To restate the paragraph without mentioning "blame,"
[In the disc-space story,] we want to remove the space no longer in
use for any path/revs that should remain available post-obliteration,
but the space that makes up some ancient delta which is still in use,
post-obliteration, we should not remove. That is: a post-obliterate
checkout of path@HEAD should show the same result as it did before
obliteration, even if the post-obliteration checkout includes some
text which was introduced into the repository during some now-removed
revision.
That is, I was drawing the distinction between the "security" story,
which wants to remove *information* even from current versions, and
the "space" story, which wants no change in current and near-current
(that is, post-obliteration) checkouts.
Or, to point up the difference in another way:
- "space" wants to save disc space, requires no change in recent
revisions (and working copy continuity), and is willing to sacrifice
invariance of checkouts of older revisions.
- "security" wants to remove information, requires absolute removal
throughout all revisions, and is willing to sacrifice working copy
continuity.
It's remarkably hard for me to think of these two things as the same
operation! I would call the "disc space story" something else,
"archive," because as a practical matter all our customers keep asking
us for this function, and they always call it "archive." I would leave
the name "obliterate" for the "security story," because though
relatively few of our customers ever mention this, when it comes up,
that's the sort of term they use for it.
-==-
Jack Repenning
Chief Technology Officer
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
mobile: +1 408.835.8090
raindance: +1 877.326.2337, x844.7461
aim: jackrepenning
skype: jackrepenning
twitter: http://twitter.com/jrep
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1269073
Re: Comment on obliterate functional specification
Posted by Magnus Torfason <zu...@gmail.com>.
Hi Jack,
I would say that you both correctly spotted the problem (that
the complexity of consistently modifying the history of the repository
is magnified because of the wide variety of use-cases), and my proposed
solution (to try to factor some of the complexity out of what could
be thought of as the "core" obliterate functionality, so that it could be
"dealt with later").
The question is then, is my proposed solution feasible? Needless to
say, I think it is. See specific comments below.
On Mar 2, 2009, at 8:17 PM, Jack Repenning wrote:
> I seem to see a problem here, or perhaps I only fail to see the
> solution. Let me spin a user story and see where it takes us.
>
> Suppose we're dealing with the "security" form of the problem: some
> information has been introduced into the repository that ought not to
> have been, and we need to ensure that it disappears, as thoroughly as
> possible. Suppose, further, that this sensitive information was
> introduced in the form of comment text in a source-code file. The
> error was introduced as a change at the/bad/path@BADREV. Changes to
> the/bad/path have also been made in (BADREV+1) and so on. Feel free to
> assume any ugly thing you like such as copies, post-BADREV, to other
> paths.
>
> In such a situation, it's not just the/bad/path@BADREV that must be
> expunged, but in fact all the later revisions based on it (unless,
> indeed, we can positively determine that someone edited that text out
> again at some later date).
Yes, absolutely. And all kinds of usability issues arise, not only
copies, but merges, too. And should we purge the copies, but leave the
merges, or vice versa. As you say, ugly.
> So either the OBLITERATION SET includes the/bad/path@BADREV and also
> all derived paths and revs (in which case, we need to automate finding
> them all, 'cause depending on the peoples for this won't fly), or
> alternatively some files@REVS not in the OBLITERATION SET need to have
> check-outs which differ depending on whether they come from the
> "original" or "modified" repository.
>
> Which did you have in mind?
The former.
And yes, my idea is to automate finding them all. It's just that I
think that "finding", or "constructing the correct obliteration set"
is going to seem much more manageable if we are absolutely clear on
what happens after the set has been defined, and don't have to worry
about that as well.
Writing code that messes with the repository data while leaving it in
a well defined and consistent state is a challenging task as it is,
even if the functionality is 100% defined.
> But conversely, if we're dealing with the disc-space form of the
> problem, then we exactly do not want these later paths@REVS affected.
Exactly.
> We want to remove the space no longer in use, but the space that makes
> up some ancient delta which is still in use we should not remove, but
> rather keep. A checkout of path@HEAD should show the same result,
> including lines that "svn blame" would show us were added at r1, even
> though we've removed (what we can of) revs 1-10000.
I absolutely agree that (core) obliterating ^/@1:10000 should have *no*
effect on the bytes returned by a checkout of HEAD, in a repository
that was up to revision 10001 before obliteration.
I have to think about svn blame. Are you saying that "svn blame"
should continue to return the same output as before the obliteration?
That does not seem right to me. I would say that after the above
obliteration the repository would look like it had 10000 empty
commits, and one huge commit at the end. Everything would look as if
the author of the last commit had added everything. After all, blame
is just a function of the revision in which a line was added to
the repository and of the revision properties.
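To make the blame point concrete, here is a small Python sketch. It is not
how "svn blame" is implemented; the blame helper and the toy histories are
hypothetical, and lines are matched with Python's difflib. It only shows
that once the obliterated revisions look like empty commits, every surviving
line is attributed to the last commit.

```python
from difflib import SequenceMatcher


def blame(history):
    """Return [(rev, line), ...] for the last entry of history.

    history: list of (rev, lines) in increasing revision order, where
    lines is the full file content at that revision.
    """
    annotated = []
    for rev, lines in history:
        old_lines = [line for _, line in annotated]
        matcher = SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision that introduced them.
                updated.extend(annotated[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines are attributed to this revision.
                updated.extend((rev, line) for line in lines[j1:j2])
            # "delete": the old lines simply disappear.
        annotated = updated
    return annotated


# Toy ORIGINAL history of one file.
original = [(1, ["a", "b"]), (2, ["a", "b", "c"]), (3, ["a", "x", "c"])]
# Same file after obliterating revisions 1 and 2: they look empty.
modified = [(1, []), (2, []), (3, ["a", "x", "c"])]

print(blame(original))
print(blame(modified))
```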
> So it seems like one form of obliterate most definitely _does_ want
> some sort of closure used based on the indicated problem point, while
> the other form most definitely does _not_ want that closure applied.
Agreed, so after the first implementation of obliterate, which might
have the syntax:
svn obliterate ^/bad/path/very/bad/path@13:666
We might add switches to the command of the form:
svn obliterate --include-descendants ^/path@100
svn obliterate --include-descendants --include-copies-from ^/path@100
svn obliterate --include-descendants --include-merges-from ^/path@100
And of course, if we want to find the ancestors instead:
svn obliterate --include-ancestors ^/path@100
svn obliterate --include-ancestors --include-copies-to ^/path@100
svn obliterate --include-ancestors --include-merges-to ^/path@100
It would also be very reasonable to interpret
svn obliterate ^/bad/path
as a shorthand for
svn obliterate ^/bad/path@0:HEAD
But the list does not stop here. What about the following use-case,
which may seem silly, but is actually quite reasonable in some
workflows:
svn obliterate --find-me-all-psd-files-older-than-three-months-
that-have-modifications-occurring-less-than-one-week-apart-
and-obliterate-the-next-to-last-commit-in-the-series-
then-repeat-until-there-is-at-least-one-week-
between-deltas ^/my/really/big/photoshop/projects
(Of course, the above syntax is silly in any case).
And as Brane noted, obliterating key links in the revision tree
may be undesirable (even if the result is well-defined), so
we might imagine:
svn obliterate --exclude-copies-from ^/old/and/big
And so on ...
I think all of these use-cases, and more, can be implemented on
top of an "obliteration-set" driven core functionality. Some of
them can eventually (or immediately) find their way into the
utility that subversion users see, others will only be available
in perl scripts operating on log files (but note that all of them
could be implemented through "svn log", "perl" and
"core obliteration".)
Furthermore, if agreement is reached, these use-cases will find
their way into obliterate-functional-spec.txt as "add-on"
features with different priorities.
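The "svn log" plus script route mentioned above might look roughly like the
following Python sketch. It assumes the log was produced with
"svn log -v --xml"; the sample XML is a trimmed, made-up excerpt (real
entries also carry author, date and msg elements), and the function name
obliteration_set_from_log is hypothetical. It turns a path and a revision
range, as in ^/bad/path@13:666, into explicit PATH@REV pairs.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative excerpt in the shape of "svn log -v --xml" output.
SAMPLE_LOG = """\
<log>
<logentry revision="12">
<paths><path action="A" kind="file">/bad/path/notes.txt</path></paths>
</logentry>
<logentry revision="13">
<paths>
<path action="M" kind="file">/bad/path/notes.txt</path>
<path action="M" kind="file">/good/readme.txt</path>
</paths>
</logentry>
<logentry revision="14">
<paths><path action="A" kind="file">/bad/pathology.txt</path></paths>
</logentry>
</log>
"""


def obliteration_set_from_log(xml_text, prefix, first_rev, last_rev):
    """Select changed paths at or below prefix within a revision range."""
    prefix = prefix.rstrip("/")
    selected = set()
    for entry in ET.fromstring(xml_text).iter("logentry"):
        rev = int(entry.get("revision"))
        if not first_rev <= rev <= last_rev:
            continue
        for node in entry.iter("path"):
            path = node.text
            # Match the path itself or anything inside it, but not
            # siblings that merely share a name prefix.
            if path == prefix or path.startswith(prefix + "/"):
                selected.add((path, rev))
    return selected


for pair in sorted(obliteration_set_from_log(SAMPLE_LOG, "/bad/path", 13, 666)):
    print("%s@%d" % pair)
```

More elaborate selection rules (copies, merges, age of the commit) would
only change how the set is computed, not what is done with it afterwards.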
Best,
Magnus
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1269031
Re: Comment on obliterate functional specification
Posted by Jack Repenning <jr...@collab.net>.
On Mar 2, 2009, at 9:09 PM, Branko Čibej wrote:
> Hyrum K. Wright wrote:
>>
>> On Mar 2, 2009, at 10:05 PM, Branko Cibej wrote:
>>
>>> Hyrum K. Wright wrote:
>>
>> You're asking a version control system to remove data, for goodness
>> sakes. That's just dangerous and if you don't have adult supervision,
>> you get what you ask for.
>
> :) Well, I find letting your version control system collapse a file's
> history a lot less scary than letting said system edit the file's
> content throughout its history. The one is a well-defined operation,
> the other is fuzzy at best and gets fuzzier along the line -- not to
> mention that you can't avoid breaking all working copies in existence.
Considering just how heretical this sort of removal always seems to VC
folks, I'm actually on Brane's side: better to publish a list of
proposed changes, than to run off and do it. I've worn several hats,
and straddled several fences, in this area for years, but to
personify: if "I," the person with the compelling security problem and
the not-quite-but-almost-as-compelling need to keep getting my real
work done, come to "you," the gloriously and normally commendably
compulsive data preserving VC person, and ask, in full recognition of
the VC heresy, yet none the less in absolute earnest, that you expunge
a bit of history ... well, then, I'm frankly inclined to want to check
over what you do, because deep in my heart I know that deep in your
heart you're only doing this under protest, and that you don't have
that visceral understanding of the problem necessary to make proper
edge-case calls.
-==-
Jack Repenning
Chief Technology Officer
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
mobile: +1 408.835.8090
raindance: +1 877.326.2337, x844.7461
aim: jackrepenning
skype: jackrepenning
twitter: http://twitter.com/jrep
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1262267
Re: Comment on obliterate functional specification
Posted by Magnus <ac...@zulutime.net>.
Thanks for the encouragement, Julian. As a matter of fact, I
had written up more on the definition, but had intended to hold
off until after release 1.6, assuming that things would ease up
after that. However, I will send what I have prepared now,
and would welcome any comments.
The following text would belong somewhere early in a revised
functional specification for obliteration:
-------------------------------------------------
DEFINITION OF THE OBLITERATION OPERATION
An OBLITERATION SET is defined by a list of PATH@REVISION elements
(that is, each element is a pair, consisting of a PATH and REVISION).
The same PATH can be paired with multiple REVISIONS to form
multiple elements and vice versa.
Note: The set is restricted so that if, for a given REVISION,
PATH@REVISION is part of the OBLITERATION SET, any element of
the form [PATH/RELATIVEPATH]@REVISION is also part of
the set. (This simply means that if a directory change is
obliterated in a revision, all changes to its contents must
also be obliterated in the same revision).
[Note on the note. Perhaps this restriction can be lifted.
However, it seems that doing so would greatly complicate
both the behavior and implementation of the operation,
without much benefit.]
An ORIGINAL repository is a repository to which an OBLITERATION
operation could be applied, but has not been (this includes any
subversion repository without obliterations).
A MODIFIED repository is a repository which is identical to the
ORIGINAL but for which an OBLITERATION SET has been defined and
an OBLITERATION operation has been applied.
The OBLITERATION operation is defined by the following three properties:
1. If a PATH@REVISION is checked out of the MODIFIED repository,
and the PATH@REVISION is NOT in the OBLITERATION SET, the
checkout data is identical to what would have been returned
if PATH@REVISION had been checked out of the ORIGINAL.
2. If a PATH@REVISION is checked out of the MODIFIED repository,
and the PATH@REVISION IS in the OBLITERATION SET, the
checkout data is identical to what would have been returned
if PATH@REVPRIOR had been checked out of the ORIGINAL, where
REVPRIOR is the last revision prior to REVISION for which
PATH@REVPRIOR is not in the OBLITERATION SET.
3. Any other mechanism through which a user can interact with
the repository (diff/merge/copy/commit/etc) should work
consistently. That is, assume that a REFERENCE repository
existed from which nothing had been obliterated, but for
which any checkout operation yielded the same data as for the
MODIFIED repository. Then every remote interaction with
MODIFIED must yield a result indistinguishable from what
would happen if the same operation were applied to the
REFERENCE repository.
Note: Here, data refers to the reported existence of the path,
the versioned properties that apply to the path, and for files,
the actual contents of the file.
Note: This definition does not state what happens to
revision properties (several options are available), and it
does not state what happens to the reported history of
the path (again, several options are available).
Note: Implicit in the above is the fact that the core
OBLITERATION functionality would not drop empty revisions.
This is intentional, and dropping empty revisions should be
done through a separate mechanism.
-------------------------------------------------
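Properties 1 and 2, and the directory restriction on the set, can be
sketched in a few lines of Python. This is only a toy model: the ORIGINAL
repository is represented as a dictionary from (path, revision) to content,
and the names modified_checkout and is_closed_under_descendants are
hypothetical. The definition does not say what a checkout returns when no
earlier non-obliterated revision exists; the sketch assumes the path is then
reported as nonexistent (None).

```python
def modified_checkout(original, obliteration_set, path, rev):
    """Content of path@rev in the MODIFIED repository.

    original: dict mapping (path, rev) -> content for the ORIGINAL
              repository; a missing key means the path does not exist.
    """
    prior = rev
    # Property 2: step back to the last revision not in the set.
    # Property 1 is the case where this loop does not run at all.
    while prior >= 0 and (path, prior) in obliteration_set:
        prior -= 1
    if prior < 0:
        return None  # assumption: nothing earlier survives
    return original.get((path, prior))


def is_closed_under_descendants(obliteration_set, known_paths):
    """Check that obliterating a directory also covers its contents."""
    for path, rev in obliteration_set:
        prefix = path.rstrip("/") + "/"
        for other in known_paths:
            if other.startswith(prefix) and (other, rev) not in obliteration_set:
                return False
    return True


original = {("/f", 1): "v1", ("/f", 2): "v2 with secret", ("/f", 3): "v3"}
oblit = {("/f", 2)}
for rev in (1, 2, 3):
    print(rev, modified_checkout(original, oblit, "/f", rev))
```

Property 3 is not modeled here, since it concerns every other operation
behaving as if the MODIFIED contents had always been the real history.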
The above definition fulfills several desirable criteria:
* It is in my view parsimonious
* It is relatively short
* It has clearly defined behavioral implications
However, the make-or-break criteria are of course two:
* Can obliteration, as defined above, be feasibly implemented?
* Would such an implementation address all required use-cases?
I believe the answer to both of the above questions to be yes,
and I would be happy to elaborate on why I believe this to
be the case, through discussions on the mailing list and through
patches to the functional specification.
Best regards,
Magnus
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1108134