Posted to users@subversion.apache.org by Michael Eager <ea...@eagercon.com> on 2006/09/05 16:15:20 UTC

Delete from Repository

I saw the FAQ about the unimplemented "svn obliterate" command.

It's easy to import or check in crud into a repository.  For example,
I have several svn-commit.tmp files resulting from errors in importing
files, or .swp files from files which had been edited.  It's also
pretty easy to import a tree into the wrong place in a repository,
then import it into the right place, resulting in two copies in the
repository.

The work-around in the FAQ to delete a file or directory from the
repository is cumbersome.

The unpleasant choice seems to be to "svn del" files or directories
so that they are not checked out, but to have them persist in the
repository, cluttering up contents generated by viewvc.cgi, and
confusing people about which is the "real" repository.

This is (in my opinion) a major flaw in Subversion.  Managing
a repo means being able to clean up errors and messes when they
occur, not having to live with them forever.  Is there any way to
fix these kinds of problems in a Subversion repository?

I've only looked at the SVN code very briefly.  What would it take
to implement the "svn obliterate" command?

-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Delete from Repository

Posted by Karl Fogel <kf...@google.com>.
I may be guilty of leading people on, here, sorry: I personally don't
have time to work on this, and realistically, it needs to have the
attention of some developer, or of people willing to drive the
discussion *and* provide a patch.

Again, I think it would be a great feature.  I'll bet if someone out
there put up a $10k bounty for it, it would get done...

-Karl

Michael Eager <ea...@eagercon.com> writes:
> Karl Fogel wrote:
>
>> The hard part is to define exactly what it will do, see
>> http://subversion.tigris.org/issues/show_bug.cgi?id=516#desc17
>> for more on that.
>>
>> We badly need the feature, everyone agrees -- but defining exactly
>> what the feature is is the hard part :-).  
>
> I read the comments in the bug report.
>
> The feature I need is to remove a file/directory from the repo.
> It seems to me (naively) that this means removing any deltas,
> any history, any logs, etc.  Make it as if the file/dir were
> never added in the first place.  This would seem to address some
> of the problems mentioned:  someone checks in binaries or checks
> in a source tree in the wrong location.
>
> Tagging a node as "obliterated" in the repo would not recover
> any wasted space, but it would logically act as if the file/dir
> had been deleted.  A compress operation (i.e., dump/restore
> of the archive) would recover the space.  Not as nice as recovering
> the space when the file/dir is obliterated, but it makes the
> obliterate function reversible.  (The more secure variation
> is to replace the node data with a comment like "redacted" as
> mentioned in the bug report, but that makes the obliterate
> operation irreversible.)  (This was Jason Robbins's suggestion.)
>
> I'm a bit unclear about the comments that obliterating
> a file/dir would make working copies invalid.  What happens
> when svn sees an entry for a file in a working copy when no
> file exists in the repo?  Or better, what should happen?
>
> There were also comments about deltas against obliterated
> nodes.  I'm unfamiliar with the internals of svn, but if there
> are deltas against a file which is obliterated, then it seems
> like all of these deltas should also be obliterated.  I don't
> understand the comment about needing to "re-delta" nodes.
>
> It may be that there is a different requirement to obliterate
> some intermediate update, while retaining the file/dir in
> the repo.  That's something different from what I would like
> to see.  Some of the comments (like those of Ben Collins-Sussman)
> seem to address this problem.
>
> It seems that for many applications it would be satisfactory
> to take the repo offline while compressing (dump/restore) it,
> eliminating space used by obliterated files.  There are comments
> that this is not a satisfactory solution for large repos, but it
> would seem to provide a stepping stone to a more comprehensive
> solution.
>
> That better solution might be to walk the database looking
> for obliterated nodes and then remove them and any nodes
> which reference them, recursively.  (Obviously, this is done
> bottom up, removing nodes which have no references and removing
> the references to the nodes.)  I'm speaking from a naive viewpoint,
> since I haven't looked at the code.  This sounds to me like
> something which can be done in the background as a maintenance
> activity, without taking the repo offline.
>
>
> -- 
> Michael Eager	 eager@eagercon.com
> 1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Re: Delete from Repository

Posted by Michael Eager <ea...@eagercon.com>.
Karl Fogel wrote:

> The hard part is to define exactly what it will do, see
> http://subversion.tigris.org/issues/show_bug.cgi?id=516#desc17
> for more on that.
> 
> We badly need the feature, everyone agrees -- but defining exactly
> what the feature is is the hard part :-).  

I read the comments in the bug report.

The feature I need is to remove a file/directory from the repo.
It seems to me (naively) that this means removing any deltas,
any history, any logs, etc.  Make it as if the file/dir were
never added in the first place.  This would seem to address some
of the problems mentioned:  someone checks in binaries or checks
in a source tree in the wrong location.

Tagging a node as "obliterated" in the repo would not recover
any wasted space, but it would logically act as if the file/dir
had been deleted.  A compress operation (i.e., dump/restore
of the archive) would recover the space.  Not as nice as recovering
the space when the file/dir is obliterated, but it makes the
obliterate function reversible.  (The more secure variation
is to replace the node data with a comment like "redacted" as
mentioned in the bug report, but that makes the obliterate
operation irreversible.)  (This was Jason Robbins's suggestion.)
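A minimal sketch of the tag-then-compact idea, assuming a hypothetical in-memory model: `ToyRepo`, `obliterate`, `restore`, `redact` and `compact` are illustrative names, not Subversion APIs or storage structures. Tagging hides a node but can be undone; `compact` stands in for the offline dump/restore that actually frees the space; `redact` is the irreversible "replace the data" variant.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyRepo:
    """Hypothetical repository model: node id -> stored bytes.

    Not how Subversion stores data; it only contrasts tagging,
    compacting and redacting.
    """
    nodes: Dict[str, bytes] = field(default_factory=dict)
    obliterated: Set[str] = field(default_factory=set)

    def add(self, node_id: str, data: bytes) -> None:
        self.nodes[node_id] = data

    def obliterate(self, node_id: str) -> None:
        # Logical delete: the bytes stay stored, so this is reversible.
        if node_id not in self.nodes:
            raise KeyError(node_id)
        self.obliterated.add(node_id)

    def restore(self, node_id: str) -> None:
        # Undo a tag; only meaningful until compact() has run.
        self.obliterated.discard(node_id)

    def redact(self, node_id: str) -> None:
        # The "more secure" variant: overwrite the content. Irreversible.
        if node_id not in self.nodes:
            raise KeyError(node_id)
        self.nodes[node_id] = b"redacted"

    def visible(self) -> List[str]:
        # What a checkout or repository browser would be allowed to show.
        return sorted(n for n in self.nodes if n not in self.obliterated)

    def compact(self) -> int:
        # Analogue of an offline dump/restore: physically drop tagged
        # nodes and return the number of bytes reclaimed.
        freed = sum(len(self.nodes.pop(n)) for n in self.obliterated)
        self.obliterated.clear()
        return freed
```

For example, after `add("a", b"12345")`, `obliterate("a")` hides "a" and `restore("a")` brings it back; once `compact()` has run (returning 5 here), the node is gone for good.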

I'm a bit unclear about the comments that obliterating
a file/dir would make working copies invalid.  What happens
when svn sees an entry for a file in a working copy when no
file exists in the repo?  Or better, what should happen?

There were also comments about deltas against obliterated
nodes.  I'm unfamiliar with the internals of svn, but if there
are deltas against a file which is obliterated, then it seems
like all of these deltas should also be obliterated.  I don't
understand the comment about needing to "re-delta" nodes.

It may be that there is a different requirement to obliterate
some intermediate update, while retaining the file/dir in
the repo.  That's something different from what I would like
to see.  Some of the comments (like those of Ben Collins-Sussman)
seem to address this problem.

It seems that for many applications it would be satisfactory
to take the repo offline while compressing (dump/restore) it,
eliminating space used by obliterated files.  There are comments
that this is not a satisfactory solution for large repos, but it
would seem to provide a stepping stone to a more comprehensive
solution.

That better solution might be to walk the database looking
for obliterated nodes and then remove them and any nodes
which reference them, recursively.  (Obviously, this is done
bottom up, removing nodes which have no references and removing
the references to the nodes.)  I'm speaking from a naive viewpoint,
since I haven't looked at the code.  This sounds to me like
something which can be done in the background as a maintenance
activity, without taking the repo offline.
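The bottom-up walk described above can be sketched over a toy reference graph. This is a hypothetical model, not Subversion's actual BDB or FSFS layout: each node lists the nodes it references (for instance, the base it is stored as a delta against), and `cascade_removal_order` is an illustrative helper that returns every node to remove, ordered so that a node goes only after everything still referencing it is gone.

```python
from collections import deque
from typing import Dict, List, Set


def cascade_removal_order(refs: Dict[str, Set[str]],
                          obliterated: Set[str]) -> List[str]:
    """Return nodes to delete, referrers first (bottom up).

    refs[n] is the set of nodes that n references, e.g. the node it is
    stored as a delta against. Hypothetical model only.
    """
    # Reverse edges: for each node, which nodes reference it.
    referrers: Dict[str, Set[str]] = {n: set() for n in refs}
    for node, targets in refs.items():
        for target in targets:
            referrers.setdefault(target, set()).add(node)

    # Step 1: the obliterated nodes plus everything that references
    # them, directly or transitively.
    doomed: Set[str] = set()
    queue = deque(obliterated)
    while queue:
        node = queue.popleft()
        if node in doomed:
            continue
        doomed.add(node)
        queue.extend(referrers.get(node, ()))

    # Step 2: Kahn-style ordering. A doomed node is removable once no
    # doomed node still references it; removing it drops its own
    # outgoing references.
    remaining = {n: len(referrers.get(n, set()) & doomed) for n in doomed}
    ready = deque(sorted(n for n, count in remaining.items() if count == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in sorted(refs.get(node, ())):
            if target in remaining:
                remaining[target] -= 1
                if remaining[target] == 0:
                    ready.append(target)

    if len(order) != len(doomed):
        raise ValueError("reference cycle among nodes to remove")
    return order
```

With `refs = {"r1": set(), "r2": {"r1"}, "r3": {"r2"}, "r4": {"r1"}}`, obliterating `r2` yields `["r3", "r2"]`: `r3` goes too because it references `r2`, while `r1` and `r4` survive. That cascade is the consequence this proposal accepts.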


-- 
Michael Eager	 eager@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077

Re: Delete from Repository

Posted by Les Mikesell <le...@gmail.com>.
On Tue, 2006-09-05 at 11:50 -0700, Karl Fogel wrote:
> Les Mikesell <le...@gmail.com> writes:
> > I'd expect it to have exactly the same results as a dump/filter/load
> > sequence back into the same location, in which case the feature is
> > already defined and acceptable.  We just need an implementation that
> > is faster, doesn't need the intermediate copies, and doesn't break
> > checked-out workspaces any more than necessary.
> 
> Sure, any of the proposed behaviors in 
> 
>    http://subversion.tigris.org/issues/show_bug.cgi?id=516#desc17
> 
> could be implemented via a dump/filter/load sequence, but that doesn't
> specify which exact one you had in mind.  Can you describe it in terms
> of results, rather than of implementation?

I still think in terms of the way CVS works, so I'd want the effect
of removing the file,v file from a CVS repository filesystem.  That
is, all versions completely gone at once.  Perhaps this could be
combined with a directory-level dump/filter/restore operation if you
wanted to put back some subset of versions in its place.

-- 
  Les Mikesell
   lesmikesell@gmail.com


Re: Delete from Repository

Posted by Karl Fogel <kf...@google.com>.
Les Mikesell <le...@gmail.com> writes:
> I'd expect it to have exactly the same results as a dump/filter/load
> sequence back into the same location, in which case the feature is
> already defined and acceptable.  We just need an implementation that
> is faster, doesn't need the intermediate copies, and doesn't break
> checked-out workspaces any more than necessary.

Sure, any of the proposed behaviors in 

   http://subversion.tigris.org/issues/show_bug.cgi?id=516#desc17

could be implemented via a dump/filter/load sequence, but that doesn't
specify which exact one you had in mind.  Can you describe it in terms
of results, rather than of implementation?

Thanks,
-Karl

Re: Delete from Repository

Posted by Les Mikesell <le...@gmail.com>.
On Tue, 2006-09-05 at 10:41 -0700, Karl Fogel wrote:
> > I saw the FAQ about the unimplemented "svn obliterate" command.
> >
> > It's easy to import or check in crud into a repository.  For example,
> > I have several svn-commit.tmp files resulting from errors in importing
> > files, or .swp files from files which had been edited.  It's also
> > pretty easy to import a tree into the wrong place in a repository,
> > then import it into the right place, resulting in two copies in the
> > repository.
> >
> > The work-around in the FAQ to delete a file or directory from the
> > repository is cumbersome.
> >
> > The unpleasant choice seems to be to "svn del" files or directories
> > so that they are not checked out, but to have them persist in the
> > repository, cluttering up contents generated by viewvc.cgi, and
> > confusing people about which is the "real" repository.
> >
> > This is (in my opinion) a major flaw in Subversion.  Managing
> > a repo means being able to clean up errors and messes when they
> > occur, not having to live with them forever.  Is there any way to
> > fix these kinds of problems in a Subversion repository?
> >
> > I've only looked at the SVN code very briefly.  What would it take
> > to implement the "svn obliterate" command?
> 
> The hard part is to define exactly what it will do, see
> http://subversion.tigris.org/issues/show_bug.cgi?id=516#desc17
> for more on that.
> 
> We badly need the feature, everyone agrees -- but defining exactly
> what the feature is is the hard part :-). 

I'd expect it to have exactly the same results as a dump/filter/load
sequence back into the same location, in which case the feature is
already defined and acceptable.  We just need an implementation that
is faster, doesn't need the intermediate copies, and doesn't break
checked-out workspaces any more than necessary.
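For reference, the dump/filter/load sequence can be scripted roughly as below. This is a sketch, assuming a local repository path and `svnadmin` and `svndumpfilter` on the PATH; the helper names and example paths are hypothetical. It loads into a fresh repository that an administrator would then swap into place, and it deliberately omits `--drop-empty-revs` and `--renumber-revs`, so emptied revisions are kept and revision numbers stay the same for existing working copies.

```python
import subprocess
from typing import List, Sequence


def build_commands(src_repo: str, dst_repo: str,
                   excluded: Sequence[str]) -> List[List[str]]:
    """argv lists for: svnadmin dump | svndumpfilter exclude | svnadmin load."""
    if not excluded:
        raise ValueError("give at least one path prefix to exclude")
    return [
        ["svnadmin", "dump", "--quiet", src_repo],
        # No --drop-empty-revs / --renumber-revs: emptied revisions are
        # kept so revision numbers do not shift under working copies.
        ["svndumpfilter", "exclude", "--quiet", *excluded],
        ["svnadmin", "load", "--quiet", dst_repo],
    ]


def filter_into_new_repo(src_repo: str, dst_repo: str,
                         excluded: Sequence[str]) -> None:
    """Copy src_repo into a newly created dst_repo, minus `excluded`."""
    dump_cmd, filter_cmd, load_cmd = build_commands(src_repo, dst_repo, excluded)
    subprocess.run(["svnadmin", "create", dst_repo], check=True)

    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    filt = subprocess.Popen(filter_cmd, stdin=dump.stdout,
                            stdout=subprocess.PIPE)
    dump.stdout.close()  # so dump sees a broken pipe if the filter dies
    try:
        subprocess.run(load_cmd, stdin=filt.stdout, check=True)
    finally:
        filt.stdout.close()
        filt_rc, dump_rc = filt.wait(), dump.wait()
    if filt_rc != 0 or dump_rc != 0:
        raise RuntimeError(f"pipeline failed: dump={dump_rc} filter={filt_rc}")
```

A hypothetical call would be `filter_into_new_repo("/srv/svn/repo", "/srv/svn/repo-clean", ["trunk/svn-commit.tmp"])`. It still needs the intermediate copy and an offline swap, which are exactly the costs a native obliterate would avoid.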

-- 
  Les Mikesell
   lesmikesell@gmail.com


Re: Delete from Repository

Posted by Karl Fogel <kf...@google.com>.
Michael Eager <ea...@eagercon.com> writes:
> I saw the FAQ about the unimplemented "svn obliterate" command.
>
> It's easy to import or check in crud into a repository.  For example,
> I have several svn-commit.tmp files resulting from errors in importing
> files, or .swp files from files which had been edited.  It's also
> pretty easy to import a tree into the wrong place in a repository,
> then import it into the right place, resulting in two copies in the
> repository.
>
> The work-around in the FAQ to delete a file or directory from the
> repository is cumbersome.
>
> The unpleasant choice seems to be to "svn del" files or directories
> so that they are not checked out, but to have them persist in the
> repository, cluttering up contents generated by viewvc.cgi, and
> confusing people about which is the "real" repository.
>
> This is (in my opinion) a major flaw in Subversion.  Managing
> a repo means being able to clean up errors and messes when they
> occur, not having to live with them forever.  Is there any way to
> fix these kinds of problems in a Subversion repository?
>
> I've only looked at the SVN code very briefly.  What would it take
> to implement the "svn obliterate" command?

The hard part is to define exactly what it will do, see
http://subversion.tigris.org/issues/show_bug.cgi?id=516#desc17
for more on that.

We badly need the feature, everyone agrees -- but defining exactly
what the feature is is the hard part :-).  It will need someone to
drive the discussion, an "honest broker" in the sense described here:
http://producingoss.com/html-chunk/consensus-democracy.html#voting (I
don't mean to imply that voting will be needed here, I'm just linking
to that as a description of what broking means).

-Karl

Re: Delete from Repository

Posted by Kevin Greiner <gr...@gmail.com>.
On 9/5/06, Michael Eager <ea...@eagercon.com> wrote:
>
>
> I've only looked at the SVN code very briefly.  What would it take
> to implement the "svn obliterate" command?


I suggest reading
http://subversion.tigris.org/issues/show_bug.cgi?id=516 and searching
the following list archives for "obliterate".
http://svn.haxx.se/dev/
http://svn.haxx.se/users/