Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2009/06/23 17:36:40 UTC

Re: Easy "obliterate" for BDB and lowest revision [Resent 2]

On Tue, 2009-06-23 at 09:39 +0200, Philipp Marek wrote:
> [I'm resending this mail {a third time} as it doesn't seem to have reached the 
> mailing list - Jack already knows about that.]

I don't know what's up with that, but I expect this reply from me will
get there.

> Hello everybody,
> 
> looking at trunk/subversion/libsvn_fs_base/notes/structure:358 I see this:
> 
> 	At present, Subversion generally stores the youngest strings in "fulltext"
> 	form, and older strings as "delta"s against them (unless the delta would
> 	save no space compared to the fulltext).
> 
> If I understand that correctly it should be relatively easy to obliterate the 
> *lowest* (oldest) revision of any entry, as it's not referenced anywhere else.
> (Whether the directories could be purged too, and only kept in later 
> revisions, I'm not sure ATM).

Well, I believe you're right in thinking the oldest delta of a file is
not referenced by anything else, so no other deltas need to be
re-calculated in order to remove it. However, I suspect that
re-calculating deltas is not one of the major difficulties with
implementing any kind of "obliterate" feature, as we already do it
during ordinary commits.
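
To make the point about delta chains concrete, here is a minimal Python
sketch. It is a toy model, not Subversion's BDB schema: the `Rep` class,
the prefix-based delta format and every function name are assumptions
made up for illustration. It stores the youngest revision as fulltext and
each older revision as a delta against the next younger one, then shows
that nothing depends on the oldest entry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Delta = Tuple[int, str]  # (length of prefix shared with the base, remaining tail)


@dataclass
class Rep:
    """Hypothetical stored representation: either a fulltext or a delta."""
    fulltext: Optional[str] = None
    base: Optional[int] = None      # revision the delta is applied against
    delta: Optional[Delta] = None


def make_delta(base: str, target: str) -> Delta:
    """Toy delta: keep the common prefix of base, store the rest of target."""
    n = 0
    while n < min(len(base), len(target)) and base[n] == target[n]:
        n += 1
    return n, target[n:]


def apply_delta(base: str, delta: Delta) -> str:
    n, tail = delta
    return base[:n] + tail


def build_store(texts: Dict[int, str]) -> Dict[int, Rep]:
    """Youngest revision as fulltext, each older one as a delta against the next younger."""
    revs = sorted(texts)
    store = {revs[-1]: Rep(fulltext=texts[revs[-1]])}
    for older, newer in zip(revs, revs[1:]):
        store[older] = Rep(base=newer, delta=make_delta(texts[newer], texts[older]))
    return store


def read(store: Dict[int, Rep], rev: int) -> str:
    """Reconstruct a revision by walking the chain towards the fulltext."""
    rep = store[rev]
    if rep.base is None:
        return rep.fulltext
    return apply_delta(read(store, rep.base), rep.delta)


def dependents(store: Dict[int, Rep], rev: int) -> List[int]:
    """Revisions whose delta uses `rev` as its base."""
    return sorted(r for r, rep in store.items() if rep.base == rev)


store = build_store({1: "hello", 2: "hello world", 3: "hello world!"})
oldest = min(store)
# Nothing is stored as a delta against the oldest revision, so dropping it
# needs no re-calculation of the remaining entries.
if not dependents(store, oldest):
    del store[oldest]
```

In this model only the youngest entries are ever used as delta bases,
which is why removing the oldest one leaves every other revision readable.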

For example, one of the things we would need to take care of, if we
remove the existence of that file's oldest revision, is any "copy-from"
references that exist in later objects that were copied from it. (If, on
the other hand, we replace the file's text content with empty content or
some other content and leave it existing, that particular problem goes
away.)
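
A small Python sketch of that difference, again as a toy model: the
`Node` class, its `copy_from` field and the helper names are hypothetical
and do not reflect how Subversion actually records copies. It compares
removing a node-revision outright with blanking its content.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical node-revision with an optional copy source."""
    path: str
    rev: int
    text: str
    copy_from: Optional[Tuple[str, int]] = None  # (source path, source revision)


def copies_of(nodes: List[Node], path: str, rev: int) -> List[Node]:
    """Nodes that were copied from the given path@rev."""
    return [n for n in nodes if n.copy_from == (path, rev)]


def dangling(nodes: List[Node]) -> List[Node]:
    """Nodes whose copy source no longer exists in the repository model."""
    existing = {(n.path, n.rev) for n in nodes}
    return [n for n in nodes
            if n.copy_from is not None and n.copy_from not in existing]


def remove(nodes: List[Node], path: str, rev: int) -> List[Node]:
    """Obliterate by deleting the node-revision entirely."""
    return [n for n in nodes if (n.path, n.rev) != (path, rev)]


def blank(nodes: List[Node], path: str, rev: int) -> List[Node]:
    """Obliterate by keeping the node-revision but emptying its text."""
    return [replace(n, text="") if (n.path, n.rev) == (path, rev) else n
            for n in nodes]


repo = [
    Node("trunk/a.txt", 1, "unwanted content"),
    Node("trunk/a.txt", 2, "corrected content"),
    Node("branches/b/a.txt", 3, "unwanted content", copy_from=("trunk/a.txt", 1)),
]

broken = dangling(remove(repo, "trunk/a.txt", 1))   # the branch copy loses its source
intact = dangling(blank(repo, "trunk/a.txt", 1))    # no reference is left dangling
```

Note that in this sketch the copy on the branch still carries the
unwanted text after blanking the source, which is one more thing a real
implementation would have to decide how to handle.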

I don't know what other things we'll have to take care of in the
repository. On the client side there is of course still the issue of how
working copies should behave (and how old working copies of clients that
don't understand "obliterate" will behave).


> I'm asking this because in 2004 there were rumors about a script that does 
> surgery in BDB repositories 
> (http://svn.haxx.se/dev/archive-2004-08/index.shtml#163), and the simple 
> "obliterate" approach above would allow using Subversion for a round-robin 
> backup system.

> Is there anything left of this script, or can someone assist in getting 
> something like this done?

I don't know about such a script.

Personally I'm hoping to work on a more general solution that enables
obliterating later revisions, not just the oldest. But if someone wants
to fund me to do obliterate-just-the-oldest in a way that could be
extended to other cases later, I'll certainly do it. (I am currently
looking at the higher level issues: what kinds of functional change to
the repository's externally visible structure would best satisfy the
various kinds of user requirements, and how to manage the invalidation
that will occur to working copies.)

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2364572

Re: Easy "obliterate" for BDB and lowest revision [Resent 2]

Posted by Philipp Marek <ph...@emerion.com>.
Hello Julian,
hello others,

On Tuesday, 23 June 2009, Julian Foad wrote:
> On Tue, 2009-06-23 at 09:39 +0200, Philipp Marek wrote:
> > [I'm resending this mail {a third time} as it doesn't seem to have
> > reached the mailing list - Jack already knows about that.]
>
> I don't know what's up with that, but I expect this reply from me will
> get there.
Thanks a lot. I hope that this mail will get through.

> > If I understand that correctly it should be relatively easy to obliterate
> > the *lowest* (oldest) revision of any entry, as it's not referenced
> > anywhere else. (Whether the directories could be purged too, and only
> > kept in later revisions, I'm not sure ATM).
>
> Well, I believe you're right in thinking the oldest delta of a file is
> not referenced by anything else, so no other deltas need to be
> re-calculated in order to remove it. ...
> is any "copy-from"
> references that exist in later objects that were copied from it. 
Yes, you're right ... I forgot about copy-from, because for this use case 
(backup) it wouldn't be used.

> (If, on
> the other hand, we replace the file's text content with empty content or
> some other content and leave it existing, that particular problem goes
> away.)
But then we wouldn't get the allocated space back.

> I don't know what other things we'll have to take care of in the
> repository. On the client side there is of course still the issue of how
> working copies should behave (and how old working copies of clients that
> don't understand "obliterate" will behave).
Of course, the client side is always a hard point to consider ...
But, as I mentioned above, for backup there's no "client side", just someone 
who puts newer files into the repository ;-)

> > I'm asking this because in 2004 there were rumors about a script that
> > does surgery in BDB repositories
> > (http://svn.haxx.se/dev/archive-2004-08/index.shtml#163), and the simple
> > "obliterate" approach above would allow using Subversion for a
> > round-robin backup system.
> >
> > Is there anything left of this script, or can someone assist in getting
> > something like this done?
>
> I don't know about such a script.
Well, from http://svn.haxx.se/dev/archive-2004-08/0179.shtml 

C. Michael Pilato wrote on 2004-08-05 22:07:32 CEST:
> "C. Michael Pilato" <cm...@collab.net> writes:
> > > Didn't cmpilato have some magic script that zeroed-out the nasty files
> > >
> > > in the strings table? or something like that?
> >
> > No sir. Have no such script.
>
> So, I must confess that I was not completely upfront in my reply.
>
> I don't have a script which does the work needed to say, remove
> evidence of a file's content ever being present in a repository. But
> I *do* have a script (begun some time ago, but never finished) which
> simply unlinks all revisions of a file from their parent directories.
> I never donated said script because it was guaranteed to break
> repositories in a particular way (doesn't munge the all-important
> 'changes' table).
So there's some evidence left of _some_ script ;-]


> Personally I'm hoping to work on a more general solution that enables
> obliterating later revisions, not just the oldest. But if someone wants
> to fund me to do obliterate-just-the-oldest in a way that could be
> extended to other cases later, I'll certainly do it.
Of course, a full obliterate would be better ... I just wanted to ask about
that shortcut, which could solve some use cases. For example, if the first
import of a file was unwanted, one could commit the correct version and then
remove the earlier fulltext.

> (I am currently
> looking at the higher level issues: what kinds of functional change to
> the repository's externally visible structure would best satisfy the
> various kinds of user requirements, and how to manage the invalidation
> that will occur to working copies.)
I would be interested to read about any conclusions you have drawn.


Regards,

Phil



starting new threads bug for mutt users (was: Re: Easy "obliterate" for BDB and lowest revision [Resent 2])

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Jun 23, 2009 at 06:36:40PM +0100, Julian Foad wrote:
> On Tue, 2009-06-23 at 09:39 +0200, Philipp Marek wrote:
> > [I'm resending this mail {a third time} as it doesn't seem to have reached the 
> > mailing list - Jack already knows about that.]
> 
> I don't know what's up with that, but I expect this reply from me will
> get there.

It's a bug in tigris' shiny new mail system preventing people using
mutt from opening new threads on lists.

See
http://www.tigris.org/ds/viewMessage.do?dsForumId=926&dsMessageId=2361450
and
http://www.tigris.org/ds/viewMessage.do?dsForumId=926&dsMessageId=2363328

Stefan