Posted to dev@subversion.apache.org by Brandon Ehle <az...@yahoo.com> on 2006/10/14 17:16:35 UTC

Optional text base design discussion?

Has there ever been a design discussion on how to implement optional
text base working copies?

I have found plenty of flame wars in the past on the subject, but I
haven't found any design discussions on the best way to implement this.

The possibilities I have seen so far:


Compressed text bases
  * Keeps current low bandwidth performance
  * Reduces the amount of disk space slightly
  * Local working copy operations are slower


Removal of the text bases
 * Low bandwidth performance suffers
 * Reduces the amount of disk space by half in some cases
 * Local working copy operations possibly require server communication
 * Faster performance for working copy operations that have lots of files
 * Checkout performance for large binary files (no EOL translation) has
the possibility of doubling
 * Less chance for search and replace bugs


Single .svn directory (database?) for the entire working copy
 * Could be combined with compressed or missing text bases
 * Faster performance for working copies with lots of directories


There are probably other advantages and disadvantages that I have not
seen mentioned for the various approaches.  If implemented, I would
imagine that these features would be optional by either some server side
configuration, or a client side option.

With the possibility of multiple working copy formats, there is an
interesting side discussion as to whether the ability to switch formats
on an existing working copy would be useful (which is possibly similar
to takeover functionality).


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Optional text base design discussion?

Posted by Joseph Galbraith <ga...@vandyke.com>.
Brandon Ehle wrote:
> Has there ever been a design discussion on how to implement optional
> text base working copies?
> 
> I have found plenty of flame wars in the past on the subject, but I
> haven't found any design discussions on the best way to implement this.
> 
> The possibilities I have seen so far:
> 
> 
> Compressed text bases
>   * Keeps current low bandwidth performance
>   * Reduces the amount of disk space slightly
>   * Local working copy operations are slower
> 
> 
> Removal of the text bases
>  * Low bandwidth performance suffers
>  * Reduces the amount of disk space by half in some cases
>  * Local working copy operations possibly require server communication

I think this could be done in such a way that low bandwidth
performance and server communication would only be a drawback
if the user failed to notify svn that he was about to modify
a file.

For example, we could add a "svn modify <file>" command,
which would check the hash for the working copy file
and if it hadn't already been modified, copy it into
the text-base area.

Then, as long as you remember to "svn modify" your files
before you change them, you get all the advantages of
the current system.
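
To make the idea concrete, here is a rough sketch of what such a
"svn modify" step might do.  It is only an illustration: the
function names, the ".svn-base" file naming and the use of MD5 are
assumptions, and it ignores eol-style and keyword translation.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def svn_modify(working_file: Path, recorded_md5: str, text_base_dir: Path) -> bool:
    """Save a pristine copy of working_file before the user edits it.

    Returns True if a text base exists afterwards.  Returns False if the
    file no longer matches the recorded checksum, in which case the
    pristine copy would have to come from the server instead.
    """
    # Hypothetical naming scheme for the saved pristine copy.
    base = text_base_dir / (working_file.name + ".svn-base")
    if base.exists():
        return True  # already prepared by an earlier call
    if file_md5(working_file) != recorded_md5:
        return False  # already modified: a local copy would not be pristine
    text_base_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(working_file, base)
    return True
```

The False case is exactly the situation where the user forgot to
announce the edit and the server has to be contacted.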

Additionally, it might be possible to add something like
a --no-scan option to check-in, which would simply assume that
only files present in the text-base area were candidates
for check-in and not perform an exhaustive scan of the
working copy.  There could be significant performance
gains.

For those of us working in IDE environments, the right
set of tools could also help prevent the case where
I have to contact the server because I forgot to svn modify
before changing a file.

Also, while not something the subversion team
would undertake, a design such as this would make it
possible to combine subversion with a kernel mode
component that automatically copied the files before
allowing any write operation to proceed.

> Single .svn directory (database?) for the entire working copy
>  * Could be combined with compressed or missing text bases
>  * Faster performance for working copies with lots of directories

And the downside to this is that you don't have
'tear-off' working copies, where any subfolder of
a working copy can be moved elsewhere and independently
remain a working copy.

Of course, I don't find much value in this feature...
and if I could trade it for working-copy performance,
I'd do it in a heart-beat.

Thanks,

Joseph

Re: Optional text base design discussion?

Posted by Michael Sinz <Mi...@sinz.org>.
Brandon Ehle wrote:
>> BTW - I find the WC performance to not be that bad relative to the benefit
>> of the "svn diff" and "svn status" performance win due to the text base.
>> Now, I do not use Windows much so I do not suffer from the filesystem
>> performance issues that come up there.
>>
> 
> I typically run into problems when the working copy contents are over
> 20GB (40GB with text base).  My assumption is that if you have a working
> copy that large, your project should be able to afford a fast server and
> Gigabit Ethernet to it.  For large files like movie clips, a "svn diff"
> is not all that useful anyway.  Mostly you just want to find out if the
> file is modified.

Ahh, maybe the right answer here is that certain files can be marked as
"no text base" because a text base is of less use for them.  Those files would,
however, need some hash match feature (MD5/SHA1/etc) such that "svn status"
would be fast.
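
A minimal sketch of how such a hash check might look, assuming a
per-file record of size, timestamp and SHA-1 is kept in place of the
text base.  The Entry record and the size/timestamp shortcut are my
assumptions for illustration, not a description of existing code.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical per-file record kept instead of a text base."""
    size: int
    mtime_ns: int
    sha1: str


def sha1_of(path: str) -> str:
    """Return the hex SHA-1 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record(path: str) -> Entry:
    """Capture the state of a pristine file, e.g. right after checkout."""
    st = os.stat(path)
    return Entry(st.st_size, st.st_mtime_ns, sha1_of(path))


def is_modified(path: str, entry: Entry) -> bool:
    """Decide whether the file changed, hashing only when necessary."""
    st = os.stat(path)
    if st.st_size != entry.size:
        return True   # size differs: modified, no hashing needed
    if st.st_mtime_ns == entry.mtime_ns:
        return False  # same size and timestamp: assume unmodified
    return sha1_of(path) != entry.sha1  # ambiguous case: compare content
```

With this, the common case costs one stat() per file, and a large
file is only read in full when its timestamp changed but its size
did not.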

Even more interesting is how to handle the commit process.  A large file, even
MPEG video, that has been changed is still a lot of data to send.  It would be
good if the server did not need to do the diff operation but then doing the
diff operation on the client would mean significant network traffic.

I wonder where the best trade-off is here.  In one case there is more local
disk space used but less server resources, in the other the server resources
are larger and the network usage is larger but significant savings in local
disk.

Generally I would say that costs in items that scale "naturally" are better
than in central systems, but there is a valid argument to be made that for
some content, the text base really is a significant cost and of minimal
benefit.

The real question now is what the cost of the complexity in the WC handling
would be to implement something to address this...

-- 
Michael Sinz                     Technology and Engineering Director/Consultant
"Starting Startups"                                mailto:michael.sinz@sinz.org
My place on the web                            http://www.sinz.org/Michael.Sinz


Re: Optional text base design discussion?

Posted by Michael Sinz <Mi...@sinz.org>.
Brandon Ehle wrote:
> Has there ever been a design discussion on how to implement optional
> text base working copies?
> 
> I have found plenty of flame wars in the past on the subject, but I
> haven't found any design discussions on the best way to implement this.
> 
> The possibilities I have seen so far:
> 
> 
> Compressed text bases
>   * Keeps current low bandwidth performance
>   * Reduces the amount of disk space slightly
>   * Local working copy operations are slower
> 
> 
> Removal of the text bases
>  * Low bandwidth performance suffers
>  * Reduces the amount of disk space by half in some cases
>  * Local working copy operations possibly require server communication
>  * Faster performance for working copy operations that have lots of files
>  * Checkout performance for large binary files (no EOL translation) has
> the possibility of doubling
>  * Less chance for search and replace bugs
> 
> 
> Single .svn directory (database?) for the entire working copy
>  * Could be combined with compressed or missing text bases
>  * Faster performance for working copies with lots of directories

You did not mention changing the .svn directory into a special database
and not having so many individual files within it.  Still have one
per directory but have minimal contents.  This could dramatically
improve some WC operations without losing the tear-off and low-bandwidth
features.

BTW - I find the WC performance to not be that bad relative to the benefit
of the "svn diff" and "svn status" performance win due to the text base.
Now, I do not use Windows much so I do not suffer from the filesystem
performance issues that come up there.

-- 
Michael Sinz                     Technology and Engineering Director/Consultant
"Starting Startups"                                mailto:michael.sinz@sinz.org
My place on the web                            http://www.sinz.org/Michael.Sinz


Re: Optional text base design discussion?

Posted by Michael Brouwer <mb...@gmail.com>.
Sure a caching proxy would be a different way to implement this behavior.  I
was thinking more along the lines of this being a new wc model directly
though.  However I don't think one excludes the other, and that if such a
cache backend was created you could probably use it directly for the new wc
code as well as to implement a caching svn proxy.

Michael


On 10/16/06, John Peacock <jp...@rowman.com> wrote:
>
> Michael Brouwer wrote:
> > John, I'm well aware as to what svk does (I'm an svk committer).
>
> And I thought your name sounded familiar... ;-)
>
> > Think of this cache as a subversion repository with obliterate-like
> > functionality.  Obviously this idea would need to be fleshed out more,
> > but I do believe this could be implemented in a way that unifies the
> > benefits of both svn and svk, and even lets users check out working
> > copies of large projects without the need for a second local copy of the
> > entire tree at all.
>
> I think it might be easier to describe this feature as a caching proxy
> service.  If all repository-to-WC activity occurred through a specially
> designed proxy, and the Subversion client library was aware that it was
> being proxied, then you could have a tunable cache which would
> automatically retrieve and cache the text-base files as you needed them.
> I don't think you would want to strictly limit the cache by size, since a
> single very large binary file could exceed the storage (of course you
> could also mark binary files as uncacheable).
>
> Checking out files would be a transparent proxy to the original
> repository.  Executing diff or checking in files would transparently
> retrieve the text-base from the server (if needed) and cache it for later
> use (you could also tune the proxy to cache the files during checkout in
> order to have a fully local cache, say for multiple overlapping WC's).
>
> Does that seem like a different way to approach your proposal?
>
> John
>
> --
> John Peacock
> Director of Information Research and Technology
> Rowman & Littlefield Publishing Group
> 4501 Forbes Blvd
> Suite H
> Lanham, MD 20706
> 301-459-3366 x.5010
> fax 301-429-5747
>

Re: Optional text base design discussion?

Posted by John Peacock <jp...@rowman.com>.
Michael Brouwer wrote:
> John, I'm well aware as to what svk does (I'm an svk committer).

And I thought your name sounded familiar... ;-)

> Think of this cache as a subversion repository with obliterate-like
> functionality.  Obviously this idea would need to be fleshed out more,
> but I do believe this could be implemented in a way that unifies the
> benefits of both svn and svk, and even lets users check out working
> copies of large projects without the need for a second local copy of the
> entire tree at all.

I think it might be easier to describe this feature as a caching proxy service.
 If all repository-to-WC activity occurred through a specially designed proxy,
and the Subversion client library was aware that it was being proxied, then you
could have a tunable cache which would automatically retrieve and cache the
text-base files as you needed them.  I don't think you would want to strictly
limit the cache by size, since a single very large binary file could exceed the
storage (of course you could also mark binary files as uncacheable).

Checking out files would be a transparent proxy to the original repository.
Executing diff or checking in files would transparently retrieve the text-base
from the server (if needed) and cache it for later use (you could also tune the
proxy to cache the files during checkout in order to have a fully local cache,
say for multiple overlapping WC's).

Does that seem like a different way to approach your proposal?

John

-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Blvd
Suite H
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5747


Re: Optional text base design discussion?

Posted by Michael Brouwer <mb...@gmail.com>.
John, I'm well aware as to what svk does (I'm an svk committer).  What I'm
suggesting is a scalable cache rather than a complete mirror.  It's just that
on one end of the spectrum you have the current text-base cache (or even
less, and only a cache of the files you are operating on right now) while at
the other end of the spectrum you have a full mirror like svk.

The main difference would be that since this is a cache, if I'm no longer
interested in keeping certain older revisions of files around I can tell the
cache to toss them (possibly automatically by limiting the amount of disk space
the cache is allowed to use).  Also if I only check out a subset of a large
svn repo and later checkout more the cache would get populated with whatever
is needed on demand.
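
As a rough sketch of the behavior I mean, assuming an in-memory
stand-in for the on-disk cache: entries are keyed by path and
revision, fetched on a miss, and the least recently used ones are
tossed once a byte budget is exceeded.  The class name and the fetch
callback are hypothetical; the callback stands in for a request to
the server.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Key = Tuple[str, int]  # (repository path, revision)


class TextBaseCache:
    """Size-bounded cache of file contents, populated on demand."""

    def __init__(self, max_bytes: int, fetch: Callable[[str, int], bytes]):
        self.max_bytes = max_bytes
        self.fetch = fetch  # hypothetical stand-in for a server request
        self.used = 0
        self._items: "OrderedDict[Key, bytes]" = OrderedDict()

    def get(self, path: str, rev: int) -> bytes:
        key = (path, rev)
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        data = self.fetch(path, rev)      # cache miss: go to the server
        if len(data) <= self.max_bytes:   # oversized items pass through uncached
            self._items[key] = data
            self.used += len(data)
            while self.used > self.max_bytes:
                # Toss the least recently used entry until we fit again.
                _, old = self._items.popitem(last=False)
                self.used -= len(old)
        return data
```

A file larger than the whole budget is simply handed through without
being cached, so one huge binary cannot flush everything else.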

Think of this cache as a subversion repository with obliterate-like
functionality.  Obviously this idea would need to be fleshed out more, but I
do believe this could be implemented in a way that unifies the benefits of
both svn and svk, and even lets users check out working copies of large
projects without the need for a second local copy of the entire tree at all.

Michael


On 10/16/06, John Peacock <jp...@rowman.com> wrote:
>
> Michael Brouwer wrote:
> > Another option would be to locally store a text-base cache of some sort
> > outside the actual working copies.  This cache could be shared between
> > multiple working copies of the same project tree.  The text base cache
> > itself could potentially use some kind of delta storage model to provide
> > multiple cached revisions of files.  That way not just svn diff and svn
> > status could be local, but potentially diffing between trunk and a branch
> > that are both in cache could be.  The cache can be as sparse or as filled
> > as a user likes, with the ultimate cache being a mirror of the entire
> > repository, making every operation disconnected.
>
> Congratulations!  You've just reinvented SVK:
>
>         http://svk.elixus.org/
>
> Let's see:
>
> 1) stores the "text base" using a delta storage model - check (SVK has a
> repository using delta storage to compare against);
>
> 2) shares the "text base" with multiple working copies - you can check out
> multiple working copies from the same SVK repository;
>
> 3) sparse cache - check (SVK can mirror a remote repository in whole or
> from any arbitrary revision).
>
> ;-)
>
> John
>
>
> --
> John Peacock
> Director of Information Research and Technology
> Rowman & Littlefield Publishing Group
> 4501 Forbes Blvd
> Suite H
> Lanham, MD 20706
> 301-459-3366 x.5010
> fax 301-429-5747
>

Re: Optional text base design discussion?

Posted by John Peacock <jp...@rowman.com>.
Michael Brouwer wrote:
> Another option would be to locally store a text-base cache of some sort
> outside the actual working copies.  This cache could be shared between
> multiple working copies of the same project tree.  The text base cache itself
> could potentially use some kind of delta storage model to provide multiple
> cached revisions of files.  That way not just svn diff and svn status could
> be local, but potentially diffing between trunk and a branch that are both in
> cache could be.  The cache can be as sparse or as filled as a user likes,
> with the ultimate cache being a mirror of the entire repository, making every
> operation disconnected.

Congratulations!  You've just reinvented SVK:

	http://svk.elixus.org/

Let's see:

1) stores the "text base" using a delta storage model - check (SVK has a
repository using delta storage to compare against);

2) shares the "text base" with multiple working copies - you can check out
multiple working copies from the same SVK repository;

3) sparse cache - check (SVK can mirror a remote repository in whole or from any
arbitrary revision).

;-)

John


-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Blvd
Suite H
Lanham, MD 20706
301-459-3366 x.5010
fax 301-429-5747


Re: Optional text base design discussion?

Posted by Michael Brouwer <mb...@gmail.com>.
Another option would be to locally store a text-base cache of some sort
outside the actual working copies.  This cache could be shared between
multiple working copies of the same project tree.  The text base cache
itself could potentially use some kind of delta storage model to provide
multiple cached revisions of files.  That way not just svn diff and svn
status could be local, but potentially diffing between trunk and a branch
that are both in cache could be.  The cache can be as sparse or as filled as
a user likes, with the ultimate cache being a mirror of the entire
repository, making every operation disconnected.

Michael


On 10/16/06, Brandon Ehle <az...@yahoo.com> wrote:
>
> Peter Samuelson wrote:
> > And the UI:
> >
> > - New flag for 'svn co', 'svn ci' and 'svn up' to indicate desire for
> >   missing or compressed text-bases.
>
> I assume that some sort of flag in the entries file is needed to
> remember this setting after a commit.
>
>
> Did the developers ever have a discussion about where to pull the
> text-base wanted/unwanted flag from?
>
> I see at least three possibilities:
>
> 1) Local configuration option per repository (per filetype or other?)
>
> GUI's would be able to share the setting with the command line.
>
> 2) Optional server side properties per file
>
> Would work well for extremely large teams and repositories with mixed
> code and assets.  You would probably still want the code to have
> text-base, but for large binary assets you would want to disable it.
>
> 3) Command line option like suggested above
>
> Does not make it easy to checkout repositories with mixed code and
> assets.  Users are required to pass the option each time they update.
> Good for performance testing, unit tests, and automated processes that
> do not need text-base (on the same machine as a separate copy of the
> repository that has text-base).
>
>

Re: Optional text base design discussion?

Posted by Brandon Ehle <az...@yahoo.com>.
Peter Samuelson wrote:
> And the UI:
> 
> - New flag for 'svn co', 'svn ci' and 'svn up' to indicate desire for
>   missing or compressed text-bases.

I assume that some sort of flag in the entries file is needed to
remember this setting after a commit.


Did the developers ever have a discussion about where to pull the
text-base wanted/unwanted flag from?

I see at least three possibilities:

 1) Local configuration option per repository (per filetype or other?)

GUI's would be able to share the setting with the command line.

 2) Optional server side properties per file

Would work well for extremely large teams and repositories with mixed
code and assets.  You would probably still want the code to have
text-base, but for large binary assets you would want to disable it.

 3) Command line option like suggested above

Does not make it easy to checkout repositories with mixed code and
assets.  Users are required to pass the option each time they update.
Good for performance testing, unit tests, and automated processes that
do not need text-base (on the same machine as a separate copy of the
repository that has text-base).



Re: Optional text base design discussion?

Posted by Peter Samuelson <pe...@p12n.org>.
[Brandon Ehle]
> Has there ever been a design discussion on how to implement optional
> text base working copies?

So here's what I think would need to happen:

- Eliminate any assumptions about when a text-base file should always
  exist.  And don't assume that, for example, a file must be newly
  added if it doesn't have a text-base.  I have no idea if the latter
  assumption exists anywhere.

- If a text-base file is needed and does not exist, I guess create it
  using svn_ra.

- Check for two text-base filenames, with and without .gz, and if
  *.text-base.gz is found, stream it through zlib.

- Commands like 'merge' and 'rm' need to move the old copy of the file
  to a text-base if it doesn't already exist.
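
The lookup described in the second and third items could be sketched
roughly like this.  It is only an illustration in Python: the
".svn-base" and ".svn-base.gz" names and the fetch callback (standing
in for a request through svn_ra) are assumptions, and eol-style and
keyword handling are left out.

```python
import gzip
from pathlib import Path
from typing import BinaryIO, Callable, Optional


def open_text_base(text_base_dir: Path, name: str,
                   fetch: Optional[Callable[[str], bytes]] = None) -> BinaryIO:
    """Open the text base for name, whether plain, compressed or missing."""
    plain = text_base_dir / f"{name}.svn-base"
    packed = text_base_dir / f"{name}.svn-base.gz"
    if plain.exists():
        return plain.open("rb")
    if packed.exists():
        # gzip.open decompresses as the caller reads from the stream.
        return gzip.open(packed, "rb")
    if fetch is None:
        raise FileNotFoundError(name)
    # Neither form exists: recreate the text base from the repository.
    text_base_dir.mkdir(parents=True, exist_ok=True)
    plain.write_bytes(fetch(name))
    return plain.open("rb")
```

Callers get a readable binary stream in all three cases, so code that
only reads the text base would not need to know which form is on disk.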

And the UI:

- New flag for 'svn co', 'svn ci' and 'svn up' to indicate desire for
  missing or compressed text-bases.

- New svn subcommand to add a text-base to an unmodified file, in
  anticipation of modifying the file.  Of course, thanks to eol-style
  and keywords, this may be more than a simple file copy.


> Single .svn directory (database?) for the entire working copy
>  * Could be combined with compressed or missing text bases

That issue has little or nothing to do with missing/compressed
text-bases.