Posted to dev@subversion.apache.org by Kean Johnston <jk...@sco.com> on 2002/12/16 12:27:30 UTC

text-base penalty: A proposed solution

All,

About 6 weeks ago I started (restarted) the text-base penalty thread.
A lot was said, and I can precis it all if needed, but now that I
have taken a much closer look at svn than I had at the time, and now
that I actually have time to play with it seriously, I'd like to
re-open that particular can of worms. Please bear with me.

My personal goal (and it's important, because it reflects typical large
project usage) is to convert the SCO OpenServer source code base from
its current tools-on-top-of-SCCS system to one managed by svn.  This is
an ambitious project, complicated by the sheer volume of existing
history and the size of the code base.  Other issues, such as importing
from SCCS, can be addressed later.  Before I even begin such a project
with just the current top-of-tree code for a proof-of-concept project,
there are things like the text-base penalty I need to address.

First of all, I would like to state an objection to the notion of the
cached text base.  The documentation sings its praises, and posts from
previous threads seem to indicate that people place a great deal of
importance on the ability to edit files on aeroplanes or in environments
where they are not network-attached to their repository, but I submit
that such usage caters to such a tiny percentage of the audience likely
to use svn that making all other users incur the penalty seems
misguided.  It is approximately equivalent to the government producing
every document it ever prints in every language on earth, on the off
chance that one or two of its employees are more comfortable reading in
their native language than in English.  A noble sentiment, to be sure,
but rather impractical.  In the case of the text base, there are other
solutions that are equally functional, yet much smaller and (in my
opinion) simpler.

Second, before anyone is tempted to throw the "disk space is cheap"
argument around, please don't.  A software solution should not rely on
the *CURRENT* costs of hardware in *SOME COUNTRIES* as a means of
justifying a solution that can be implemented more cheaply yet just as
functionally.  Using the price of hardware is a mental trap.  It is
equivalent to a 3D graphics package author saying "Oh, there is no need
to optimize my raytracing algorithm, CPUs are getting faster and cheaper
and I can just wait for good hardware to hide my bad design".  Such
thinking thwarts innovation, which is usually a prime motivator for any
software package in the first place.  So, in order to have a clear and
rational debate about the text base, I think it would be prudent to list
its benefits, especially as they relate to the overall system
functionality.

o  Allows for easy reversion of files without repository access
o  Allows for easy detection of changes to the WC w/o repository access
o  Allows for minimal client-to-server communication on commit
o  Allows for local diffs (handy for developers) w/o repository access

I am sure there are other hidden benefits, but from a systemic
perspective, or most certainly from an end user's perspective, those are
the benefits.  These are what I see as the negatives.

o  Thwarts tools like cscope that scan for all source files
o  Makes `find . -name \*.[cCh]|xargs grep SOMETHING` much less useful
o  Uses exactly double the space, which is wasted in read-only
   environments
o  Doubles the workload of any directory-traversal tools
o  Doubles the inode count (this IS an issue!)

Just as there are hidden benefits, I am equally sure there are hidden
negatives.  Time and more experience may lengthen this list.  To some
of these problems, there are solutions, I know.  Yes, I can use

  find . -name \*.[cCh]|sed -e '/\.svn/d'|xargs grep SOMETHING

Yes, I can trim out .svn files from cscope file lists.  The real point
though, is that in both of these cases, the version control system is
getting in the way of practices that are as old as time, and for little
benefit (assuming an equally functional solution can be found).  The
point about double the space is a rather important one.  I am sure we
are not the only company in the world that has several "nightly build
boxes" that build different levels of the tree, such as the head,
currently-in-system-test, and Keans-own-special-hacking version.  Such
build machines are almost always read-only clients of the version
control system.  They get the source as it stands at the start of the
build, do the build, and they're done.  What possible reason is there to
have cached files for such uses?  Another example.  In something the
size of OpenServer, most developers have their own areas of expertise,
and tend to make changes to specific portions of the tree.  I am sure
most companies work the same way.  However, each developer needs a full
source tree in order to do prototype builds, or (in my case) needs to
have done at least one full build before partial builds can take place.
But from my perspective, almost all of the tree is read only.  I only
care about the console driver, or the licensing subsystem, or the Apache
port or whatever.  But the vast majority of the tree I am never going to
edit.  Why should I have duplicates of ALL of that code?  The answer is
I shouldn't.

Last, the point about duplicate inodes.  Not all systems out there are
modern, not all of them have things some developers take for granted
(like loadable kernel modules, large file support, practically infinite
numbers of inodes).  Some of us have to contend with smaller systems,
and there is no reason that the filesystem stress should be double what
it currently is, just because a text base is "easy".

Bearing all the above in mind, I would like to propose a solution.
First up is a description of the actual problem domain.  This is a
client side issue, and should have no bearing on the server.  I see all
changes being completely handled by the client, with nothing (beyond
perhaps a default preference) set in the server (with one caveat, see
below).  So, my proposed solution is for the following problem:

  "Design a system that provides all the current functionality of
   the duplicated-contents text-base approach in a way that minimalizes
   the actual duplication of data, or, ideally, eliminates it."

First and foremost, there should be a new config file option, nominally
called "text_base_method".  This can have (currently) four possible
values: duplicate, compress, copy_on_edit or checksum.  The semantic
meanings of these values are:

  duplicate - duplicate the file verbatim, i.e. as things currently stand.
  compress - duplicate the file, but compress it, using any compressor
  copy_on_edit - the meat of my proposed change. See below.
  checksum - the bones of my proposed change. See below.
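
To make the four proposed values concrete, here is a minimal Python sketch
of how a client might parse such an option.  It is only an illustration:
the option does not exist in Subversion, and the names TextBaseMethod,
parse_text_base_method and keeps_pristine_copy are hypothetical.

```python
from enum import Enum


class TextBaseMethod(Enum):
    """The four values proposed for the hypothetical text_base_method option."""
    DUPLICATE = "duplicate"        # verbatim pristine copy (current behaviour)
    COMPRESS = "compress"          # pristine copy, stored compressed
    COPY_ON_EDIT = "copy_on_edit"  # pristine copy made only on "svn edit"
    CHECKSUM = "checksum"          # no pristine copy, metadata only


def parse_text_base_method(value, default=TextBaseMethod.DUPLICATE):
    """Map a config string to a TextBaseMethod.

    Falls back to `default` when the option is absent (None or blank) and
    raises ValueError for anything that is not one of the four values.
    """
    if value is None or not value.strip():
        return default
    try:
        return TextBaseMethod(value.strip().lower())
    except ValueError:
        valid = ", ".join(m.value for m in TextBaseMethod)
        raise ValueError(
            f"unknown text_base_method {value!r}; expected one of: {valid}"
        ) from None


def keeps_pristine_copy(method):
    """True when every checked-out file gets a pristine copy up front."""
    return method in (TextBaseMethod.DUPLICATE, TextBaseMethod.COMPRESS)
```

Defaulting to "duplicate" when the option is missing keeps today's
behaviour for anyone who never touches the setting.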

For the "compress" method, it would be nice to allow the user to choose
the compressor they want to use, as opposed to hard-coding a solution
into the client via something like Zlib.  To this end, perhaps there
should be two other config options: "compress_pipe" and
"decompress_pipe", whose values are the commands that can be opened as a
pipe to compress or decompress a file, respectively.  Designing this is
not the issue at hand, however.

The third possible value, "copy_on_edit", should be almost
self-explanatory.  It implies a new client side command, "svn edit".
With this text-base method, when a client retrieves a WC, it simply
stores a checksum and date/size properties for the files in a flat-ASCII
file in .svn.  If the user wants to edit the file, they first issue an
"svn edit" command with the name of the file.  This command then copies
over the current file contents into .svn/text-base, and marks the entry
in the flat-ASCII file as being edited.  This then allows the user to do
local diffs, revert files easily, and do small diffs on commit.  It
essentially provides all of the current text-base functionality, simply
delayed.  There are other advantages to an "svn edit" command.  Without
wanting to distract you and open up a rat-hole discussion (please just
CONSIDER these ideas, let's not debate them in this thread), an "svn
edit" would enable us to:

  o  Store at the root a list of all changed files for almost
     instantaneous determination of changed files in a tree (a BIG issue
     for large trees).  Think of how useful an "svn editing" command
     would be, that could instantly tell you what files you have changed
     in a tree.
  o  Notify the repository of the intent to edit, such that other users
     who do an svn edit of the same file can receive a gentle reminder
     that they MAY be in danger of a conflict
  o  Possibly even provide a repository administrator the option of
     enforcing the notion of a lock-modify-unlock approach to versioning,
     while keeping all of Subversion's other features in play

The primary objection I see to this method is the obvious "what happens
if a user changes a file without first issuing an svn edit".  Well, that
case could be handled by a policy setting in a server config, or client
config, or even just in established practice.  If the user makes
changes, they should be allowed to keep them, but they will pay a small
penalty for having forgotten to do the svn edit first.  They will not be
able to do a revert or local diff, but they SHOULD still be able to diff
the file or make a commit, as long as they have access to the
repository.  It will slightly increase the client-to-repository traffic,
and if that is inconvenient for the user, then they will soon learn to
remember to svn edit.  It will even be possible to revert, again, as
long as they have access to the repository.  However, since MOST users
are connected (i.e. very few do this stuff on planes or in the space
shuttle), this is likely to be a very small, barely noticeable problem.
This is a great segue into the fourth text base mechanism.

The last mechanism is the "checksum".  This text base method assumes
constant access to the repository, and never duplicates files.  All it
does is maintain the flat ASCII database of each file's checksum, size
and modification times.  Any attempt to svn diff, revert or commit
requires access to the repository, and the client will retrieve the
original contents from the server and then resume normal operation as
it currently does.  Yes, on commit this implies a double-download
penalty, but for most installations, I bet that's less painful (because
it is rarer) than a fully duplicated source base.  And besides, it
is optional.  It is also better to do a download-then-diff-then-submit
than to submit the entire contents of the changed file and let the
server do the diffs, because this would involve changing the server,
and for people with direct but slow access to the repository, chances
are they are on an ADSL line that has higher downstream than upstream
speeds.
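
As a rough illustration of how the "checksum" method could detect local
changes from nothing but the recorded checksum, size and modification
time, consider the following Python sketch.  The entry layout and the
function names are assumptions made for this example, and the
"cheapest test first" ordering is one possible design, not something
the proposal specifies.

```python
import hashlib
import os


def file_checksum(path, chunk_size=65536):
    """SHA-256 of a file, read in chunks so large files are not slurped."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(path):
    """The metadata the checksum method would record at checkout time."""
    st = os.stat(path)
    return {"checksum": file_checksum(path),
            "size": st.st_size,
            "mtime_ns": st.st_mtime_ns}


def is_modified(path, entry):
    """Decide whether `path` differs from the recorded entry."""
    st = os.stat(path)
    if st.st_size != entry["size"]:
        return True                    # size differs: certainly modified
    if st.st_mtime_ns == entry["mtime_ns"]:
        return False                   # size and mtime unchanged: assume clean
    # Same size but touched since checkout: only the contents can tell.
    return file_checksum(path) != entry["checksum"]
```

Note that this only answers "has it changed?".  Producing a diff or a
revert still needs the original contents, which in this method means a
round trip to the repository.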

The one thing I cannot decide on (I can go either way on this) is
whether the options for the text base should be set in the server config
file or really in the client config as I described above.  I kinda like
the idea of a repository maintainer having the ability to control this,
but I also like the idea of the client knowing what's best for their own
particular needs.  I think the best possible approach would be to allow
the server to set the default, and allow clients to override it.
Possibly even add the ability for the repository maintainer to enforce a
particular method.  For example, in the server's config:

  default_text_base_method = copy_on_edit
  allow_client_method_override = true # or false to enforce default method
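
The precedence between those two server settings and a client preference
could be resolved along these lines.  This Python sketch is illustrative
only: neither setting exists in Subversion, and the function name is
hypothetical.

```python
VALID_METHODS = {"duplicate", "compress", "copy_on_edit", "checksum"}


def effective_text_base_method(server_default, allow_client_override,
                               client_choice=None):
    """Pick the text-base method a client should use.

    server_default        -- value of default_text_base_method
    allow_client_override -- value of allow_client_method_override
    client_choice         -- the client's own preference, or None if unset
    """
    for value in (server_default, client_choice):
        if value is not None and value not in VALID_METHODS:
            raise ValueError(f"unknown text-base method: {value!r}")
    if allow_client_override and client_choice is not None:
        return client_choice      # client preference wins when permitted
    return server_default         # otherwise the server's default applies
```

With allow_client_method_override set to false, the client's preference
is ignored, which is the "enforce a particular method" case.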

Anyway ... that's my idea.  Let the flames begin ... I have my asbestos
suit on :)

Kean


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: text-base penalty: A proposed solution

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

On Tuesday, December 17, 2002, at 05:04 AM, Nuutti Kotivuori wrote:
>
> I am going to be mean and ignore most of the points you had in your
> message. You can imagine the trivial arguments such as "get a better
> filesystem", "fix your tools", "use 'svn export' for builds, not a
> working copy" and so on to everything else.

Just for the record, the current implementation of svn export actually
does an 'svn checkout', then goes and cleans up the .svn directories, so
at least for doing the export itself, you still need twice the space.

-garrett



Re: text-base penalty: A proposed solution

Posted by Rafael Garcia-Suarez <ra...@hexaflux.com>.

Julian Fitzell wrote:
> 
> I nearly suggested 'svn export' earlier but the disadvantage is that 
> you're not going to get any kind of diffy update.  The next day when you 
> need to build HEAD again, you're going to have to do a complete checkout.
> But since you say bandwidth is not an issue for you maybe this is fine.

Excuse me for jumping into the discussion:

I have a project in a subversion repository that (for now) can't be
built in a working copy, because the build process copies spurious
.svn/* files. On the other hand I don't want to do a complete export
each time one of the files is updated. So what do I do? One of the
solutions is to rsync(1) my wc/ directory to a build/ directory with the
command-line option "--exclude .svn/". That's very fast and works well.
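
Where rsync is not available, the same idea (copy the working copy to a
build directory while skipping the admin directories) can be approximated
in Python.  This is a sketch only, with a hypothetical function name.
Unlike rsync it recopies every file on each run and does not delete files
that have disappeared from the source, and dirs_exist_ok needs
Python 3.8 or later.

```python
import shutil


def mirror_without_svn(wc_dir, build_dir):
    """Copy wc_dir into build_dir, skipping every .svn directory."""
    shutil.copytree(
        wc_dir,
        build_dir,
        ignore=shutil.ignore_patterns(".svn"),  # applied at every directory level
        dirs_exist_ok=True,                     # allow refreshing an existing build/
    )
```

For example, mirror_without_svn("wc", "build") plays the role of the
rsync invocation with "--exclude .svn/".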


Re: text-base penalty: A proposed solution

Posted by Julian Fitzell <ju...@beta4.com>.

Kean Johnston wrote:
>>message. You can imagine the trivial arguments such as "get a better
>>filesystem", "fix your tools", "use 'svn export' for builds, not a
> 
> Believe you me, I wish I COULD get a better filesystem. But I can't.
> This is an old product. It doesn't even have vnodes, so I can't even
> BUY filesystems like vxfs, let alone port any of the free ones. But
> thank you for the tip on svn export. I missed that, and you are right,
> for build machines that indeed fixes the problem completely. It MAY
> even fix it for developers too. Since most of the tree is read-only
> to most developers, perhaps they can export the tree, and then
> svn checkout those portions where they will be doing real work. Is
> there any problem with mixing and matching exported and checked out
> trees? I ask before I try because that's easier :)

I nearly suggested 'svn export' earlier but the disadvantage is that 
you're not going to get any kind of diffy update.  The next day when you 
need to build HEAD again, you're going to have to do a complete checkout.
But since you say bandwidth is not an issue for you maybe this is fine.

As for whether they'll play nice together... I'm guessing you won't be 
able to check out parts because they'll be obstructed, but that is a 
pretty minor issue.  Either you can come up with some set of command 
line flags to overwrite them (does --force do this?) or write a set of 
scripts to help you with this process.  You can certainly check out part 
of the repository into the middle of a filesystem that contains an 
exported tree.  The problem, however, is that you can't go to the top 
level and do "svn update".

So it may not be an ideal situation in either case, but it might be 
workable with a little thought (particularly in the case of automated 
build machines, which is what I was originally going to suggest it for).

> [snip]

Julian

-- 
julian@beta4.com
Beta4 Productions (http://www.beta4.com)


Re: text-base penalty: A proposed solution

Posted by Nuutti Kotivuori <na...@iki.fi>.

Kean Johnston wrote:
> Is there any problem with mixing and matching exported and checked
> out trees? I ask before I try because that's easier :)

There is no trouble having upper directories exported and only the
modified directory as a working copy. But it is going to be a bit more
tricky if you want to have an exported subtree inside a normal working
copy. I don't think we really support this. 

An exported tree is just a tree, it has nothing to do with a working
copy and nothing will recognise it as such. So what it would look like
to subversion is that you are missing all the .svn directories inside
one subtree, which may or may not work so gracefully. If you manually
mangle the entries file some, I think subversion could be convinced to
"forget" the existence of such a directory - and it might work.

And, there's no way to "update" an exported copy - so I don't think
your developers would like that.

>> And this really doesn't sound trivial to get done. Elimination of
>> text-bases is a nice thing - but the elimination of the need to
>> traverse is what makes it a killer feature. I hope you can come up
>> with a reasonable solution to this.
> 
> How about this. All we need is easy access to the very root of a
> tree. I don't know what functions exist to find it but I am sure
> they must exist, or would be trivial to implement. Once we can, from
> within a WC, get an easy pointer to the root, all we need to do is
> store a simple file in the root's admin directory of files edited.

[...]

Yes, that is one way of doing it. But one must take care what happens
when parts of the working copy get moved, updated, removed and
re-added. And 'svn commit' is only one function which needs that
information. Everything, 'svn status', 'svn diff', 'svn revert',
etc. need to have access to that list as well. And the usage of course
differs. For example, 'svn status' should either normally only output
the data for changed files (and not show question marks for unknown
files) or there should be a flag for 'svn status' that does that. And
of course 'svn status -v' would force a tree-walk in any case. 'svn
diff' on the other hand should only use the edited information, as
should commit.

I'm not saying it can't be done, I'm saying it's not trivial. So, I
wish you luck in achieving this.

-- Naked


RE: text-base penalty: A proposed solution

Posted by Kean Johnston <jk...@sco.com>.

> message. You can imagine the trivial arguments such as "get a better
> filesystem", "fix your tools", "use 'svn export' for builds, not a
Believe you me, I wish I COULD get a better filesystem. But I can't.
This is an old product. It doesn't even have vnodes, so I can't even
BUY filesystems like vxfs, let alone port any of the free ones. But
thank you for the tip on svn export. I missed that, and you are right,
for build machines that indeed fixes the problem completely. It MAY
even fix it for developers too. Since most of the tree is read-only
to most developers, perhaps they can export the tree, and then
svn checkout those portions where they will be doing real work. Is
there any problem with mixing and matching exported and checked out
trees? I ask before I try because that's easier :)

> And this really doesn't sound trivial to get done. Elimination of
> text-bases is a nice thing - but the elimination of the need to
> traverse is what makes it a killer feature. I hope you can come up
> with a reasonable solution to this.

How about this. All we need is easy access to the very root of a
tree. I don't know what functions exist to find it but I am sure
they must exist, or would be trivial to implement. Once we can, from
within a WC, get an easy pointer to the root, all we need to do is
store a simple file in the root's admin directory of files edited.
This would make an "svn editing" command (to show what files are
being edited) a trivial case of cat-ing that file, and a commit
from the root a trivial case of committing just the files in that
database.  What gets trickier is if you do an svn commit from
somewhere OTHER than the root. At that point, we would either have
to resort back to traversal, or get clever with the entries in
the "being edited" list and determine which of them fall below
the current directory when the svn commit was run.
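
The "get clever" option, working out which entries of a root-level
"being edited" list fall below the directory where the commit was run,
is mostly path arithmetic.  A small Python sketch, assuming the list
stores paths relative to the working copy root (the function name and
that storage format are assumptions):

```python
from pathlib import PurePosixPath


def edited_below(edited_paths, wc_root, cwd):
    """Return the edited files that live at or below `cwd`.

    edited_paths -- paths relative to wc_root, as stored in the root's list
    wc_root, cwd -- absolute paths; cwd must be inside wc_root
    The result is expressed relative to cwd, ready to hand to a commit.
    """
    # Raises ValueError if cwd is not inside the working copy.
    prefix = PurePosixPath(cwd).relative_to(PurePosixPath(wc_root))
    selected = []
    for rel in edited_paths:
        try:
            selected.append(str(PurePosixPath(rel).relative_to(prefix)))
        except ValueError:
            continue   # entry is outside the current directory
    return selected
```

Because the comparison is done on whole path components, an entry under
"drivers2/" is not mistaken for one under "drivers/".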

At all times, of course, the user should be able to commit just
a subset of the edited files. I frequently start work on one
bug, get pulled aside to fixing an escalation or helping with
packaging or whatever, and only need to check in a portion of
the files I have edited. But nothing I am suggesting precludes
that.

The notion of being able to get a list of all changed files quickly
is a very useful one. Especially in the case I described above,
where I may have edited files to solve more than one bug, but I
want each changeset to represent the fix for an individual bug.
Being able to:

  $ svn editing > /tmp/foo
  $ vi /tmp/foo ... remove everything I don't want checked in
  $ svn commit `cat /tmp/foo`

is very useful, and, I suspect, frequently wished for. Yes it
is possible now but it traverses the tree, and when you are
talking large trees, that's a large penalty.

Kean



Re: text-base penalty: A proposed solution

Posted by Nuutti Kotivuori <na...@iki.fi>.

Kean Johnston wrote:
>   "Design a system that provides all the current functionality of
>    the duplicated-contents text-base approach in a way that
>    minimalizes the actual duplication of data, or, ideally,
>    eliminates it."

[...]

> Anyway ... that's my idea.  Let the flames begin ... I have my
> asbestos suit on :)

I am going to be mean and ignore most of the points you had in your
message. You can imagine the trivial arguments such as "get a better
filesystem", "fix your tools", "use 'svn export' for builds, not a
working copy" and so on to everything else. And for the 'svn edit' use
cases I'd say that it's better not to even mention advisory locks and
developer communication in this context. Let's concentrate on 'svn
edit' behaviour with regard to this feature.

In my opinion, the main point that stands on its own there is indeed
"detection of changes without a tree-walk". And this is a tough nut to
crack. It will require reworking the entries parsing logic, perhaps
adding new fields there - it will also either require settling to a
model where every directory still has to be traversed or bubbling up
changes to all parent working copies. Perhaps even new files in the
working copy that simply signify a "not-modified" situation.

And this really doesn't sound trivial to get done. Elimination of
text-bases is a nice thing - but the elimination of the need to
traverse is what makes it a killer feature. I hope you can come up
with a reasonable solution to this.

-- Naked


Re: text-base penalty: A proposed solution

Posted by Mark Bainter <ma...@webtech.dresser.com>.

Bob Gustafson [bobgus@rcnChicago.com] wrote:
> Hmm, the file that the client user wants to edit is already on the client -
> in the workspace.
> 
> If the client user edits a pristine file, we need a mechanism that will
> copy the pristine file to the .svn space *before* the edited file is
> written back to the workspace.
> 
> I don't know how to implement that magic. Need some brainstorming.
> 
> Maybe work in a modified shell space that senses when a write will happen?
> Or maybe make the pristine file read-only and then have the editor (vim?)
> say "This file is read-only, do you want to stash a pristine copy in
> .svn?", or something like that.
> 

While I can appreciate that that would certainly be a great way to
implement it, and would then still not require network connectivity
or bandwidth usage, I think it'd be difficult to implement in a portable
fashion, let alone implement it reliably.


RE: text-base penalty: A proposed solution

Posted by Bob Gustafson <bo...@rcnChicago.com>.

Hmm, the file that the client user wants to edit is already on the client -
in the workspace.

If the client user edits a pristine file, we need a mechanism that will
copy the pristine file to the .svn space *before* the edited file is
written back to the workspace.

I don't know how to implement that magic. Need some brainstorming.

Maybe work in a modified shell space that senses when a write will happen?
Or maybe make the pristine file read-only and then have the editor (vim?)
say "This file is read-only, do you want to stash a pristine copy in
.svn?", or something like that.

BobG

Kean Johnston wrote:
 Mark Bainter wrote:
>> I have an alternate proposal for copy-on-edit.  Do the checksums, but
>> instead of svn edit, just wait until the user actually tries to run a
>> command that requires the original source, then pull that
>> down and keep it around until the changed file gets checked in.
>Sounds like you want a 5th mechanism ... retrieve_on_edit :)
>
>Kean
>


RE: text-base penalty: A proposed solution

Posted by Kean Johnston <jk...@sco.com>.

> I have an alternate proposal for copy-on-edit.  Do the checksums, but
> instead of svn edit, just wait until the user actually tries to run a
> command that requires the original source, then pull that 
> down and keep it around until the changed file gets checked in.
Sounds like you want a 5th mechanism ... retrieve_on_edit :)

Kean



Re: text-base penalty: A proposed solution

Posted by Mark Bainter <ma...@webtech.dresser.com>.

Kean Johnston [jkj@sco.com] wrote:
> The third possible value, "copy_on_edit" should be almost
> self-explanatory.
> It implies a new client side command, "svn edit".  With this text-base
> method, when a client retrieves a WC, it simply stores a checksum and
> date/size properties for the files in a flat-ASCII file in .svn.  If the
> user wants to edit the file, they first issue an "svn edit" command with
> the name of the file.  This command then copies over the current file
> contents into .svn/text-base, and marks the entry in the flat-ASCII file
> as being edited.  This then allows the user to do local diffs, revert

I agree with you that this is wasteful.  On the typical small opensource
project it doesn't matter much, but it doesn't scale well to large projects.
I do think it is a useful feature though, and I like your approach of 
just making it optional.  

However, I really don't like the concept of having to do a command for
each file I'm going to edit.  This seems like a lot of unnecessary hassle.
I have an alternate proposal for copy-on-edit.  Do the checksums, but
instead of svn edit, just wait until the user actually tries to run a
command that requires the original source, then pull that down and keep
it around until the changed file gets checked in.

This gets you the benefit of not having a complete copy of the sources,
and subsequent calls that require the source would be fast/cheap.

svn co ....
<edits>
svn diff (file gets downloaded)
<edits>
svn diff (runs off of previously pulled pristine copy)
svn commit (diff is sent, pristine copy gets deleted)
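
To illustrate the sequence above, here is a minimal Python sketch of a
pristine copy that is fetched on first use and dropped at commit.  The
class and its fetch callable are hypothetical stand-ins; nothing here
reflects actual svn code or its network layer.

```python
import os


class LazyPristineCache:
    """Fetch pristine text only when first needed; drop it after commit."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch   # hypothetical callable: file name -> pristine bytes
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, name):
        # Flatten the name so this toy cache needs no subdirectories.
        return os.path.join(self.cache_dir, name.replace("/", "__") + ".svn-base")

    def pristine(self, name):
        """Return pristine contents, downloading them on first use only."""
        path = self._path(name)
        if not os.path.exists(path):
            with open(path, "wb") as fh:
                fh.write(self.fetch(name))     # first diff: file gets downloaded
        with open(path, "rb") as fh:
            return fh.read()                   # later diffs: served from the cache

    def committed(self, name):
        """After a successful commit the cached pristine copy is deleted."""
        path = self._path(name)
        if os.path.exists(path):
            os.remove(path)
```
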

I honestly have not even looked at much of the svn code, so I 
have no idea how workable this thought is.  I'm just throwing it
out there for consideration.


Re: text-base penalty: A proposed solution

Posted by Scott Lamb <sl...@slamb.org>.

Kean Johnston wrote:
> These are what I see as the negatives.
> 
> o  Thwarts tools like cscope that scan for all source files
> o  Makes `find . -name \*.[cCh]|xargs grep SOMETHING` much less useful

You must be using a pretty old version. The text-base files are all 
called .svn/text-base/*.svn-base now. So "find . -name '*.[cCh]'" no 
longer pulls them up.
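
The effect of the .svn-base suffix on that glob is easy to check.  A
small Python sketch, using fnmatch as a stand-in for the shell-style
matching that find -name applies to each file's base name (the helper
name is made up for this example):

```python
import fnmatch

PATTERN = "*.[cCh]"   # the same glob passed to find -name in the thread


def matches_source_glob(filename):
    """True if a file's base name would be selected by the glob."""
    return fnmatch.fnmatchcase(filename, PATTERN)
```

Here matches_source_glob("main.c") is true, while
matches_source_glob("main.c.svn-base") is false, because the pattern
requires the name to end in .c, .C or .h.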

> o  Uses exactly double the space, which is wasted in read-only
> environments
> o  Doubles the workload of any directory-traversal tools
> o  Doubles the inode count (this IS an issue!)

Scott



Re: text-base penalty: A proposed solution

Posted by Karl Fogel <kf...@newton.ch.collab.net>.

I understand your arguments about the cost of text-bases, and it might
be nice if they were optional.  However, this would require
significant changes to the design of libsvn_wc, and that's not going
to happen before 1.0.

If you can come up with a simple and obviously correct patch to
implement it before then, that would be great.  But I'm pretty certain
that any such patch will not be simple.

By the way, your list of the disadvantages says:

> o  Thwarts tools like cscope that scan for all source files
> o  Makes `find . -name \*.[cCh]|xargs grep SOMETHING` much less useful

No, we protect against these problems by giving those files special
extensions.  You shouldn't be having these problems; if you are,
please report them as bugs.

-Karl


Re: text-base penalty: A proposed solution

Posted by Kevin Pilch-Bisson <ke...@pilch-bisson.net>.

On Mon, Dec 16, 2002 at 10:17:10PM -0800, Kean Johnston wrote:
> > The prime benefit which motivated the text-base was limiting 
> > the network bandwidth used by commits, as I understand it.
> Would a copy-on-edit model not address that and the speedy
> diff requirement?
> 
> > Well, not exactly.  It turns out that for every file in the working
> > directory, there are four in the admin area--text base, props, prop
> > base, wcprops.  We should probably trim this to two (text base and
> > props+propbase+wcprops), but that still leaves only a 50%
> I don't know the code and I am sure you do, but how generically useful
> is the filesystem-inside-DB code? Is it worth thinking about having
> the admin area be such a database rather than actual physical files?
> 
I've thought about that too, why not have props and props-base stored in a
lookup db, instead of our own funky text format?
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: text-base penalty: A proposed solution

Posted by "Kirby C. Bohling" <kb...@birddog.com>.

On Mon, 2002-12-16 at 19:35, Scott Lamb wrote:
> Kean Johnston wrote:
> >>>cheap, and let's face it, the vast majority of developers are likely
> >>>to be connected to a LAN, not a WAN.
> >>
> >>That may be true for ClearCase and other commercial RCSs, but it is 
> >>definitely not true for Subversion, which intends to replace 
> >>CVS in the 
> >>open source world. I think you're in the minority here.
> > 
> > Not sure how the intention to replace cvs in any way proves that most
> > developers likely to use SVN will be wan connected, not lan connected.
> 
> Almost all open source projects have many developers who are not 
> geographically close. They function over the Internet. Overwhelmingly 
> CVS is used in this way, and given Subversion's goals, it would be 
> unreasonable to assume it won't be used in the same way.
> 

Okay, there is a difference between stating, almost all OSS people use
CVS, versus saying almost all users of CVS are OSS people.

Every development shop I've worked at, except one, uses CVS on every
project they do.  The one that didn't has a home-grown system, because
CVS wasn't invented there (they are big on "only invented here"
technology).  Every last one of them was on a LAN.  There is a large
user base of LAN-based CVS people; claiming there isn't is wrong.
Everybody I know who develops at a shop uses CVS.  I think that's
roughly 10 different places.  One of them is a spin-off of a Fortune
500 company, so they aren't all tiny places either.

I will agree with the first statement.  You're going to have to give me
some hard facts to convince me the second one is true.  Then I'm going
to question how you got those stats.  They'd be interesting to see
either way.

My two cents is that Subversion is over-engineered for low-bandwidth
situations, and it drives me insane how slow it is when run on the same
machine.  It takes 2 hours to do things CVS does in 3-5 minutes when
used locally.  It's been slow since the day I first found out about it
almost a year ago (it's gotten lots better, but it's still slow).  It'll
continue to be slow for that use case until correctness is achieved and
speed becomes a priority over new features and bug fixes.  All that
said, fast/easy branches and atomic commits are more than worth the
wait, both for 1.0 and for the time waiting at a command line.

I've never used CVS over a slow line, so I guess I'm just not very
appreciative of how cool it is.

	Thanks,
		Kirby

-- 
Real Programmers view electronic multimedia files with a hex editor.

Re: text-base penalty: A proposed solution

Posted by Kevin Pilch-Bisson <ke...@pilch-bisson.net>.

On Mon, Dec 16, 2002 at 07:35:32PM -0600, Scott Lamb wrote:
> I'm having trouble seeing the "double the disk space" as a significant 
> problem. Are you in an environment in which you develop over the LAN but 
> the extra disk space is a significant expense? Do you know of people who 
> are? It sounds to me like this is a hypothetical situation, not one 
> you've actually encountered. You've also said disk prices are higher in 
> other countries...do you have a figure?
> 
The total of all my working copies at work is somewhere on the order of 50GB.
So to double that, I would need a new disk.  I would really love to see the
duplicate on edit model, especially since it could also improve directory
traversal times dramatically.
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: text-base penalty: A proposed solution

Posted by Branko Čibej <br...@hermes.si>.

Martin Pool wrote:

>Perhaps a partial solution for people with very large trees would be
>to export the parts they don't edit, and then overlay it with a
>checkout of the parts they actually edit?  I suppose updating the
>nonedited parts will be a pain.
>
Oh, I don't know. If I understand correctly, incremental patches are
available for the whole kernel, for each significant branch. Most of
the patch application could probably be automated.



Re: text-base penalty: A proposed solution

Posted by Martin Pool <mb...@samba.org>.

On 19 Dec 2002, <br...@xbc.nu> wrote:

> This idea has been thrown in the pool before, but it won't work, even
> ignoring the fact that hard links are a Unix thing. The text base is
> _not_ the same as the working file. The working file is text base +
> end-of-line conversions + keyword expansions.

I didn't know that.  Thanks for the information.

> I can't imagine a person actively editing the whole LK tree, for
> example. She'll have to have the whole source available, but only a
> small subtree actually has to be a Subversion working copy. The rest
> can be just a (read-only) copy of the sources.

Perhaps a partial solution for people with very large trees would be
to export the parts they don't edit, and then overlay it with a
checkout of the parts they actually edit?  I suppose updating the
nonedited parts will be a pain.

-- 
Martin 

Re: text-base penalty: A proposed solution

Posted by Branko Čibej <br...@xbc.nu>.

Martin Pool wrote:

>I'll just throw this out because nobody has mentioned it yet:
>
>Some kernel developers I know make heavy use of trees of hardlinks.
>If you make sure your editor creates a new file on save then this
>essentially gives you a large number of copy-on-write trees.  Not only
>are they cheap in disk space, but it also saves buffer memory and
>makes a smart diff run quickly.
>
>I wonder how well Subversion would cope with the text-base being
>initially a hardlink to the working copy?  Or with two wcs that are
>mostly hardlinks?
>
This idea has been thrown in the pool before, but it won't work, even
ignoring the fact that hard links are a Unix thing. The text base is
_not_ the same as the working file. The working file is text base +
end-of-line conversions + keyword expansions.
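
To make the difference concrete, here is a rough Python sketch (not
Subversion's actual translation code). The function name, the keyword
table and the sample revision number are all made up for illustration;
it only shows that a working file can differ byte-for-byte from its
text base even when the user has edited nothing.

```python
import re
from typing import Dict, Optional


def to_working_text(text_base: str,
                    eol: str = "\n",
                    keywords: Optional[Dict[str, str]] = None) -> str:
    """Derive working-file text from pristine text (illustrative only).

    Expands $Name$ markers found in `keywords`, then converts the
    line endings from "\n" to `eol`.
    """
    keywords = keywords or {}

    def expand(match):
        name = match.group(1)
        if name in keywords:
            return "$%s: %s $" % (name, keywords[name])
        return match.group(0)  # unknown keyword: leave untouched

    expanded = re.sub(r"\$(\w+)\$", expand, text_base)
    return expanded.replace("\n", eol)


pristine = "/* $Rev$ */\nint x;\n"
working = to_working_text(pristine, eol="\r\n", keywords={"Rev": "4123"})
```

With CRLF line endings and an expanded keyword, `working` no longer
matches `pristine`, so one hard-linked file could not serve as both.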

>The drawback is that if your editor doesn't break the link and
>therefore also writes to text-base then things will get very
>confusing.
>
As far as I'm concerned, the fact that Subversion doesn't control the
editor is sufficient reason to drop this idea.

>Some wise person wrote "more computing sins have been committed in the
>name of efficiency (without necessarily achieving it) than for any
>reason."
>
>If your trees are this big it might be time to split them into
>modules.  (I suppose if they're that big and that old then it's
>infeasible though.)
>
I can't imagine a person actively editing the whole LK tree, for
example. She'll have to have the whole source available, but only a
small subtree actually has to be a Subversion working copy. The rest
can be just a (read-only) copy of the sources.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: text-base penalty: A proposed solution

Posted by Martin Pool <mb...@samba.org>.

I'll just throw this out because nobody has mentioned it yet:

Some kernel developers I know make heavy use of trees of hardlinks.
If you make sure your editor creates a new file on save then this
essentially gives you a large number of copy-on-write trees.  Not only
are they cheap in disk space, but it also saves buffer memory and
makes a smart diff run quickly.

I wonder how well Subversion would cope with the text-base being
initially a hardlink to the working copy?  Or with two wcs that are
mostly hardlinks?

The drawback is that if your editor doesn't break the link and
therefore also writes to text-base then things will get very
confusing.
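
The two editor behaviours can be shown with a small Python sketch.
The file names and helper functions are invented for the example, and
it assumes a filesystem that supports hard links; it says nothing
about how Subversion itself would behave.

```python
import os
import tempfile


def save_in_place(path, text):
    """Rewrite the existing inode: every hard link sees the new text."""
    with open(path, "w") as f:
        f.write(text)


def save_via_rename(path, text):
    """Write a new file, then rename it over the old name.

    The old inode is left alone, so other links keep the old text.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)


def demo_hardlink(save):
    """Return what the linked 'text base' contains after an edit."""
    with tempfile.TemporaryDirectory() as d:
        working = os.path.join(d, "foo.c")
        base = os.path.join(d, "foo.c.base")
        with open(working, "w") as f:
            f.write("original\n")
        os.link(working, base)      # base starts as a hard link
        save(working, "edited\n")
        with open(base) as f:
            return f.read()
```

Calling demo_hardlink(save_via_rename) leaves the linked copy
untouched, while demo_hardlink(save_in_place) changes it as well,
which is the confusing case.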

Some wise person wrote "more computing sins have been committed in the
name of efficiency (without necessarily achieving it) than for any
reason."

If your trees are this big it might be time to split them into
modules.  (I suppose if they're that big and that old then it's
infeasible though.)

-- 
Martin 

Re: text-base penalty: A proposed solution

Posted by Branko Čibej <br...@xbc.nu>.

Kean Johnston wrote:

>>I'd argue that the multiple ways does hamper my usage. To repeat what 
>>Greg Hudson said, the more complex code is more difficult to maintain
>>    
>>
>That's a nebulous statement. We don't KNOW how much it will complicate
>the code because no-one has written it yet. 
>
Ah, but we do. :-)

We know what has to be done to support this model (and don't
misunderstand me -- I strongly believe Subversion should support it),
and we know the state of our working-copy management library. It's a sad
mess, and before we dare fiddle with it too much, we must sit down and
create a design for it -- something we don't really have.

So there it is. We can either add yet another hack, destabilising
libsvn_wc even further -- note that right now, there are way too many
corner cases it doesn't cover satisfactorily -- or wait for the
substantial rewrite that's on the way. Any help with the redesign would
be most welcome, of course.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

RE: text-base penalty: A proposed solution

Posted by Justin Erenkrantz <je...@apache.org>.

--On Tuesday, December 17, 2002 1:00 AM -0800 Kean Johnston 
<jk...@sco.com> wrote:

> I am still intrigued by the idea of using a DB for the admin area.

If you use a Berkeley DB database for the admin area, then your 
working copy wouldn't be portable.  -- justin

RE: text-base penalty: A proposed solution

Posted by Kean Johnston <jk...@sco.com>.

> Okay. With these details, this is much more clear to me. I 
> understand it is a real problem for you. And the same for the
> inodes to some extent.
*whew* :)

You know I must be honest, one of the things that I really LIKE
about this list is that people don't take things on faith and
make me really question myself and analyse the situation. I think
debate is very healthy and in the long run, can only be useful,
so thanks :)

I am still intrigued by the idea of using a DB for the admin area.
It would reduce the inode count (although as you pointed out it
can be A problem, just not a HUGE one), but more importantly, it
will help speed up directory traversal. It may make some things
fractionally slower, like local diffs and commits, because if
ALL files (including a full text base) are in the database then
you first need to extract it from the db to run diff on it, but
that penalty is fairly small (my gut tells me).
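
As a rough illustration of that extra extraction step, the sketch
below uses Python's sqlite3 module purely as a stand-in for "a
database". Nobody in this thread proposed SQLite, and the table
layout and function names are invented; it is only meant to show the
shape of "fetch the text base from the db, then diff".

```python
import difflib
import sqlite3


def open_admin_db(path=":memory:"):
    """Open (or create) a hypothetical single-file admin area."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS text_base ("
        "name TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    return db


def store_text_base(db, name, content):
    """Record the pristine text of one file."""
    db.execute(
        "INSERT OR REPLACE INTO text_base (name, content) VALUES (?, ?)",
        (name, content),
    )
    db.commit()


def local_diff(db, name, working_text):
    """Extract the pristine text from the db, then diff against it."""
    row = db.execute(
        "SELECT content FROM text_base WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return list(
        difflib.unified_diff(
            row[0].splitlines(keepends=True),
            working_text.splitlines(keepends=True),
            fromfile=name + " (base)",
            tofile=name + " (working)",
        )
    )


def demo_db_diff(base_text, working_text, name="foo.c"):
    """Store one text base in an in-memory db and diff against it."""
    db = open_admin_db()
    try:
        store_text_base(db, name, base_text)
        return local_diff(db, name, working_text)
    finally:
        db.close()
```

The only added cost over plain files in this sketch is the SELECT
before the diff; whether that stays small on a real tree would have
to be measured.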

I am going to spend a few days going through the WC code with
a fine-tooth comb, to better judge just how destabilizing it
would be to add the multi-method approach I mentioned in my
original mail. I think once we understand the real impact as
opposed to the perceived impact, we will be better armed
to make a decision.

If anyone else wants to do this alongside me, that would
be great. Two sets of eyes are always better than one, and
three are better than two.

Kean


Re: text-base penalty: A proposed solution

Posted by Scott Lamb <sl...@slamb.org>.

Kean Johnston wrote:
>>Almost all open source projects have many developers who are not
> 
> And the vast majority of CVS users are not open source developers,
> I'll warrant. There are some development shops with literally
> thousands of employees who use CVS. There is more than one shop
> of that size, and there are a plethora of other smaller shops
> all of whom use CVS. Although I don't have real numbers, let's take
> a wild-ass-guess and say there are 100000 people on the planet who
> use cvs. I'd be VERY surprised if more than about 2000 of those are
> OpenSource developers who *REGULARLY USE CVS* (as opposed to the
> occasional cvs update).  That's 2% of the total cvs market. Maybe
> the guess is several orders of magnitude wrong. Maybe 10% of
> all CVS users are open source developers. That still leaves 90 000
> people who you are targeting and whose needs you are completely
> ignoring.

Well, you've made me question one of my assumptions, anyway. I've always 
assumed that proprietary shops were much more likely to choose 
ClearCase, Perforce, etc. I can't immediately think of a way to prove 
this one way or the other.

Even if you are right, you might find it a bit difficult to sell changes 
really weighted toward the other model, because anyone working on 
Subversion (itself an open source project) would be naturally biased 
toward them.

>>I'd argue that the multiple ways does hamper my usage. To repeat what 
>>Greg Hudson said, the more complex code is more difficult to maintain
> 
> That's a nebulous statement. We don't KNOW how much it will complicate
> the code because no-one has written it yet. But I am guessing it
> will add SOME complexity to the WC code, but it won't increase it
> by an order of magnitude, so I really don't buy that as an argument.
> If code complexity was a ruler by which you measured suitability,
> we wouldn't have GCC, X11, or even SVN. We'd be with TTYs, shell
> scripts and SCCS.

It may be a nebulous statement, but it's almost guaranteed to be true.
Greg Hudson said that from the perspective of a developer. From more of a
user perspective, it's not uncommon for me to experience working copy
weirdness. So I'm a bit frightened by the idea of making the existing
system more complex when it is already not working perfectly for me.

GCC and X11 are mature systems - they weren't originally designed with
all the complexity they have now. There was much more time for debugging
and refactoring between stages. Even so, if new alternatives appeared
that did what I needed just as well with less code, I'd probably switch.

And I think in those projects there is a lot of resistance to
complexity without sufficient justification. You've made it pretty clear
to me anyway how important this change is to you, but it took the
details in this latest message to do it.

>>Are you in an environment in which you develop over 
>>the LAN but the extra disk space is a significant expense?
> 
> I wouldn't say a "significant" expense but it is certainly
> an avoidable one. In today's economic climate where a lot of
> companies are surviving by the skin of their teeth, it is hard
> to justify buying $600 72G SCSI drives when we already have
> perfectly good workstations that can cope. Moving to a tool
> that would require us to upgrade every developer's machine
> just because someone thought the ability to do local diffs
> was a justification for double disk usage is really not on
> in the real world.

Okay. With these details, this is much more clear to me. I understand it 
is a real problem for you. And the same for the inodes to some extent.

Re: text-base penalty: A proposed solution

Posted by Kevin Pilch-Bisson <ke...@pilch-bisson.net>.

On Tue, Dec 17, 2002 at 01:19:26AM -0500, Scott Lenser wrote:
> I question the 100% penalty, although I see some increase.  How much
> of these sizes is due to the sources (for which there is a 100%
> penalty) and how much of these sizes is due to binaries (which presumably
> wouldn't be checked in)?  Or are the binaries checked in for some reason?
> 
Well it is still a 100% penalty on storing the sources, not including the
binaries.  This can be a factor for shops that mirror a huge source tree on a
public server so that developers can debug components other than their own
without having to have sources for them.  When you have a daily build
performed by the lab, and you need to archive the sources for each of the last
n builds, if you have to have text-bases, then you suddenly can only archive
n/2 builds.

Also at least one large shop that I know of checks in all the (static) library
files used in the build process, so that developers only have to checkout
their own component to build it (instead of having to build the whole
product).  This can make a big difference.
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: text-base penalty: A proposed solution

Posted by kb...@gte.net.

Scott Lenser wrote:
>>>I'm having trouble seeing the "double the disk space" as a 
>>>significant problem.
...
> 
> 
> I question the 100% penalty, although I see some increase.  How much
> of these sizes is due to the sources (for which there is a 100%
> penalty) and how much of these sizes is due to binaries (which presumably
> wouldn't be checked in)?  Or are the binaries checked in for some reason?

I have a project that is pure perl, no "binaries" (ignoring documentation in 
.doc files or .gif's to support the project, both of which I consider 
"source").  A simple "du" in the wc dir shows 7582 blocks.  Summing all the 
.svn dirs totals 3866 blocks, or 50.99% of the total size.  So I think his 
"double the disk space" comment is quite correct.

Kevin
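
A measurement like the one above could be scripted roughly as
follows. This is only a sketch: the function names are made up, and it
sums file sizes in bytes rather than the disk blocks that "du"
reports, so the two will not agree exactly.

```python
import os


def admin_share(root, admin_name=".svn"):
    """Return (admin_bytes, total_bytes) for the tree under `root`.

    A file counts as admin data if any directory on its path below
    `root` is named `admin_name`.
    """
    admin = total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        in_admin = admin_name in parts
        for name in filenames:
            size = os.lstat(os.path.join(dirpath, name)).st_size
            total += size
            if in_admin:
                admin += size
    return admin, total


def percent(part, whole):
    """Share of `whole` taken by `part`, as a percentage."""
    return 100.0 * part / whole


# The block counts quoted in the message above.
share = round(percent(3866, 7582), 2)
```

Plugging in the quoted figures, 3866 of 7582 blocks works out to
50.99%, matching the number in the message.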

RE: text-base penalty: A proposed solution

Posted by Kean Johnston <jk...@sco.com>.

> > I must be honest, I find it quite hard to believe that people
> > can defend a system where there is a 100% decrease in
> > efficiency, that in many cases will NEVER be used (refer back to
> > my original post about build machines). That's what the current
> > text-base penalty is ... it is a *100 percent* increase in
> > resource load.
> > 
> 
> I question the 100% penalty, although I see some increase.  How much
> of these sizes is due to the sources (for which there is a 100%
> penalty) and how much of these sizes is due to binaries 
> (which presumably
> wouldn't be checked in)?  Or are the binaries checked in for 
> some reason?
98% sources, a few binaries that are checked into the tree as
binhex'ed "text files" because SCCS can't handle binary files.
But the vast majority of my build is source, which means it is
pretty darn close to a 100% space increase for a build that never
uses the VCS for anything other than an extraction tool. Even in
the case where there are actual humans editing stuff, since the
tree is so large, and each developer only works in a fraction
of that tree (but needs the full tree present for builds), the
penalty is more noticeable.

Bear in mind I am working with a FULL UNIX implementation here,
not just a kernel. Sometimes, for some tasks, I actually need
two full sets of UNIX source code, plus the whole of Java, gcc,
apache and a plethora of open source libraries. It is a LOT of
source. Not as big as some projects, granted, but not trivial
either.

Kean


Re: text-base penalty: A proposed solution

Posted by Scott Lenser <sl...@cs.cmu.edu>.

> > I'm having trouble seeing the "double the disk space" as a 
> > significant problem.
> If what has been said before doesn't show you the problem I am not
> sure there is anything I can say that will convince you there is
> one. I will just point out that YOUR usage is significantly
> different to MY usage. I don't think a tool should force your
> usage model on me, any more than it should force my usage model
> on you. SVN is trying to penetrate a market that is dominated
> by a lightweight tool. It's light on everything except time,
> as some operations take CVS a long time to perform. As things
> currently stand, it seems as if SVN is much "heavier" than
> cvs INCLUDING on time, except in the case of cheap copies. I get
> the sense that because those are fast, people assume that the
> rest of it is. It's not.
> 
> > Are you in an environment in which you develop over 
> > the LAN but the extra disk space is a significant expense?
> I wouldn't say a "significant" expense but it is certainly
> an avoidable one. In today's economic climate where a lot of
> companies are surviving by the skin of their teeth, it is hard
> to justify buying $600 72G SCSI drives when we already have
> perfectly good workstations that can cope. Moving to a tool
> that would require us to upgrade every developer's machine
> just because someone thought the ability to do local diffs
> was a justification for double disk usage is really not on
> in the real world.
> 
> > are? It sounds to me like this is a hypothetical situation, not one 
> Not at all. Most of our current developers have 18G SCSI hard
> drives. With those, you have just enough room to do a full get,
> build and PI run. That's if we just build OSR5. If we include
> the Java, UnixWare and other open source builds, as SOME of us
> do, then a 36G drive just fits, with about 4G to spare.
> 
> I must be honest, I find it quite hard to believe that people
> can defend a system where there is a 100% decrease in
> efficiency, that in many cases will NEVER be used (refer back to
> my original post about build machines). That's what the current
> text-base penalty is ... it is a *100 percent* increase in
> resource load.
> 

I question the 100% penalty, although I see some increase.  How much
of these sizes is due to the sources (for which there is a 100%
penalty) and how much of these sizes is due to binaries (which presumably
wouldn't be checked in)?  Or are the binaries checked in for some reason?

- Scott Lenser




RE: text-base penalty: A proposed solution

Posted by Kean Johnston <jk...@sco.com>.

> Almost all open source projects have many developers who are not
And the vast majority of CVS users are not open source developers,
I'll warrant. There are some development shops with literally
thousands of employees who use CVS. There is more than one shop
of that size, and there are a plethora of other smaller shops
all of whom use CVS. Although I don't have real numbers, let's take
a wild-ass-guess and say there are 100000 people on the planet who
use cvs. I'd be VERY surprised if more than about 2000 of those are
OpenSource developers who *REGULARLY USE CVS* (as opposed to the
occasional cvs update).  That's 2% of the total cvs market. Maybe
the guess is several orders of magnitude wrong. Maybe 10% of
all CVS users are open source developers. That still leaves 90 000
people who you are targeting and whose needs you are completely
ignoring.

> I'd argue that the multiple ways does hamper my usage. To repeat what 
> Greg Hudson said, the more complex code is more difficult to maintain
That's a nebulous statement. We don't KNOW how much it will complicate
the code because no-one has written it yet. But I am guessing it
will add SOME complexity to the WC code, but it won't increase it
by an order of magnitude, so I really don't buy that as an argument.
If code complexity was a ruler by which you measured suitability,
we wouldn't have GCC, X11, or even SVN. We'd be with TTYs, shell
scripts and SCCS.

> I'm having trouble seeing the "double the disk space" as a 
> significant problem.
If what has been said before doesn't show you the problem I am not
sure there is anything I can say that will convince you there is
one. I will just point out that YOUR usage is significantly
different to MY usage. I don't think a tool should force your
usage model on me, any more than it should force my usage model
on you. SVN is trying to penetrate a market that is dominated
by a lightweight tool. It's light on everything except time,
as some operations take CVS a long time to perform. As things
currently stand, it seems as if SVN is much "heavier" than
cvs INCLUDING on time, except in the case of cheap copies. I get
the sense that because those are fast, people assume that the
rest of it is. It's not.

> Are you in an environment in which you develop over 
> the LAN but the extra disk space is a significant expense?
I wouldn't say a "significant" expense but it is certainly
an avoidable one. In today's economic climate where a lot of
companies are surviving by the skin of their teeth, it is hard
to justify buying $600 72G SCSI drives when we already have
perfectly good workstations that can cope. Moving to a tool
that would require us to upgrade every developer's machine
just because someone thought the ability to do local diffs
was a justification for double disk usage is really not on
in the real world.

> are? It sounds to me like this is a hypothetical situation, not one 
Not at all. Most of our current developers have 18G SCSI hard
drives. With those, you have just enough room to do a full get,
build and PI run. That's if we just build OSR5. If we include
the Java, UnixWare and other open source builds, as SOME of us
do, then a 36G drive just fits, with about 4G to spare.

I must be honest, I find it quite hard to believe that people
can defend a system where there is a 100% decrease in
efficiency, that in many cases will NEVER be used (refer back to
my original post about build machines). That's what the current
text-base penalty is ... it is a *100 percent* increase in
resource load.

> you've actually encountered. You've also said disk prices are 
> higher in 
> other countries...do you have a figure?
When I was living in South Africa, a large SCSI hard drive
cost about 60-70% of a senior engineer's monthly salary.
That's a lot higher than it is in this country, or in the UK,
and a lot of development gets done in South Africa. I also have
friends in the Czech Republic where even mice are expensive,
let alone SCSI hard drives :)

> Same with the inode problem. What system doesn't have enough 
> inodes for 
> your working copy? Does what Greg Hudson mentioned (going from 4*file 
> extra inodes to 2*file extra inodes) solve the problem?
Yes that would go a long way to solving it. And some surprisingly
modern systems have inode limitations. UnixWare 7 (SVR5) had ISL
defaults that limited you to 64K inodes. Yes you can increase
those by recreating the filesystem, and it was a bug, but still,
there are systems out there that have inode limitations. It's not
a HUGE deal, but again, if every file in the WC is now consuming
5 inodes, that's a 400% increase in resources over CVS.  When you
couple the 100% resource increase in disk space usage with the
400% increase in disk inode usage, you have to admit that makes
svn sound a little less than attractive as a replacement for cvs.
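
For anyone checking the arithmetic, here it is as a few lines of
Python. The helper names are made up, and the baseline (one inode per
working file, ignoring any per-directory bookkeeping files on either
side) is a simplifying assumption.

```python
def percent_increase(before, after):
    """Percentage growth going from `before` to `after`."""
    return 100.0 * (after - before) / before


def inodes_per_file(admin_files):
    """One inode for the working file plus one per admin-area file."""
    return 1 + admin_files


# Assumed baseline: just the working file itself.
BASELINE = inodes_per_file(0)

# Four admin files per working file: text base, props, prop base, wcprops.
current = percent_increase(BASELINE, inodes_per_file(4))

# Trimmed to two admin files: text base plus one merged props file.
trimmed = percent_increase(BASELINE, inodes_per_file(2))

# Disk space: working file plus a text base of the same size.
disk = percent_increase(1.0, 2.0)
```

Under those assumptions `current` is 400.0, `trimmed` is 200.0 and
`disk` is 100.0, so trimming to two admin files would halve the inode
penalty but leave the disk penalty unchanged.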

Kean


Re: text-base penalty: A proposed solution

Posted by Scott Lamb <sl...@slamb.org>.

Kean Johnston wrote:
>>>cheap, and let's face it, the vast majority of developers are likely
>>>to be connected to a LAN, not a WAN.
>>
>>That may be true for ClearCase and other commercial RCSs, but it is 
>>definitely not true for Subversion, which intends to replace 
>>CVS in the 
>>open source world. I think you're in the minority here.
> 
> Not sure how the intention to replace cvs in any way proves that most
> developers likely to use SVN will be wan connected, not lan connected.

Almost all open source projects have many developers who are not 
geographically close. They function over the Internet. Overwhelmingly 
CVS is used in this way, and given Subversion's goals, it would be 
unreasonable to assume it won't be used in the same way.

>>The pristine text base is a real advantage for me. Even though my 
> 
> Then by all means feel free to use it. What I proposed in no way hampers
> your current usage. But by only having one way of doing things it
> DOES hamper my usage.

I'd argue that the multiple ways does hamper my usage. To repeat what 
Greg Hudson said, the more complex code is more difficult to maintain 
and test, slowing other new features more important to me and causing 
regressions.

I'm having trouble seeing the "double the disk space" as a significant 
problem. Are you in an environment in which you develop over the LAN but 
the extra disk space is a significant expense? Do you know of people who 
are? It sounds to me like this is a hypothetical situation, not one 
you've actually encountered. You've also said disk prices are higher in 
other countries...do you have a figure?

Same with the inode problem. What system doesn't have enough inodes for 
your working copy? Does what Greg Hudson mentioned (going from 4*file 
extra inodes to 2*file extra inodes) solve the problem?

Thanks,
Scott

RE: text-base penalty: A proposed solution

Posted by Kean Johnston <jk...@sco.com>.

> > cheap, and let's face it, the vast majority of developers are likely
> > to be connected to a LAN, not a WAN.
> 
> That may be true for ClearCase and other commercial RCSs, but it is 
> definitely not true for Subversion, which intends to replace 
> CVS in the 
> open source world. I think you're in the minority here.
Not sure how the intention to replace cvs in any way proves that most
developers likely to use SVN will be wan connected, not lan connected.

> The pristine text base is a real advantage for me. Even though my 
Then by all means feel free to use it. What I proposed in no way hampers
your current usage. But by only having one way of doing things it
DOES hamper my usage.


Re: text-base penalty: A proposed solution

Posted by Scott Lamb <sl...@slamb.org>.

Kean Johnston wrote:
>>Are you saying that hardware is going to get more expensive?  It's
>>possible, but it's not the way to bet.  At any rate, in countries where
>>disk is expensive, bandwidth also tends to be expensive, or so I
>>understand, so that doesn't really affect the tradeoff.
> 
> Internet bandwidth is expensive, not LAN bandwidth. Even in countries
> where computer infrastructure is expensive, the office LAN is relatively
> cheap, and let's face it, the vast majority of developers are likely
> to be connected to a LAN, not a WAN.

That may be true for ClearCase and other commercial RCSs, but it is 
definitely not true for Subversion, which intends to replace CVS in the 
open source world. I think you're in the minority here.

The pristine text base is a real advantage for me. Even though my 
connection to the server is almost always on, the speed advantage of 
"svn diff" over "cvs diff" is noticeable and I run that command frequently.
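
The speed advantage described above comes from comparing the working file
against a locally cached pristine copy, with no round trip to the server.
A minimal sketch of that idea in Python follows; it is not Subversion's
implementation, and the function name and sample file contents are made up
for illustration.

```python
import difflib


def local_diff(pristine_text, working_text, name):
    """Unified diff of a working file against its locally cached pristine copy."""
    return list(difflib.unified_diff(
        pristine_text.splitlines(keepends=True),
        working_text.splitlines(keepends=True),
        fromfile=f"{name} (pristine)",
        tofile=f"{name} (working)",
    ))


# Hypothetical file contents; both texts are already on the local disk,
# so producing the diff needs no network access at all.
pristine = "int x = 1;\nint y = 2;\n"
working = "int x = 1;\nint y = 3;\n"

diff = local_diff(pristine, working, "foo.c")
print("".join(diff))
```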

Scott


RE: text-base penalty: A proposed solution

Posted by Kean Johnston <jk...@sco.com>.

> Are you saying that hardware is going to get more expensive?  It's
> possible, but it's not the way to bet.  At any rate, in countries where
> disk is expensive, bandwidth also tends to be expensive, or so I
> understand, so that doesn't really affect the tradeoff.
Internet bandwidth is expensive, not LAN bandwidth. Even in countries
where computer infrastructure is expensive, the office LAN is relatively
cheap, and let's face it, the vast majority of developers are likely
to be connected to a LAN, not a WAN.

> We already put a suffix on text-base filenames, so your "find" command
> should be unaffected.  I'm not sure how cscope works.
cscope works on file extension. Yes, the directory structure I was looking
at was ancient. You can scratch that negative off the list.

> here is the fundamental problem with your solution:
> 
>   The WC code is already really hairy.
Feh ... you should see some of the crap I have to deal with here :)
Hairy code doesn't scare me. But I think it CAN be done relatively
simply. Granted, I haven't examined ALL of the wc code in detail. I
plan on doing that over the next few nights.

>   * Anything which requires the WC code to behave differently, you'll
> have to find the resources for.
I volunteer.

>   * Anything which requires the WC code to become more complicated (by
> giving it N different ways of operating) will probably get vetoed before
> 1.0.
That'd be a pity if it can be demonstrated to work.

> I've been thinking that the wc library might want to go through a vtable
That makes sense but may be too destabilising for 1.0.

Kean



RE: text-base penalty: A proposed solution

Posted by Greg Hudson <gh...@MIT.EDU>.

On Tue, 2002-12-17 at 13:12, Bob Gustafson wrote:
> What is the issue here? I know it has something to do with polluting the
> license structure that currently exists for svn, but..

Two issues:

  * Our license may be incompatible with the GPL.  Or it may not.  Even
if it is incompatible, that may change, so this is the small issue.

  * If a GPL library is a required dependency, then people can't build
proprietary products on top of Subversion, and the project wants to
allow that.

(The license issue comes up on the list periodically; anyone should
search for "GPL" in the mailing list archives before adding to this
sub-thread.)


RE: text-base penalty: A proposed solution

Posted by Bob Gustafson <bo...@rcnChicago.com>.

What is the issue here? I know it has something to do with polluting the
license structure that currently exists for svn, but..

Part of Greg Hudson's email of 17 Dec 2002 10:26:31 -0500
>
>  * The only serious alternative to Berkeley DB I know about is gdbm,
>which is GPL, and we don't want Subversion relying on GPL libraries.  I
>imagine gdbm has its own flaws, probably in the performance area.
>


RE: text-base penalty: A proposed solution

Posted by Justin Erenkrantz <je...@apache.org>.

--On Tuesday, December 17, 2002 10:26 AM -0500 Greg Hudson 
<gh...@MIT.EDU> wrote:

>   * The only serious alternative to Berkeley DB I know about is
> gdbm, which is GPL, and we don't want Subversion relying on GPL

GDBM also isn't thread-safe.  -- justin


Re: text-base penalty: A proposed solution

Posted by Garrett Rooney <ro...@electricjellyfish.net>.

Greg Hudson wrote:

>It would be kind of nifty to turn ".svn" into a file containing a
>database.  But I wouldn't want to see this happen for the following
>practical reasons:
>

just to throw out another (probably obvious) reason not to change the 
.svn directory into a database of some sort:  it makes the working copy 
far less developer friendly.  binary format databases are harder to work 
with than text files, it's harder to get useful bug reports if a user 
can't open up .svn/entries and tell us exactly what's in it.  overall, 
it significantly raises the bar for someone trying to understand what's 
going on with their working copy.

-garrett


RE: text-base penalty: A proposed solution

Posted by Greg Hudson <gh...@MIT.EDU>.

On Tue, 2002-12-17 at 01:17, Kean Johnston wrote:
> > The prime benefit which motivated the text-base was limiting 
> > the network bandwidth used by commits, as I understand it.
> Would a copy-on-edit model not address that and the speedy
> diff requirement?

Yup.  You'd get speedier diffs (read one file per directory, instead of
stat all the files; *really* speedy diffs require another compromise)
and no text base penalty for unedited files, and you also dovetail
nicely into advisory locks.

But for the default mode of operation, we wanted to preserve the CVS
model where you don't have to tell svn before you edit a file.
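
A rough Python sketch of the copy-on-edit idea discussed above, to make the
tradeoff concrete. Everything here is hypothetical (the ".admin" directory,
the "edited" list file, the ".base" suffix and both function names); it is
not how Subversion works. A pristine copy is saved only when the user
announces an edit, and finding candidate changes means reading one list per
directory instead of stat-ing every file.

```python
import os
import shutil
import tempfile

ADMIN = ".admin"   # hypothetical admin directory name
EDITED = "edited"  # hypothetical per-directory list of files under edit


def begin_edit(directory, name):
    """The 'tell the tool first' step: save a pristine copy and record the file."""
    admin = os.path.join(directory, ADMIN)
    os.makedirs(admin, exist_ok=True)
    base = os.path.join(admin, name + ".base")
    if not os.path.exists(base):
        shutil.copyfile(os.path.join(directory, name), base)
        with open(os.path.join(admin, EDITED), "a") as f:
            f.write(name + "\n")


def edited_files(directory):
    """Read the single per-directory list; no scan of the working files."""
    path = os.path.join(directory, ADMIN, EDITED)
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as wc:
    for name in ("a.c", "b.c"):
        with open(os.path.join(wc, name), "w") as f:
            f.write("int main(void) { return 0; }\n")
    begin_edit(wc, "a.c")    # only a.c gets a pristine copy
    print(edited_files(wc))  # b.c costs no extra space
```

The cost is visible in begin_edit: an edit made without calling it first
leaves no pristine copy to diff against, which is the CVS-style convenience
the default mode wants to keep.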

> I don't know the code and I am sure you do, but how generically useful
> is the filesystem-inside-DB code? Is it worth thinking about having
> the admin area be such a database rather than actual physical files?

Oh, I wasn't thinking of using a database.  I was just thinking that
props are generally small.  It's okay to have to rewrite the props base,
props, and wcprops each time you change any of the above.

It would be kind of nifty to turn ".svn" into a file containing a
database.  But I wouldn't want to see this happen for the following
practical reasons:

  * We really want to allow working copies to work well in network
filesystems.  Databases and network filesystems don't mix; if they work
at all, performance usually sucks.

  * Berkeley DB periodically changes its format incompatibly, such that
the same API has different effects in different versions of operating
systems.  Red Hat 8.1's Subversion package might naturally build itself
against DB 4.1, while Red Hat 10.2's package might naturally build
itself against DB 5.  Now everyone's working copy is invalid.  That's no
good.

  * I think Berkeley DB has had an impact on our performance which is
difficult to understand.  (Certainly, before we started using duplicate
keys, it created an O(n^2) temporary disk space usage problem for
checkins of a large file.  That was poor.  But I think it continues to
have an impact today which we can't easily gauge.)

  * Berkeley DB has definitely introduced some scaling limitations of
the form "you have to go in and edit the DB configuration once you hit a
certain point."  Although it's possible we can work around that by doing
less work per DB transaction, that's another example of how it simply
doesn't do what it's supposed to all the time.

  * The only serious alternative to Berkeley DB I know about is gdbm,
which is GPL, and we don't want Subversion relying on GPL libraries.  I
imagine gdbm has its own flaws, probably in the performance area.


RE: text-base penalty: A proposed solution

Posted by Kean Johnston <jk...@sco.com>.

> The prime benefit which motivated the text-base was limiting 
> the network bandwidth used by commits, as I understand it.
Would a copy-on-edit model not address that and the speedy
diff requirement?

> Well, not exactly.  It turns out that for every file in the working
> directory, there are four in the admin area--text base, props, prop
> base, wcprops.  We should probably trim this to two (text base and
> props+propbase+wcprops), but that still leaves only a 50%
I don't know the code and I am sure you do, but how generically useful
is the filesystem-inside-DB code? Is it worth thinking about having
the admin area be such a database rather than actual physical files?

Kean



Re: text-base penalty: A proposed solution

Posted by Greg Hudson <gh...@MIT.EDU>.

On Mon, 2002-12-16 at 07:27, Kean Johnston wrote:
> The documentation sings its praises, and posts from
> previous threads seem to indicate that people place a great deal of
> importance on the ability to edit files on aeroplanes

The prime benefit which motivated the text-base was limiting the network
bandwidth used by commits, as I understand it.

> A software solution should not rely on the *CURRENT* costs of hardware
> in *SOME COUNTRIES* as a means of justifying a solution

Are you saying that hardware is going to get more expensive?  It's
possible, but it's not the way to bet.  At any rate, in countries where
disk is expensive, bandwidth also tends to be expensive, or so I
understand, so that doesn't really affect the tradeoff.

> Such thinking thwarts innovation, which is usually a prime motivator
> for any software package in the first place.

On the contrary, most of the researchers I've talked to seem to believe
that innovation is easiest when cycle-counting isn't necessary.

> o  Thwarts tools like cscope that scan for all source files
> o  Makes `find . -name \*.[cCh]|xargs grep SOMETHING` much less useful

We already put a suffix on text-base filenames, so your "find" command
should be unaffected.  I'm not sure how cscope works.
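
The point about the suffix can be checked with a small Python sketch that
applies the same glob as the quoted "find" command. The ".svn-base" suffix
and the file names are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

PATTERN = "*.[cCh]"  # the glob from the quoted find command


def matches(name):
    """True if a file name would be selected by find . -name '*.[cCh]'."""
    return fnmatchcase(name, PATTERN)


# Hypothetical names: two working files and one suffixed text-base copy.
names = ["main.c", "util.h", "main.c.svn-base"]
hits = [n for n in names if matches(n)]
print(hits)
```

Only the two working files match; the text-base copy ends in the suffix
rather than in .c, so the glob skips it.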

> o  Doubles the workload of any directory-traversal tools
> o  Doubles the inode count (this IS an issue!)

Well, not exactly.  It turns out that for every file in the working
directory, there are four in the admin area--text base, props, prop
base, wcprops.  We should probably trim this to two (text base and
props+propbase+wcprops), but that still leaves only a 50% inode penalty
from the text base.  (Combining the text base with the props is bad
because then you have to rewrite the text base on each propset command.)
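
One way to read the 50% figure, sketched as arithmetic in Python. The
function name is made up, and directory inodes are ignored; the per-file
counts come from the paragraph above.

```python
def inodes_per_file(admin_files):
    """One inode for the working file itself plus its admin-area files."""
    return 1 + admin_files


current = inodes_per_file(4)               # text base, props, prop base, wcprops
trimmed = inodes_per_file(2)               # text base + one combined props file
trimmed_no_text_base = inodes_per_file(1)  # combined props file only

# Extra inodes caused by the text base, relative to not having one.
penalty = (trimmed - trimmed_no_text_base) / trimmed_no_text_base
print(current, trimmed, f"{penalty:.0%}")
```

With the trimmed layout, dropping the text base takes each file from three
inodes to two, so the text base accounts for a 50% increase rather than a
doubling.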

All that said, I'd like to eliminate the text base penalty as well.  But
here is the fundamental problem with your solution:

  The WC code is already really hairy.

As a consequence:

  * Anything which requires the WC code to behave differently, you'll
have to find the resources for.

  * Anything which requires the WC code to become more complicated (by
giving it N different ways of operating) will probably get vetoed before
1.0.

I've been thinking that the wc library might want to go through a vtable
just like the ra layer does (and like the fs layer is expected to). 
Maybe it's too hard to make a single implementation work well for
everyone.  I can imagine implementations which:

  * Are just like the current one, with separable subdirs and no
requirement for alerting svn before editing a file.

  * Stash all admin data in some separate location, are not at all
separable, require an alert to svn before editing a file, and support
fast scanning for changes.

  * Are actually repository transactions mounted via NFS or Samba from
an SVN server.
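
A minimal sketch of the vtable idea, using Python as a stand-in for a C
struct of function pointers. All names here are hypothetical and none of
this is Subversion's actual API. It models only the first two
implementations listed above, with the working copy reduced to an
in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class WCVtable:
    """Hypothetical table of working-copy operations, one per implementation."""
    name: str
    requires_edit_alert: bool
    begin_edit: Callable[[Dict, str], None]
    changed_files: Callable[[Dict], List[str]]


def classic_begin_edit(wc, name):
    pass  # no alert needed before editing


def classic_changed(wc):
    # Compare every working file against its pristine copy.
    return sorted(n for n, text in wc["working"].items()
                  if wc["pristine"][n] != text)


def alert_begin_edit(wc, name):
    wc["edited"].add(name)  # the user told us first


def alert_changed(wc):
    # Fast scan: just report what was announced, without comparing files.
    return sorted(wc["edited"])


CLASSIC = WCVtable("classic", False, classic_begin_edit, classic_changed)
ALERT = WCVtable("alert", True, alert_begin_edit, alert_changed)


def status(vtable, wc):
    """Caller code depends only on the table, not on the implementation."""
    return vtable.changed_files(wc)


wc = {"working": {"a.c": "new", "b.c": "old"},
      "pristine": {"a.c": "old", "b.c": "old"},
      "edited": set()}
ALERT.begin_edit(wc, "a.c")
print(status(CLASSIC, wc), status(ALERT, wc))
```

Note that the alert-based variant reports files announced for editing,
which is not necessarily the same as files whose contents differ.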
