Posted to dev@subversion.apache.org by "Peter N. Lundblad" <pe...@famlundblad.se> on 2005/11/27 22:29:36 UTC

Status of wc-propcaching branch

Hi,

I'd like to give a short status report on the wc-propcaching branch. The
original plan for wc-propcaching is now implemented. And it seems to work
pretty well.

The following things have changed regarding how properties are stored:
- There is no base-props file if there are no properties.
- There is no working props file if there are no prop changes.
- Three new fields have been added to the entries file:
  - has-props keeps track of whether the entry has any (working) props.
  - cached-props: is a space-separated list of property names.
    If a property is mentioned here, the working props for this entry have
    a property of this name.  Only svn:needs-lock, svn:special and
    svn:externals may be present in this field.
  - prop-mods: Is true or false (attribute absent) depending on whether
    this entry has property modifications.
- The prop-time field isn't present anymore.

The WC format number has been bumped to 6 (*) and loggy auto-upgrading
from earlier formats is implemented.  Functions that don't require a
write-lock on the WC directory work with old format WCs.

*) We already have a WC format bump in 1.4, but I added another one to not
break trunk-using developers' working copies.  Format numbers are cheap.

I've done some performance experiments (see below).

It has been suggested to store properties in one single file per
directory, both for regular props and wcprops.  I think that seems like a
good idea, but I think it falls outside of the scope of the propcaching
branch.  Also, I want to merge this work as early as possible in the 1.4
cycle to get it tested more widely.

What we are waiting for now is that Erik wants to require checksums on all
files. Since that would require even another WC format bump, we think it
is best to do that on wc-propcaching before merging.

So, in short, I think wc-propcaching is approaching its merge back to
trunk and I want to encourage people to review it. If no one objects, I
want to merge as soon as Erik's work is done.  (Also, if anyone has a good
reason to add (or remove) any property from the cached-props field, it is
easier to do so before people start using this code in their working
copies.)

Now for some performance numbers:

I've done some experiments. I'm not very experienced in this area and I
only did tests on my local system (a Pentium Celeron 1.7 GHz with 256 MB
of RAM), so these numbers are just hints giving indications of what
performance improvements one could expect. The tests are done on Linux
2.6.8 using the ext3 filesystem.

I have tested with one checked out GCC tree (from
svn://gcc.gnu.org/svn/gcc/trunk), one GCC tree with svn:eol-style property
added on each file and one Subversion working copy. Note that
svn:eol-style is not cached. The reason for using it was because none of
these commands read this property (I tested with a WC without any
modifications). The interesting difference is that when there were no
props (which is almost the case in the GCC tree), we don't need to read
the property files in some cases (in the old format), but instead use the
file size to detect that the property file is empty.
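
A minimal sketch of that size shortcut, for illustration only (this is not
the libsvn_wc code).  The helper name props_file_is_empty and the 4-byte
threshold are assumptions: the sketch supposes that an empty property file
holds nothing but a short end marker, so stat() alone is enough to decide.

```python
import os
import tempfile

# Assumed threshold: an "empty" props file contains at most a short
# end-of-hash marker such as b"END\n" (4 bytes). Hypothetical value.
EMPTY_PROPS_SIZE = 4


def props_file_is_empty(path, empty_size=EMPTY_PROPS_SIZE):
    """Return True if the props file is missing or too small to hold a property.

    Only stat() is used; the file contents are never read.
    """
    try:
        return os.stat(path).st_size <= empty_size
    except FileNotFoundError:
        return True


def demo():
    """Create one empty and one non-empty props file and classify both."""
    with tempfile.TemporaryDirectory() as tmp:
        empty = os.path.join(tmp, "empty.svn-work")
        full = os.path.join(tmp, "full.svn-work")
        with open(empty, "wb") as f:
            f.write(b"END\n")
        with open(full, "wb") as f:
            # Illustrative contents for a file carrying one property.
            f.write(b"K 13\nsvn:eol-style\nV 6\nnative\nEND\n")
        return props_file_is_empty(empty), props_file_is_empty(full)
```

With svn:eol-style set on every file, the second case applies everywhere, so
the old format has to open and parse each property file.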

I tested three commands, which are common local operations: status,
wc-to-wc diff and commit. I tested with no local modifications at all in
the WC (that's why I call commit a local operation:-). This is the case I
want to optimize, because most of the time, most of the working copy will be
unmodified.

In each column below, there are two numbers.  The first indicates the
performance improvements when the disk cache was flushed. The second
number is an approx. average of four runs of the same command after the
first one, i.e. when the data is in memory.

            GCC tree         GCC tree w/prop          svn tree
svn st      11%  33%         64%  92%                 53%  25%
svn diff    12%  27%         30%  45%                 40%   0% (*)
svn ci      40%  86%         80%  95%                 78%  50%

*) There were small time differences, but it was hard to measure.


One could go into detail analyzing these numbers, but I'm not sure how
much that would give us. There are too many factors that can have an
effect on the numbers. If someone wants to test this in more depth, feel
free to. It would also be nice to have some final test results on Windows.

(If you want my raw data for some reason, just ask.)

What I think is interesting is that we have improved performance for all
operations. On some operations (e.g. commit), we have dramatic
improvements.  In summary, I feel that this work has been worth it.

Comments, flames, questions?
Thanks,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Status of wc-propcaching branch

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Mon, 28 Nov 2005, Greg Hudson wrote:

> On Mon, 2005-11-28 at 17:36 +0000, Malcolm Rowe wrote:
> > If we need to cache more properties, bump the format, and re-populate
> > the cache from the properties - like we do when maybe-upgrading to the
> > wc-propcaching format in the first place.
>
> If we can design to avoid bumping the wc format, I think we should.
> Compatibility code is irritating to maintain (code to upgrade the wc
> format as well as code to handle the old format for read-only
> operations), and there's also some end-user pain associated with
> upgrades which involve wc format bumps.
>
The real problem as I see it is that the WC is not backwards compatible
after a bump. If a user uses several clients, we force her to upgrade all
clients at once, which may not always be possible.

Thanks,
//Peter


Re: Status of wc-propcaching branch

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2005-11-28 at 17:36 +0000, Malcolm Rowe wrote:
> If we need to cache more properties, bump the format, and re-populate
> the cache from the properties - like we do when maybe-upgrading to the
> wc-propcaching format in the first place.

If we can design to avoid bumping the wc format, I think we should.
Compatibility code is irritating to maintain (code to upgrade the wc
format as well as code to handle the old format for read-only
operations), and there's also some end-user pain associated with
upgrades which involve wc format bumps.

(I'm *not* saying we shouldn't bump the wc format in 1.4.  The
performance advantages of wc-propcaching are compelling enough to
warrant a bump.  But such bumps are not free.)



Re: Status of wc-propcaching branch

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Mon, Nov 28, 2005 at 11:44:50AM -0500, Greg Hudson wrote:
>   1. (Current) Declare the presence of three hardcoded properties.
> 

Maybe I'm missing something.  Why can't we use our knowledge of what the
wc format is to disambiguate which properties are cached and which aren't?

If we need to cache more properties, bump the format, and re-populate
the cache from the properties - like we do when maybe-upgrading to the
wc-propcaching format in the first place.
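
As a rough sketch of this idea, the reader could key its knowledge of the
cached property set off the WC format number.  Format 6 and its three
property names come from this thread; format 7, the extra svn:keywords
entry, and the function name are made up purely to show what a later bump
might look like.

```python
# Which property names each WC format caches the presence of.
# Format 6 is from the thread; format 7 is a hypothetical later bump.
CACHED_PROPS_BY_FORMAT = {
    6: frozenset({"svn:needs-lock", "svn:special", "svn:externals"}),
    7: frozenset({"svn:needs-lock", "svn:special", "svn:externals",
                  "svn:keywords"}),
}


def cached_presence(wc_format, cached_props, name):
    """Return True/False if the presence of `name` is cached, else None.

    `cached_props` is the space-separated cached-props field of one entry.
    None means "this format does not cache it; read the props file".
    """
    known = CACHED_PROPS_BY_FORMAT.get(wc_format, frozenset())
    if name not in known:
        return None
    return name in cached_props.split()
```

The cost is the one discussed elsewhere in the thread: every change to the
table needs a format bump and an upgrade pass that re-populates the field.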

Regards,
Malcolm


Re: Status of wc-propcaching branch

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Mon, 28 Nov 2005, Greg Hudson wrote:

>   3. Declare once per entries file which properties we are caching the
> presence of, and later declare which of those is present.
>

To me, this means adding another field to the svn_wc_entry_t struct, a
pointer to a string of cacheable props for this entry. This is inherited
from the THIS_DIR entry. This is quite cheap; a pointer per entry and a
string per directory.  We could have it in the access baton, but since
entries aren't always read when opening a baton (depth 0), this would
complicate other code.

>   4. Declare the presence of all properties.
>
This is the simple and elegant solution.

> I'd advocate choosing between (3) and (4) based on how complex (3) comes
> out to be.
>
I agree that 3 and 4 are the best choices. I'll give 3 a shot and if it
doesn't turn out to be too complex, I'll use it. Else, I vote for 4. I
think 4 is only a problem in more or less pathological cases, but OTOH we
don't need to design more non-scalability than we need to:-)

Thanks,
//Peter


Re: Status of wc-propcaching branch

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Tue, 29 Nov 2005, Peter N. Lundblad wrote:

> On Tue, 29 Nov 2005, Julian Foad wrote:
>
> > When I said "It seems to me that a generic cache is strongly preferable from a
> > design point of view," how "generic" do we need?  I think I meant one that is
> > extensible while keeping the format of the initial implementation, and
> > preferably extensible in a back-and-forward-compatible manner.  It doesn't, for
> > instance, need to be able to cache properties that our libraries don't know
> > about - that would be fairly pointless.
> >
> >
> Oh, I understand what you meant now. r17557 makes the presence/absence
> cache generic.  This turned out to be quite simple after all.
>
And the consequence of this is that I consider the work on wc-propcaching
to be complete (modulo bug fixes of course). If no one raises objections,
I will merge this back to trunk soon. When that's done, everyone who uses
the trunk code will get her WCs upgraded to format 6.  Be prepared:-)

Thanks,
//Peter


Re: Status of wc-propcaching branch

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Tue, 29 Nov 2005, Julian Foad wrote:

> When I said "It seems to me that a generic cache is strongly preferable from a
> design point of view," how "generic" do we need?  I think I meant one that is
> extensible while keeping the format of the initial implementation, and
> preferably extensible in a back-and-forward-compatible manner.  It doesn't, for
> instance, need to be able to cache properties that our libraries don't know
> about - that would be fairly pointless.
>
>
Oh, I understand what you meant now. r17557 makes the presence/absence
cache generic.  This turned out to be quite simple after all.

Thanks for the suggestions,
//Peter


Re: Status of wc-propcaching branch

Posted by Julian Foad <ju...@btopenworld.com>.
When I said "It seems to me that a generic cache is strongly preferable from a
design point of view," how "generic" do we need?  I think I meant one that is 
extensible while keeping the format of the initial implementation, and 
preferably extensible in a back-and-forward-compatible manner.  It doesn't, for 
instance, need to be able to cache properties that our libraries don't know 
about - that would be fairly pointless.


Greg Hudson wrote:
> 
> I'm also concerned that listing all existing properties might be
> overdoing things.  Let me step back and list the options I've seen
> presented:
> 
>   1. (Current) Declare the presence of three hardcoded properties.
> 
>   2. Declare the presence and absence of three hardcoded properties.

I'm not sure exactly what you mean by these two.

(1) In the current (branch) code, the field in the "entries" file declares the 
presence (explicitly) or absence (implicitly) of each of the three properties. 
  It is "hard-coded" in that the reader needs to know what the writer was thinking.

I think by (2) you perhaps were thinking of the scheme I mentioned in which a 
list of present properties and a list of absent properties are both given.

   cached-props-present = "svn:special"
   cached-props-absent = "svn:needs-lock svn:externals"

That's not a "hard-coded" scheme because the reader doesn't have to know 
anything about which properties were put there, it just looks to see if the 
property it wants is in either of the lists.  Different writers (different 
versions of client software) can put different lists of property names into 
this cache, and all readers will understand it without knowing who wrote it. 
This representation is semantically equivalent to the alternatives I gave such as:

   props-presence-cache = "svn:needs-lock=0 svn:special=1 svn:externals=0"
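
To illustrate why the two representations are equivalent, here is a small
sketch that reads either one into the same name-to-boolean mapping.  The
function names are hypothetical; only the field formats are taken from the
examples above.  A lookup has three outcomes: present, absent, or not cached.

```python
def parse_presence_cache(field):
    """Parse 'name=0 name=1 ...' into a dict mapping name -> bool."""
    cache = {}
    for item in field.split():
        name, sep, flag = item.rpartition("=")
        if not sep or flag not in ("0", "1"):
            raise ValueError("malformed cache item: %r" % item)
        cache[name] = (flag == "1")
    return cache


def from_lists(present, absent):
    """Build the same mapping from the two-list representation."""
    cache = {name: True for name in present.split()}
    cache.update({name: False for name in absent.split()})
    return cache


def lookup(cache, name):
    """Return True (present), False (absent) or None (not cached)."""
    return cache.get(name)
```

A reader never needs to know which names the writer chose to cache: a name
missing from the mapping just means the props file has to be consulted.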



>   3. Declare once per entries file which properties we are caching the
> presence of, and later declare which of those is present.

In this scheme, the per-file field can use a compact bit-sequence to represent 
which of the properties are present and which absent, e.g. "010" if there are 
three properties being considered.

>   4. Declare the presence of all properties.

By this, I assume you mean: for each item, list the names of all the properties 
(presumably only "svn:" properties) that the item has.  Any property whose
name is not in the list is known to be absent.  This is more verbose (a longer 
string) than (3).

This would make the "has-props" boolean redundant by replacing it with more 
information, so it could be called "props".  Examples:

   props = ""
   props = "svn:special"
   props = "svn:ignore svn:externals"

This field might as well be required.  Calling it a "cache" would be a bit 
undeserved as (if present) it is required to always be complete and up to date.


5. Here's a simplification of (3).  Since the Subversion client libraries know 
which "svn:" properties have any meaning in the current version, we don't need 
such a generic cache, and the list of properties can be fixed.  Define (as part 
of the specification of the WC format) that the sequence of properties is

   svn:needs-lock, svn:special, svn:externals

and then have:

   props-presence = "010"

for a file which has svn:special but neither of the others.

In future, we can add another property to the end of the list whenever we like. 
  That will be extensible without a format bump.  If a new reader knows about a 
fourth property, but the string in the WC only has three entries, then the 
fourth property is simply not yet cached.  This scheme is more efficient in 
space and time than the current field and, I think, the other suggestions.
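
A sketch of how a reader might decode such a string, including the case
where the reader knows more properties than the writer cached.  The first
three names are the sequence given above; the fourth (svn:keywords) and the
function name are invented only to demonstrate the extension case.

```python
# Sequence fixed by the WC format specification; names may only ever be
# appended. The fourth entry is a hypothetical later addition.
PROP_ORDER = ("svn:needs-lock", "svn:special", "svn:externals",
              "svn:keywords")


def presence(bits, name):
    """Return True/False if cached in `bits`, or None if not cached.

    `bits` is the props-presence string, e.g. "010". A string shorter
    than PROP_ORDER was written by an older client, so the trailing
    properties are simply not cached yet.
    """
    if name not in PROP_ORDER:
        return None
    index = PROP_ORDER.index(name)
    if index >= len(bits):
        return None
    return bits[index] == "1"
```

An older reader meeting a longer string would ignore the positions it does
not know about, which is what makes appending safe in both directions.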


- Julian


Re: Status of wc-propcaching branch

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2005-11-28 at 16:57 +0100, Peter N. Lundblad wrote:
> > It seems to me that a generic cache is strongly preferable from a design
> > point of view.

> My only concern is that the entries file will grow even bigger, but that
> might be over-pessimistic.

I'm also concerned that listing all existing properties might be
overdoing things.  Let me step back and list the options I've seen
presented:

  1. (Current) Declare the presence of three hardcoded properties.

  2. Declare the presence and absence of three hardcoded properties.

  3. Declare once per entries file which properties we are caching the
presence of, and later declare which of those is present.

  4. Declare the presence of all properties.

Any of (2)-(4) would allow us to start caching the presence of
additional properties without a format bump--the information in the
entries file would be enough to determine whether or not we know about
the new property.

(2) has the disadvantage that every entry pays a penalty for every
hardcoded property in the list, even if the working copy has no
properties.  Right now that penalty might be small enough to ignore; as
the list grows, it might not.

I like (3), but maybe it's too much code for the benefit.

(4) isn't bad from a total space perspective.  Storing each property
name twice instead of once in the .svn area is pretty minor.  The main
risk is that a single file with a bunch of custom properties on it would
penalize the processing of any file in the directory.  If we're thinking
about storing all properties for a directory in a single file, we're
probably willing to take that risk anyway.

(1) wouldn't be the end of the world.  It would be a little clunky if we
want to start caching more props (we'd have to add a new field like
"more-cached-props" instead of modifying the cached-props field).  But
I'd advocate choosing between (3) and (4) based on how complex (3) comes
out to be.



Re: Status of wc-propcaching branch

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Mon, 28 Nov 2005, Julian Foad wrote:

> Peter N. Lundblad wrote:
> > I'd like to give a short status report on the wc-propcaching branch. The
> > original plan for wc-propcaching is now implemented. And it seems to work
> > pretty well.
> >
> > The following things have changed regarding how properties are stored:
>
> Is this set of changes, and/or the new situation, documented anywhere more
> permanent than here?  I'm going to review this as if it's a log message.
>
They are in the log messages:-) But seriously, I want to work on a new
document describing the current working copy format, because it would be
really helpful to have a complete overview somewhere. I want to get this
in first, though.


>
> > - There is no base-props file if there are no properties.
>
> Excellent.  Presumably "if there are no _base_ properties", so this file can
> never be present and empty?  (An alternative could be "if there are no base or
> working properties".)
>
Yes, this is what I meant. No base-props file if no base-props.

> > - There is no working props file if there are no prop changes.
>
> Excellent.  So, this file can be present and empty, iff there were base props
> and they have all been deleted locally?
>
Correct.

> > - Three new fields have been added to the entries file:
> >   - has-props keeps track of whether the entry has any (working) props.
>
> So, to check I understand properly, this is a boolean field, present iff:
>    (working-props file is present and non-empty)
>    || ((working-props file is absent)
>        && (base-props file is present (and, by definition, non-empty)))
> ?

Yes. has-props reflects the state of the working props. So your boolean
statement above is what's implemented.

>
> >   - cached-props: is a space-separated list of property names.
> >     If a property is mentioned here, the working props for this entry have
> >     a property of this name.  Only svn:needs-lock, svn:special and
> >     svn:externals may be present in this field.
>
> And if one of those three properties is not mentioned, does that mean the
> property is not present in the working props?  So this caches the presence or
> absence of those three particular properties?  (See below.)
>
Exactly.

>
> >   - prop-mods: Is true or false (attribute absent) depending on whether
> >     this entry has property modifications.
>
> Wouldn't the name "has-prop-mods" be better for a boolean (complementing
> "has-props")?  Otherwise its name implies it's a list of modifications.
>
Good idea.

> So, this is present iff the "working props" file is present?
>
Yes, you could say that it caches the presence of that file:-)
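
Put together, the two boolean fields confirmed above can be sketched like
this.  The model is an assumption made for brevity: each props file is
represented as None when the file is absent and as a dict of its properties
when it is present, and the function names are hypothetical.

```python
def has_props(working_props, base_props):
    """has-props: does the entry have any working properties?

    If a working props file exists, it alone decides (it may be empty
    when every base prop was deleted locally). Otherwise the working
    props equal the base props, and a base props file is never empty.
    """
    if working_props is not None:
        return len(working_props) > 0
    return base_props is not None


def has_prop_mods(working_props):
    """prop-mods: true exactly when a working props file exists."""
    return working_props is not None
```

For an entry whose base props were all deleted locally, this gives
has-props false and prop-mods true.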

> > What we are waiting for now is that Erik wants to require checksums on all
> > files. Since that would require even another WC format bump, we think it
> > is best to do that on wc-propcaching before merging.
>
> That makes me uncomfortable.  Maybe it's a small and simple change, but
> it's got nothing to do with prop-caching.  Saying that format numbers
> are cheap, and then doing this to avoid another bump, is inconsistent.

Hehe, caught me there:-)


> Is there any other way we could work around this format-number-bumping
> issue?  Please could we either keep bumps for released versions, and
> provide developers with an alternative way to get their WCs upgraded and
> working yet end up with a format number "5", or just do a bump for each
> new WC feature?  Mixing different features in the same branch could get
> ugly and isn't scalable.
>
I think the latter alternative is elegant. The format number doesn't mean
anything outside the WC internals and asking developers to hack their
working copies (or recheck them out) isn't very nice.

Erik, do you agree to move your checksum work outside the wc-propcaching
branch?


>
> We have:
>
>    cached-props = "svn:special"
>
> means
>
>    svn:needs-lock is absent, svn:special is present, svn:externals is absent.
>
> Note that it means certain properties are absent, that aren't mentioned
> in it, as well as meaning that certain properties are present (that are
> mentioned in it).
>

> The library has built-in knowledge of which three specific properties
> they are.

>   Calling the field "cached-props" seems wrong.  It implies a generic
> cache, in which the presence or absence of an item should indicate only
> whether that item happens to have been cached.  Writing the name of the
> property for one boolean state ("present") but not for the other state
> ("absent") is asymmetric.
>
We are asymmetric for boolean attributes, mostly to save space.  Naming
absent properties for each entry just for symmetry is silly; let's find
another solution here.

[...]

>    cached-props-names = "svn:needs-lock svn:special svn:externals"
>    cached-props-presence = "0 1 0"
>
Move the first of these to the top level (or the this-dir entry) and you
have ghudson's suggestion. This might be an alternative.


> It seems to me that a generic cache is strongly preferable from a design
> point of view.
>
My only concern is that the entries file will grow even bigger, but that
might be over-pessimistic. From an implementation POV, I'd rather do the
simpler solution. Maybe we shouldn't bother: if people store 1000 props,
then they're doomed anyway.  I'll take some time to think about this, but
I may well end up caching the existence of all props.

Thanks for the comments,
//Peter


Re: Status of wc-propcaching branch

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2005-11-28 at 16:31 +0100, Peter N. Lundblad wrote:
> Yes, this is a problem with the current design. Do we expect to want to
> change the list of properties often enough to justify the extra
> overhead/complexity?

The only specific future need I've thought of is
http://subversion.tigris.org/issues/show_bug.cgi?id=1975 ("svn switch
does not update keywords"), but the fix for that would require caching
the value of svn:keywords, not just its presence.

One possible future need is metadata versioning, if we ever put that in.



Re: Status of wc-propcaching branch

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Mon, 28 Nov 2005, Greg Hudson wrote:

> On Mon, 2005-11-28 at 13:58 +0000, Julian Foad wrote:
> > Given that I'm late with saying this, and you've already implemented a cache
> > for three specific values, may I persuade you to at least change the field so
> > that it doesn't look like a generic cache?
>
> Ah, this is a good point.  It worried me that we wouldn't be able to add
> properties to the cache without a format bump once the code went live,
> and Julian points out a possible way to rectify this (by including
> absence as well as presence).
>
Yes, this is a problem with the current design. Do we expect to want to
change the list of properties often enough to justify the extra
overhead/complexity?

Thanks,
//Peter


Re: Status of wc-propcaching branch

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2005-11-28 at 13:58 +0000, Julian Foad wrote:
> Given that I'm late with saying this, and you've already implemented a cache 
> for three specific values, may I persuade you to at least change the field so 
> that it doesn't look like a generic cache?

Ah, this is a good point.  It worried me that we wouldn't be able to add
properties to the cache without a format bump once the code went live,
and Julian points out a possible way to rectify this (by including
absence as well as presence).

If we're concerned about space used by the entries file and/or the
processing time necessary to read each entry, we could adopt an approach
where the entries file has a declaration at the top which says "I am
caching the existence of properties foo, bar, and baz", and then each
cache entry is just a string "010" or something.
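
Roughly, and only as a sketch under assumed names (the thread does not fix a
syntax for the declaration or the per-entry string), decoding such a file
could look like this:

```python
def decode_entries(declaration, entry_bits):
    """Expand per-entry bit strings using the file-level declaration.

    `declaration` is the space-separated list of property names stated
    once at the top of the entries file; `entry_bits` maps an entry name
    to its bit string. Returns entry name -> {property name: bool}.
    """
    names = declaration.split()
    decoded = {}
    for entry, bits in entry_bits.items():
        if len(bits) != len(names):
            raise ValueError("bit string does not match declaration: %r"
                             % entry)
        decoded[entry] = {n: (b == "1") for n, b in zip(names, bits)}
    return decoded
```

Each entry then pays one character per cached property, and the set of names
can differ between entries files without the reader knowing it in advance.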



Re: Status of wc-propcaching branch

Posted by Julian Foad <ju...@btopenworld.com>.
Peter N. Lundblad wrote:
> Hi,
> 
> I'd like to give a short status report on the wc-propcaching branch. The
> original plan for wc-propcaching is now implemented. And it seems to work
> pretty well.
> 
> The following things have changed regarding how properties are stored:

Is this set of changes, and/or the new situation, documented anywhere more 
permanent than here?  I'm going to review this as if it's a log message.


> - There is no base-props file if there are no properties.

Excellent.  Presumably "if there are no _base_ properties", so this file can 
never be present and empty?  (An alternative could be "if there are no base or 
working properties".)

> - There is no working props file if there are no prop changes.

Excellent.  So, this file can be present and empty, iff there were base props 
and they have all been deleted locally?

> - Three new fields have been added to the entries file:
>   - has-props keeps track of whether the entry has any (working) props.

So, to check I understand properly, this is a boolean field, present iff:
   (working-props file is present and non-empty)
   || ((working-props file is absent)
       && (base-props file is present (and, by definition, non-empty)))
?

>   - cached-props: is a space-separated list of property names.
>     If a property is mentioned here, the working props for this entry have
>     a property of this name.  Only svn:needs-lock, svn:special and
>     svn:externals may be present in this field.

And if one of those three properties is not mentioned, does that mean the 
property is not present in the working props?  So this caches the presence or 
absence of those three particular properties?  (See below.)


>   - prop-mods: Is true or false (attribute absent) depending on whether
>     this entry has property modifications.

Wouldn't the name "has-prop-mods" be better for a boolean (complementing 
"has-props")?  Otherwise its name implies it's a list of modifications.

So, this is present iff the "working props" file is present?

> - The prop-time field isn't present anymore.

Excellent.


> It has been suggested to store properties in one single file per
> directory, both for regular props and wcprops.  I think that seems like a
> good idea, but I think it falls outside of the scope of the propcaching
> branch.  Also, I want to merge this work as early as possible in the 1.4
> cycle to get it tested more widely.

Agreed.

> What we are waiting for now is that Erik wants to require checksums on all
> files. Since that would require even another WC format bump, we think it
> is best to do that on wc-propcaching before merging.

That makes me uncomfortable.  Maybe it's a small and simple change, but it's 
got nothing to do with prop-caching.  Saying that format numbers are cheap, and 
then doing this to avoid another bump, is inconsistent.  Is there any other way 
we could work around this format-number-bumping issue?  Please could we either 
keep bumps for released versions, and provide developers with an alternative 
way to get their WCs upgraded and working yet end up with a format number "5", 
or just do a bump for each new WC feature?  Mixing different features in the 
same branch could get ugly and isn't scalable.


> So, in short, I think wc-propcaching is approaching its merge back to
> trunk and I want to encourage people to review it. If no one objects, I
> want to merge as soon as Erik's work is done.  (Also, if anyone has a good
> reason to add (or remove) any property from the cached-props field, it is
> easier to do so before people start using this code in their working
> copies.)

Apologies for making this comment at this late stage.

We have:

   cached-props = "svn:special"

means

   svn:needs-lock is absent, svn:special is present, svn:externals is absent.

Note that it means certain properties are absent, that aren't mentioned in it, 
as well as meaning that certain properties are present (that are mentioned in it).

The library has built-in knowledge of which three specific properties they are. 
  Calling the field "cached-props" seems wrong.  It implies a generic cache, in 
which the presence or absence of an item should indicate only whether that item 
happens to have been cached.  Writing the name of the property for one boolean 
state ("present") but not for the other state ("absent") is asymmetric.

I feel that either the field should be named so as to identify what it is 
caching, e.g.:

   has-needslock-special-externals = "0 1 0"

or it should be a generic cache, e.g.:

   props-presence-cache = "svn:needs-lock=0 svn:special=1 svn:externals=0"

or
   cached-props-present = "svn:special"
   cached-props-absent = "svn:needs-lock svn:externals"

or
   cached-props-names = "svn:needs-lock svn:special svn:externals"
   cached-props-presence = "0 1 0"

It seems to me that a generic cache is strongly preferable from a design point
of view.

Given that I'm late with saying this, and you've already implemented a cache 
for three specific values, may I persuade you to at least change the field so 
that it doesn't look like a generic cache?


> In each column below, there are two numbers.  The first indicates the
> performance improvements when the disk cache was flushed. The second
> number is an approx. average of four runs of the same command after the
> first one, i.e. when the data is in memory.
> 
>             GCC tree         GCC tree w/prop          svn tree
> svn st      11%  33%         64%  92%                 53%  25%
> svn diff    12%  27%         30%  45%                 40%   0% (*)
> svn ci      40%  86%         80%  95%                 78%  50%

By "performance improvements" you seem to be talking about speed here; the disk 
space is the other important factor.  Since you describe some of these as 
"dramatic", I assume these percentages are reduction in wall-clock time, so 
that "95%" means twenty times faster, rather than speed increases in which case 
"95%" would mean nearly twice as fast.

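The two readings differ a lot at the high end, so here is the arithmetic
as a small Python sketch.  The function names are hypothetical; which of
the two interpretations the table actually uses is my assumption, as said
above.

```python
def speedup_from_time_reduction(percent):
    """Speed-up factor if `percent` is the reduction in wall-clock time.

    Cutting the time by p% leaves (100 - p)% of it, so the same work is
    done 100 / (100 - p) times faster.
    """
    if not 0 <= percent < 100:
        raise ValueError("time reduction must be in [0, 100)")
    return 100.0 / (100.0 - percent)


def speedup_from_speed_increase(percent):
    """Speed-up factor if `percent` is an increase in speed."""
    if percent < 0:
        raise ValueError("speed increase must be non-negative")
    return (100.0 + percent) / 100.0


if __name__ == "__main__":
    for p in (11, 50, 95):
        print("%d%%: %.2fx if time reduction, %.2fx if speed increase"
              % (p, speedup_from_time_reduction(p),
                 speedup_from_speed_increase(p)))
```

For small percentages the two readings nearly agree; it is only for the
large ones, such as the 95% for "svn ci", that the distinction matters.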

> What I think is interesting is that we have improved performance for all
> operations. On some operations (i.e. commit), we have dramatic
> improvements.  In summary, I feel that this work has been worth it.

Yup, this certainly feels worthwhile intuitively, and the numbers bear
that out.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Status of wc-propcaching branch

Posted by Daniel Berlin <db...@dberlin.org>.
On Mon, 2005-11-28 at 09:42 +0530, Madan U S wrote:
> Daniel Berlin <db...@dberlin.org> writes:
> 
> > >             GCC tree         GCC tree w/prop          svn tree
> > > svn st      11%  33%         64%  92%                 53%  25%
> > > svn diff    12%  27%         30%  45%                 40%   0% (*)
> > > svn ci      40%  86%         80%  95%                 78%  50%
> > > 
> > > *) There were small time differences, but it was hard to measure.
> > > 
> > > 
> > > One could go into detail analyzing these numbers, but I'm not sure how
> > > much that would give us. There are too many factors that can have an
> > > effect on the numbers. If someone want to test this in more depth, feel
> > 
> 
> -------------8<--------------------8<-----------------
> > Do you happen to have numbers from svn update?
> > 
> > That is probably a more common operation than svn update for the gcc
> > tree :)
> -------------8<--------------------8<-----------------
> 
> Did you mean svn up is more common than st/diff/ci ?

Yes.
I actually meant to write:

"That is probably a more common operation for the gcc tree :)"

> Am I missing something here?
> 
> Regards,
> Madan.
> 



Re: Status of wc-propcaching branch

Posted by Madan U S <ma...@collab.net>.
Daniel Berlin <db...@dberlin.org> writes:

> >             GCC tree         GCC tree w/prop          svn tree
> > svn st      11%  33%         64%  92%                 53%  25%
> > svn diff    12%  27%         30%  45%                 40%   0% (*)
> > svn ci      40%  86%         80%  95%                 78%  50%
> > 
> > *) There were small time differences, but it was hard to measure.
> > 
> > 
> > One could go into detail analyzing these numbers, but I'm not sure how
> > much that would give us. There are too many factors that can have an
> > effect on the numbers. If someone want to test this in more depth, feel
> 

-------------8<--------------------8<-----------------
> Do you happen to have numbers from svn update?
> 
> That is probably a more common operation than svn update for the gcc
> tree :)
-------------8<--------------------8<-----------------

Did you mean svn up is more common than st/diff/ci ?
Am I missing something here?

Regards,
Madan.



Re: Status of wc-propcaching branch

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Mon, 28 Nov 2005, Daniel Berlin wrote:

> > Note that using a filesystem where the node type isn't stored in the
> > directory, you might get other performance characteristics.
>
>
> >  There, you'll
> > have another stat per file, and disk block reading to get all stat
> > information.
>
> Not anymore. At least, not for update.
>
Uh, my point was that I was testing 1.2 -> wc-propcaching. Your
improvement will probably have an effect on those filesystems, but not on
mine:-) Sorry if I was unclear...

Thanks,
//Peter


Re: Status of wc-propcaching branch

Posted by Daniel Berlin <db...@dberlin.org>.
> Note that using a filesystem where the node type isn't stored in the
> directory, you might get other performance characteristics.


>  There, you'll
> have another stat per file, and disk block reading to get all stat
> information.

Not anymore. At least, not for update.

--Dan



Re: Status of wc-propcaching branch

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sun, 27 Nov 2005, Daniel Berlin wrote:

>
> >             GCC tree         GCC tree w/prop          svn tree
> > svn st      11%  33%         64%  92%                 53%  25%
> > svn diff    12%  27%         30%  45%                 40%   0% (*)
> > svn ci      40%  86%         80%  95%                 78%  50%
> >
> > *) There were small time differences, but it was hard to measure.
> >
> >
> > One could go into detail analyzing these numbers, but I'm not sure how
> > much that would give us. There are too many factors that can have an
> > effect on the numbers. If someone want to test this in more depth, feel
>
> Do you happen to have numbers from svn update?
>
> That is probably a more common operation than svn update for the gcc
> tree :)
>
>
>
>
Yeah, I did some testing with a local repository today just out of
curiosity. I tested with both a 1.3 build from trunk from some weeks ago
and the latest 1.2 sources. wc-propcaching gives some improvement for
update, but not that much. I didn't investigate, but does svn up read all
properties? Maybe it is the cached svn:externals that shows?

Note that using a filesystem where the node type isn't stored in the
directory, you might get other performance characteristics. There, you'll
have another stat per file, and disk block reading to get all stat
information.

But I made another nice observation while updating my WCs from the
revision with no properties to the revision with svn:eol-style=native for
each file. It took 10 minutes to update using the old client and 4
minutes using wc-propcaching. This is a nice number as well, although
changing the properties on every file (and nothing else) shouldn't be that
common an operation.

I've noticed that for update, we rewrite the entries files even if nothing
was changed. We might want to get rid of that...

Regards,
//Peter


Re: Status of wc-propcaching branch

Posted by Daniel Berlin <db...@dberlin.org>.
>             GCC tree         GCC tree w/prop          svn tree
> svn st      11%  33%         64%  92%                 53%  25%
> svn diff    12%  27%         30%  45%                 40%   0% (*)
> svn ci      40%  86%         80%  95%                 78%  50%
> 
> *) There were small time differences, but it was hard to measure.
> 
> 
> One could go into detail analyzing these numbers, but I'm not sure how
> much that would give us. There are too many factors that can have an
> effect on the numbers. If someone want to test this in more depth, feel

Do you happen to have numbers from svn update?

That is probably a more common operation than svn update for the gcc
tree :)




