You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@wandisco.com> on 2015/10/11 14:25:47 UTC

Efficient revprop access in lib_repos

[This is the rationale and additional documentation to an upcoming
set of commits.]

When changing a revprop, we provide the following visibility guarantees:

1. A request that ends before the set_revprop started, sees the old value
(duh!).
2. A request that starts after the set_revprop completes, sees the new
value.
3. A request that starts before the set_revprop completes, may see the old
or
   the new value and may not be consistent about it.

The reporter in lib_repos exploits 3. by keeping a hash of the dates
and authors for all revisions it encountered so far.  Not only saves
that 50+% of revprop lookups but also guarantees consistent properties
for all reported nodes - at the expense of ~80 bytes/reported revision.

I'd like to expand on that by giving the FS API users more control on
whether they need to get the latest revprop state (e.g. at the beginning
of a lib_repos report) or not (e.g. follow-up requests during the same
lib_repos report).  That will allow the FS layer to read whole revprop
packs and simply deliver their contents during 'svn log' instead of
re-opening the same files over and over again.

I'll commit a patch set that

* introduces the notion of a barrier in svn_fs_revision_prop2 and
  svn_fs_revision_proplist2 (default: read latest from disk) plus
  a new explicit barier function svn_fs_refresh_rev_props,
* updates the lib_repos queries to only use one barrier, and
* implements them in the FS backends.

Note that it is perfectly legal for an FS to ignore the extra flag
and always fetch the data from disk.  So, this how our backends will
implement it:

* BDB will ignore the flag and always deliver data as today.

* FSFS will use a *temporary*, svn_fs_t-local revprop cache, keyed
  by a UUID. Whenever there is a barrier, the UUID gets cleaned and
  any cache lookup would miss (it should not even try to use the cache).
  Upon the first non-barrier request, a new UUID gets created and the
  cache will be populated whenever we read revprops from disk.

* FSX will only check for a new revprop generation at a barrier.
  That eliminates the requirement to keep the revprop gen file open,
  IOW, revprop caching works with the "normal" file open/read/close
  pattern.

As a result in FSFS, we get most of the benefits of revprop caching,
fewer file operations and lower CPU load, without introducing complex
cache invalidation schemes.  This is significant in packed repos
but will also benefit non-packed revs in some operations.

I'll roughly commit the changes in the following order:

* minor cleanup of the FSFS code to keep the relevant change small
* update FS vtable, defaulting to current behavior
* implement the new behavior in FSFS
* bump FS API
* update lib_repos queries one-by-one for review and testability
* implement the new behavior in FSX

-- Stefan^2.

Re: Efficient revprop access in lib_repos

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sun, Oct 11, 2015 at 3:29 PM, Bert Huijben <be...@qqmail.nl> wrote:

> Why a uuid and not just some int64 atomic incrementation value that is
> cached ir similar.
>

You are right, anything "unique enough" will do
and the code can easily be changed to using
and i64 later. I'll do that right after my commit
queue for revprop handling runs empty.

UUIDs are just handy and implementing a
process-wide shared i64 counter requires some
extra coding.


> Generating a uuid is not something that is not measurable performance
> wise. It might take some expensive interproces communication or even
> hardware access.
>

OTOH, the UUID generation is not time-critical.
It will only happen if a repos layer operation
explicitly allows cached revprops and then it
will only happen once for the whole e.g. report.

-- Stefan^2.


>
> Bert
> ------------------------------
> From: Stefan Fuhrmann <st...@wandisco.com>
> Sent: ‎11-‎10-‎2015 14:25
> To: Subversion Development <de...@subversion.apache.org>
> Subject: Efficient revprop access in lib_repos
>
> [This is the rationale and additional documentation to an upcoming
> set of commits.]
>
> When changing a revprop, we provide the following visibility guarantees:
>
> 1. A request that ends before the set_revprop started, sees the old value
> (duh!).
> 2. A request that starts after the set_revprop completes, sees the new
> value.
> 3. A request that starts before the set_revprop completes, may see the old
> or
>    the new value and may not be consistent about it.
>
> The reporter in lib_repos exploits 3. by keeping a hash of the dates
> and authors for all revisions it encountered so far.  Not only saves
> that 50+% of revprop lookups but also guarantees consistent properties
> for all reported nodes - at the expense of ~80 bytes/reported revision.
>
> I'd like to expand on that by giving the FS API users more control on
> whether they need to get the latest revprop state (e.g. at the beginning
> of a lib_repos report) or not (e.g. follow-up requests during the same
> lib_repos report).  That will allow the FS layer to read whole revprop
> packs and simply deliver their contents during 'svn log' instead of
> re-opening the same files over and over again.
>
> I'll commit a patch set that
>
> * introduces the notion of a barrier in svn_fs_revision_prop2 and
>   svn_fs_revision_proplist2 (default: read latest from disk) plus
>   a new explicit barier function svn_fs_refresh_rev_props,
> * updates the lib_repos queries to only use one barrier, and
> * implements them in the FS backends.
>
> Note that it is perfectly legal for an FS to ignore the extra flag
> and always fetch the data from disk.  So, this how our backends will
> implement it:
>
> * BDB will ignore the flag and always deliver data as today.
>
> * FSFS will use a *temporary*, svn_fs_t-local revprop cache, keyed
>   by a UUID. Whenever there is a barrier, the UUID gets cleaned and
>   any cache lookup would miss (it should not even try to use the cache).
>   Upon the first non-barrier request, a new UUID gets created and the
>   cache will be populated whenever we read revprops from disk.
>
> * FSX will only check for a new revprop generation at a barrier.
>   That eliminates the requirement to keep the revprop gen file open,
>   IOW, revprop caching works with the "normal" file open/read/close
>   pattern.
>
> As a result in FSFS, we get most of the benefits of revprop caching,
> fewer file operations and lower CPU load, without introducing complex
> cache invalidation schemes.  This is significant in packed repos
> but will also benefit non-packed revs in some operations.
>
> I'll roughly commit the changes in the following order:
>
> * minor cleanup of the FSFS code to keep the relevant change small
> * update FS vtable, defaulting to current behavior
> * implement the new behavior in FSFS
> * bump FS API
> * update lib_repos queries one-by-one for review and testability
> * implement the new behavior in FSX
>
> -- Stefan^2.
>

RE: Efficient revprop access in lib_repos

Posted by Bert Huijben <be...@qqmail.nl>.
Why a uuid and not just some int64 atomic incrementation value that is cached ir similar.

Generating a uuid is not something that is not measurable performance wise. It might take some expensive interproces communication or even hardware access.

Bert

-----Original Message-----
From: "Stefan Fuhrmann" <st...@wandisco.com>
Sent: ‎11-‎10-‎2015 14:25
To: "Subversion Development" <de...@subversion.apache.org>
Subject: Efficient revprop access in lib_repos

[This is the rationale and additional documentation to an upcoming

set of commits.]


When changing a revprop, we provide the following visibility guarantees:

1. A request that ends before the set_revprop started, sees the old value (duh!).
2. A request that starts after the set_revprop completes, sees the new value.
3. A request that starts before the set_revprop completes, may see the old or
   the new value and may not be consistent about it.

The reporter in lib_repos exploits 3. by keeping a hash of the dates
and authors for all revisions it encountered so far.  Not only saves
that 50+% of revprop lookups but also guarantees consistent properties
for all reported nodes - at the expense of ~80 bytes/reported revision.

I'd like to expand on that by giving the FS API users more control on
whether they need to get the latest revprop state (e.g. at the beginning
of a lib_repos report) or not (e.g. follow-up requests during the same
lib_repos report).  That will allow the FS layer to read whole revprop
packs and simply deliver their contents during 'svn log' instead of
re-opening the same files over and over again.

I'll commit a patch set that

* introduces the notion of a barrier in svn_fs_revision_prop2 and
  svn_fs_revision_proplist2 (default: read latest from disk) plus
  a new explicit barier function svn_fs_refresh_rev_props,
* updates the lib_repos queries to only use one barrier, and
* implements them in the FS backends.

Note that it is perfectly legal for an FS to ignore the extra flag
and always fetch the data from disk.  So, this how our backends will
implement it:

* BDB will ignore the flag and always deliver data as today.

* FSFS will use a *temporary*, svn_fs_t-local revprop cache, keyed
  by a UUID. Whenever there is a barrier, the UUID gets cleaned and
  any cache lookup would miss (it should not even try to use the cache).
  Upon the first non-barrier request, a new UUID gets created and the
  cache will be populated whenever we read revprops from disk.

* FSX will only check for a new revprop generation at a barrier.
  That eliminates the requirement to keep the revprop gen file open,
  IOW, revprop caching works with the "normal" file open/read/close
  pattern.

As a result in FSFS, we get most of the benefits of revprop caching,
fewer file operations and lower CPU load, without introducing complex
cache invalidation schemes.  This is significant in packed repos
but will also benefit non-packed revs in some operations.

I'll roughly commit the changes in the following order:

* minor cleanup of the FSFS code to keep the relevant change small
* update FS vtable, defaulting to current behavior
* implement the new behavior in FSFS
* bump FS API
* update lib_repos queries one-by-one for review and testability
* implement the new behavior in FSX

-- Stefan^2.

Re: Efficient revprop access in lib_repos

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Oct 12, 2015 at 12:10 AM, Bert Huijben <be...@qqmail.nl> wrote:

>                 Hi,
>
>
>
> Another reply:
>
>
>
> How does this affect multiple processes accessing the same repository?
>

For FSFS:

The effect is the same as using 2 different svn_fs_t instances
because the cache uses a svn_fs_t-specific prefix (that UUID)
in the key, such that the contents is not visible to any other
svn_fs_t instance. Using the process-global membuffer cache
for storage merely ensures fair memory usage / prioritization.

Within a svn_fs_t, the cache has a "dead man switch". Whenever
you write a revprop or read it with REFRESH set, the cache
gets invalidated by resetting the UUID. That means, only for
as long as e.g. the repos layer explicitly says "use the cache"
for every revprop read, will data be cached at all.

BDB simply ignores the new feature and operates as before.
FSX has a different revprop caching mechanism and will use
the new API to manage that cache more efficiently.


>  This caching looks like a nice speedup in some common cases, but I’m
> wondering if we are adding new race conditions on revision property
> updates. We had to add specific atomic revprop change apis to resolve/limit
> some older race conditions and this looks at the surface like it might
> introduce a few new cases.
>

That is a valid concern. However, svn_fs_fs__change_rev_prop
via change_rev_prop_body always reads the current revprop data
from disk and invalidates the cache. So, within FSFS, propset
does not depend on cached data and cache invalidation ensures
that it will "see" its own revprop changes.

Outside the FS backends, only users of svn_fs_revision_prop2
and svn_fs_revision_proplist2 may be affected. Everyone outside
lib_repos calls those with REFRESH set, i.e. there is no risk of
using a stale result to short-circuit a propset. Within lib_repos,
only svn_repos_fs_change_rev_prop4 reads *and* writes revprops.
And this function explicitly requests the latest revprop data.

Therefore, I think the write path is pretty safe. If this patch series
introduced a problem then it is one of the read queries in lib_repos
not returning the latest data when they should. And those invocations
are easy to review.

-- Stefan^2.


> *From:* Stefan Fuhrmann [mailto:stefan.fuhrmann@wandisco.com]
> *Sent:* zondag 11 oktober 2015 14:26
> *To:* Subversion Development <de...@subversion.apache.org>
> *Subject:* Efficient revprop access in lib_repos
>
>
>
> [This is the rationale and additional documentation to an upcoming
>
> set of commits.]
>
>
> When changing a revprop, we provide the following visibility guarantees:
>
> 1. A request that ends before the set_revprop started, sees the old value
> (duh!).
> 2. A request that starts after the set_revprop completes, sees the new
> value.
> 3. A request that starts before the set_revprop completes, may see the old
> or
>    the new value and may not be consistent about it.
>

RE: Efficient revprop access in lib_repos

Posted by Bert Huijben <be...@qqmail.nl>.
                Hi,

 

Another reply:

 

How does this affect multiple processes accessing the same repository?

 

This caching looks like a nice speedup in some common cases, but I’m wondering if we are adding new race conditions on revision property updates. We had to add specific atomic revprop change apis to resolve/limit some older race conditions and this looks at the surface like it might introduce a few new cases.

 

                Bert

 

From: Stefan Fuhrmann [mailto:stefan.fuhrmann@wandisco.com] 
Sent: zondag 11 oktober 2015 14:26
To: Subversion Development <de...@subversion.apache.org>
Subject: Efficient revprop access in lib_repos

 

[This is the rationale and additional documentation to an upcoming

set of commits.]


When changing a revprop, we provide the following visibility guarantees:

1. A request that ends before the set_revprop started, sees the old value (duh!).
2. A request that starts after the set_revprop completes, sees the new value.
3. A request that starts before the set_revprop completes, may see the old or
   the new value and may not be consistent about it.