Posted to commits@subversion.apache.org by st...@apache.org on 2011/02/08 00:20:38 UTC

svn commit: r1068217 - /subversion/trunk/subversion/libsvn_subr/cache-membuffer.c

Author: stefan2
Date: Mon Feb  7 23:20:38 2011
New Revision: 1068217

URL: http://svn.apache.org/viewvc?rev=1068217&view=rev
Log:
Fix a number of typos etc. plus add a short paragraph on how svn_cache__t
instances relate to the big singleton membuffer cache instance.

* subversion/libsvn_subr/cache-membuffer.c: improve commentary

Modified:
    subversion/trunk/subversion/libsvn_subr/cache-membuffer.c

Modified: subversion/trunk/subversion/libsvn_subr/cache-membuffer.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/cache-membuffer.c?rev=1068217&r1=1068216&r2=1068217&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_subr/cache-membuffer.c (original)
+++ subversion/trunk/subversion/libsvn_subr/cache-membuffer.c Mon Feb  7 23:20:38 2011
@@ -32,57 +32,63 @@
 #include "private/svn_dep_compat.h"
 
 /*
+ * This svn_cache__t implementation actually consists of two parts:
+ * a shared (per-process) singleton membuffer cache instance and shallow
+ * svn_cache__t frontend instances that each use different key spaces.
+ * For data management, they all forward to the singleton membuffer cache.
+ *
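As a purely illustrative sketch of that frontend/backend split (the struct and
function names below are hypothetical, not the actual svn_cache__t vtable or
membuffer API), a frontend could simply combine its own key-space prefix with
the caller's key and forward the call to the shared singleton:

    /* Hypothetical frontend: a thin wrapper around the shared membuffer. */
    typedef struct frontend_t
    {
      membuffer_t *membuffer;   /* per-process singleton, shared by all */
      const char *prefix;       /* makes this cache's key space unique */
    } frontend_t;

    static svn_error_t *
    frontend_get(void **value, frontend_t *cache, const char *key,
                 apr_pool_t *pool)
    {
      /* Prefix the key and let the singleton do the actual data lookup. */
      const char *full_key = apr_pstrcat(pool, cache->prefix, ":", key,
                                         (char *)NULL);
      return membuffer_get(value, cache->membuffer, full_key, pool);
    }
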
  * A membuffer cache consists of two parts:
  *
- * 1. A linear data buffer containing a cached items in a serialized
+ * 1. A linear data buffer containing cached items in a serialized
  *    representation. There may be arbitrary gaps between entries.
  * 2. A directory of cache entries. This is organized similar to CPU
  *    data caches: for every possible key, there is exactly one group
  *    of entries that may contain the header info for an item with
  *    that given key. The result is a GROUP_SIZE-way associative cache.
  *
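To illustrate the directory layout (a sketch only; the entry and cache types,
field names and the concrete GROUP_SIZE value are assumptions, not copied from
the file), a lookup hashes the key, picks the single group that may hold it
and scans that group's GROUP_SIZE slots:

    #define GROUP_SIZE 8                /* example value */

    static entry_t *
    find_entry(cache_t *cache, apr_uint32_t key_hash)
    {
      /* Every key maps to exactly one group of GROUP_SIZE entries. */
      entry_t *group = &cache->directory[(key_hash % cache->group_count)
                                         * GROUP_SIZE];
      apr_uint32_t i;

      for (i = 0; i < GROUP_SIZE; ++i)
        if (group[i].used && group[i].key_hash == key_hash)
          return &group[i];

      return NULL;                      /* not in the cache */
    }
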
- * Only the begnnings of these two data parts are addressed through a
- * native pointer. All other references are expressed as offsets to
- * these array pointers. With that design, it is relatively easy to
- * share the same data structure between different processes and / or
- * to persist them on disk.
+ * Only the start addresses of these two data parts are given as native
+ * pointers. All other references are expressed as offsets to these pointers.
+ * With that design, it is relatively easy to share the same data structure
+ * between different processes and / or to persist them on disk. These
+ * out-of-process features have not been implemented yet.
  *
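A minimal sketch of that offset-based addressing (hypothetical field names):
an entry never stores a pointer into the data buffer, only an offset that is
resolved against the buffer's base address on demand:

    typedef struct entry_t
    {
      apr_uint64_t offset;  /* start of the serialized item in the buffer */
      apr_uint32_t size;    /* length of the serialized item */
      /* ... */
    } entry_t;

    static char *
    get_entry_data(cache_t *cache, const entry_t *entry)
    {
      /* Only the buffer's base pointer is process-specific; the offsets
         would remain valid in another process or on disk. */
      return cache->data + entry->offset;
    }
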
  * The data buffer usage information is implicitly given by the directory
- * entries. Every *used* entry has a reference to the previous and the
- * next used dictionary entry - in the order their item data is stored
- * in the data buffer. So, removing data, for instance, is done simply
- * by unlinking it from the chain, marking it as unused and possibly
- * adjusting global list pointers.
+ * entries. Every USED entry has a reference to the previous and the next
+ * used dictionary entry, and this doubly-linked list is ordered by the
+ * offsets of their item data within the data buffer. So removing data,
+ * for instance, is done simply by unlinking it from the chain, implicitly
+ * marking the entry as well as the data buffer section previously
+ * associated with it as unused.
  *
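A sketch of that removal path (hypothetical names; entries reference each
other by directory index rather than by pointer, in line with the offset-only
design above):

    static void
    drop_entry(cache_t *cache, entry_t *entry)
    {
      /* Unlink the entry from the offset-ordered chain of used entries. */
      if (entry->previous == NO_INDEX)
        cache->first = entry->next;
      else
        cache->directory[entry->previous].next = entry->next;

      if (entry->next == NO_INDEX)
        cache->last = entry->previous;
      else
        cache->directory[entry->next].previous = entry->previous;

      /* No explicit free: the buffer range it occupied is now just a gap
         between its neighbours and will eventually be overwritten. */
      entry->used = FALSE;
    }
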
- * Insertion can occur at one position. It is marked by its offset in
- * the data buffer plus the index of the first used entry equal or
- * larger that position. If this gap is too small to accomodate the
- * new item, the insertion window is extended as described below. The
- * new entry will always be inserted at the bottom end of the window
- * and since the next used entry is known, properly sorted insertion
- * is possible.
+ * Insertion can occur at only one, sliding position. It is marked by its
+ * offset in the data buffer plus the index of the first used entry at or
+ * behind that position. If this gap is too small to accommodate the new
+ * item, the insertion window is extended as described below. The new entry
+ * will always be inserted at the bottom end of the window and since the
+ * next used entry is known, properly sorted insertion is possible.
  *
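A sketch of the gap test behind that (hypothetical names): the window is the
buffer range between the current insertion offset and the data of the next
used entry, and a new item only fits if that range is large enough:

    static svn_boolean_t
    gap_is_large_enough(cache_t *cache, apr_uint64_t item_size)
    {
      /* The window ends where the next used entry's data begins, or at
         the end of the data buffer if there is no such entry. */
      apr_uint64_t gap_end = cache->next_entry == NO_INDEX
                           ? cache->data_size
                           : cache->directory[cache->next_entry].offset;

      return gap_end - cache->insert_offset >= item_size;
    }

If this test fails, the window gets extended by evicting or moving the
entries that follow it, which is what the next paragraphs and
ensure_data_insertable() describe.
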
  * To make the cache perform robustly in a wide range of usage scenarios,
- * a randomized variant of LFU is used. Every item holds a (read)
- * hit counter and there is a global (read) hit counter. The more hits
- * an entry has in relation to the average, the more it is likely to be
- * kept using a rand()-based condition. The test is applied only to the
- * entry at the end of the insertion window. If it doesn't get evicted,
- * its being moved to the begin of that window and this window is moved.
+ * a randomized variant of LFU is used. Every item holds a read hit counter
+ * and there is a global read hit counter. The more hits an entry has in
+ * relation to the average, the more likely it is to be kept, using a rand()-
+ * based condition. The test is applied only to the entry following the
+ * insertion window. If it doesn't get evicted, it is moved to the beginning
+ * of that window and the window is moved.
  *
- * Moreover, the entry's hits get halfed to make that entry more likely
- * to be removed the next time, the sliding insertion / auto-removal
- * window comes by. As a result, frequently used entries are likely not
- * to be dropped until they get not used for a while. Also, even a cache
- * thrashing situation about 50% of the content survives every 50% of the
- * cache being re-written with new entries. For details on the fine-
- * tuning involved, see the comments in ensure_data_insertable().
+ * Moreover, the entry's hits get halved to make that entry more likely to
+ * be removed the next time the sliding insertion / removal window comes by.
+ * As a result, frequently used entries are likely not to be dropped until
+ * they have not been used for a while. Also, even in a cache thrashing
+ * situation, about 50% of the content survives every 50% of the cache
+ * being re-written with new entries. For details on the fine-tuning
+ * involved, see the comments in ensure_data_insertable().
  *
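The eviction decision and the hit-halving could look roughly like this (a
sketch under assumed field names, with <stdlib.h> providing rand(); the real
fine-tuning lives in ensure_data_insertable()):

    static svn_boolean_t
    keep_entry(cache_t *cache, entry_t *entry)
    {
      /* Entries with more hits than the current average get a
         proportionally better chance to survive. */
      apr_uint64_t average_hits = cache->total_hits / cache->used_entries;
      svn_boolean_t keep
        = (apr_uint64_t)rand() % (average_hits + 1) < entry->hit_count;

      if (keep)
        /* Halve the count so this entry is easier to evict the next
           time the insertion window passes it. */
        entry->hit_count /= 2;

      return keep;
    }
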
- * To limit the entry size and management overhead, the actual item keys
- * will not be stored but only their MD5 checksums, instead. This is
- * reasonably safe to do since users have only limited control over the
- * full keys, even if these are folder paths. So, it is very hard to
- * construct colliding keys.
+ * To limit the entry size and management overhead, not the actual item keys
+ * but only their MD5 checksums will be stored. This is reasonably safe to
+ * do since users have only limited control over the full keys, even if
+ * these contain folder paths. So, it is very hard to deliberately construct
+ * colliding keys. Random checksum collisions can be shown to be extremely
+ * unlikely.
  *
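A sketch of that key handling (the entry layout is hypothetical; svn_checksum()
and svn_checksum_md5 are Subversion's public checksum API): instead of the
variable-length key, only its fixed-size MD5 digest is kept in the directory
entry and compared on lookup:

    #include <string.h>
    #include "svn_error.h"
    #include "svn_checksum.h"

    static svn_error_t *
    set_entry_key(entry_t *entry, const void *key, apr_size_t key_len,
                  apr_pool_t *pool)
    {
      svn_checksum_t *checksum;

      SVN_ERR(svn_checksum(&checksum, svn_checksum_md5, key, key_len, pool));

      /* 16 bytes per entry, no matter how long the full key was. */
      memcpy(entry->key_digest, checksum->digest, 16);

      return SVN_NO_ERROR;
    }
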
  * All access to the cached data needs to be serialized. Because we want
  * to scale well despite that bottleneck, we simply segment the cache into