Posted to commits@subversion.apache.org by st...@apache.org on 2013/05/16 19:51:30 UTC

svn commit: r1483471 - /subversion/trunk/subversion/libsvn_subr/hash.c

Author: stefan2
Date: Thu May 16 17:51:30 2013
New Revision: 1483471

URL: http://svn.apache.org/r1483471
Log:
Follow-up to r1483434:  Tweak our hash function to increase the
likelihood that the higher bits in a 32-bit key chunk have an
impact on the result.

Found by: rhuijben

* subversion/libsvn_subr/hash.c
  (hashfunc_compatible): cheaply fold the current chunk before
                         feeding it into the hash; update comment
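
For readers skimming the log without the diff, below is a small,
self-contained sketch of the idea.  It is an illustration only, not the
code committed here: the function name demo_hash, the memcpy-based chunk
read, and the sample key are assumptions made for this example.

    /* Illustrative sketch only -- not the committed hash.c code.  It shows
     * how a "chunk + (chunk >> 17)" fold lets the upper bits of each
     * 4-byte chunk influence the hash value. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static unsigned int
    demo_hash(const char *key, size_t len)
    {
      unsigned int hash = 0;
      size_t i;

      /* Process the key in 4-byte chunks, mirroring the loop in the diff. */
      for (i = 0; i + 4 <= len; i += 4)
        {
          uint32_t chunk;
          memcpy(&chunk, key + i, sizeof(chunk));  /* portable chunk read */

          /* Folding "chunk >> 17" into the sum lets bits 17..31 of the
             chunk affect the low-order bits of the hash, which are the
             bits a small hash table actually uses. */
          hash = hash * 33 * 33 * 33 * 33 + chunk + (chunk >> 17);
        }

      /* Classic "times 33" handling for the remaining 0..3 bytes. */
      for (; i < len; ++i)
        hash = hash * 33 + (unsigned char)key[i];

      return hash;
    }

    int main(void)
    {
      const char *key = "subversion/libsvn_subr/hash.c";
      printf("demo_hash(\"%s\") = %u\n", key, demo_hash(key, strlen(key)));
      return 0;
    }

Since the multiplications by 33 only move bits upward, bits 17..31 of a
chunk would otherwise never reach the low-order hash bits that a small
table mask selects; the ">> 17" fold gives them that chance cheaply.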

Modified:
    subversion/trunk/subversion/libsvn_subr/hash.c

Modified: subversion/trunk/subversion/libsvn_subr/hash.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/hash.c?rev=1483471&r1=1483470&r2=1483471&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_subr/hash.c (original)
+++ subversion/trunk/subversion/libsvn_subr/hash.c Thu May 16 17:51:30 2013
@@ -576,20 +576,16 @@ svn_hash__get_bool(apr_hash_t *hash, con
 
 /*** Optimized hash function ***/
 
-/* Optimized version of apr_hashfunc_default in APR 1.4.5 and earlier.
- * It assumes that the CPU has 32-bit multiplications with high throughput
- * of at least 1 operation every 3 cycles. Latency is not an issue. Another
- * optimization is a mildly unrolled main loop and breaking the dependency
- * chain within the loop.
- *
- * Note that most CPUs including Intel Atom, VIA Nano, ARM feature the
- * assumed pipelined multiplication circuitry. They can do one MUL every
- * or every other cycle.
- *
- * The performance is ultimately limited by the fact that most CPUs can
- * do only one LOAD and only one BRANCH operation per cycle. The best we
- * can do is to process one character per cycle - provided the processor
- * is wide enough to do 1 LOAD, COMPARE, BRANCH, MUL and ADD per cycle.
+/* apr_hashfunc_t optimized for the keys that we use in SVN: paths and
+ * property names.  Its primary goal is speed for keys of known length.
+ * 
+ * Since strings tend to spawn large value spaces (usually differ in many
+ * bits with differences spanning a larger section of the key), we can be
+ * quite sloppy extracting a hash value.  The more keys there are in a
+ * hash container, the more bits of the value returned by this function
+ * will be used.  For a small number of string keys, choosing bits from
+ * any fixed location close to the tail of those keys would usually be good
+ * enough to prevent high collision rates.
  */
 static unsigned int
 hashfunc_compatible(const char *char_key, apr_ssize_t *klen)
@@ -605,8 +601,11 @@ hashfunc_compatible(const char *char_key
 #if SVN_UNALIGNED_ACCESS_IS_OK
     for (p = key, i = *klen; i >= 4; i-=4, p+=4)
       {
-        hash = hash * 33 * 33 * 33 * 33
-             + *(apr_uint32_t *)p;
+        apr_uint32_t chunk = *(apr_uint32_t *)p;
+
+        /* the ">> 17" part gives upper bits in the chunk a chance to make
+           some impact as well */
+        hash = hash * 33 * 33 * 33 * 33 + chunk + (chunk >> 17);
       }
 #else
     for (p = key, i = *klen; i >= 4; i-=4, p+=4)