You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by st...@apache.org on 2013/05/16 19:51:30 UTC
svn commit: r1483471 - /subversion/trunk/subversion/libsvn_subr/hash.c
Author: stefan2
Date: Thu May 16 17:51:30 2013
New Revision: 1483471
URL: http://svn.apache.org/r1483471
Log:
Follow-up to r1483434: Tweak our hash function to increase the
likelihood of higher bits in a 32 bit key chunk to have any
impact on the result.
Found by: rhuijben
* subversion/libsvn_subr/hash.c
(hashfunc_compatible): cheaply fold the current chunk before
feeding it into the; update comment
Modified:
subversion/trunk/subversion/libsvn_subr/hash.c
Modified: subversion/trunk/subversion/libsvn_subr/hash.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/hash.c?rev=1483471&r1=1483470&r2=1483471&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_subr/hash.c (original)
+++ subversion/trunk/subversion/libsvn_subr/hash.c Thu May 16 17:51:30 2013
@@ -576,20 +576,16 @@ svn_hash__get_bool(apr_hash_t *hash, con
/*** Optimized hash function ***/
-/* Optimized version of apr_hashfunc_default in APR 1.4.5 and earlier.
- * It assumes that the CPU has 32-bit multiplications with high throughput
- * of at least 1 operation every 3 cycles. Latency is not an issue. Another
- * optimization is a mildly unrolled main loop and breaking the dependency
- * chain within the loop.
- *
- * Note that most CPUs including Intel Atom, VIA Nano, ARM feature the
- * assumed pipelined multiplication circuitry. They can do one MUL every
- * or every other cycle.
- *
- * The performance is ultimately limited by the fact that most CPUs can
- * do only one LOAD and only one BRANCH operation per cycle. The best we
- * can do is to process one character per cycle - provided the processor
- * is wide enough to do 1 LOAD, COMPARE, BRANCH, MUL and ADD per cycle.
+/* apr_hashfunc_t optimized for the key that we use in SVN: paths and
+ * property names. Its primary goal is speed for keys of known length.
+ *
+ * Since strings tend to spawn large value spaces (usually differ in many
+ * bits with differences spanning a larger section of the key), we can be
+ * quite sloppy extracting a hash value. The more keys there are in a
+ * hash container, the more bits of the value returned by this function
+ * will be used. For a small number of string keys, choosing bits from any
+ * any fix location close to the tail of those keys would usually be good
+ * enough to prevent high collision rates.
*/
static unsigned int
hashfunc_compatible(const char *char_key, apr_ssize_t *klen)
@@ -605,8 +601,11 @@ hashfunc_compatible(const char *char_key
#if SVN_UNALIGNED_ACCESS_IS_OK
for (p = key, i = *klen; i >= 4; i-=4, p+=4)
{
- hash = hash * 33 * 33 * 33 * 33
- + *(apr_uint32_t *)p;
+ apr_uint32_t chunk = *(apr_uint32_t *)p;
+
+ /* the ">> 17" part gives upper bits in the chunk a chance to make
+ some impact as well */
+ hash = hash * 33 * 33 * 33 * 33 + chunk + (chunk >> 17);
}
#else
for (p = key, i = *klen; i >= 4; i-=4, p+=4)