Posted to commits@subversion.apache.org by st...@apache.org on 2012/09/22 17:14:25 UTC

svn commit: r1388816 - in /subversion/branches/10Gb/subversion: include/private/svn_pseudo_md5.h libsvn_subr/cache-membuffer.c libsvn_subr/pseudo_md5.c tests/libsvn_subr/checksum-test.c

Author: stefan2
Date: Sat Sep 22 15:14:25 2012
New Revision: 1388816

URL: http://svn.apache.org/viewvc?rev=1388816&view=rev
Log:
On the 10Gb branch:  Introduce MD5-based hash functions optimized
for short input lengths.  Use these to speed up membuffer access.

* subversion/include/private/svn_pseudo_md5.h: new header
  (svn__pseudo_md5_15,
   svn__pseudo_md5_31,
   svn__pseudo_md5_63): declare new private API

* subversion/libsvn_subr/pseudo_md5.c: new source
  (svn__pseudo_md5_15,
   svn__pseudo_md5_31,
   svn__pseudo_md5_63): implement new private API

* subversion/libsvn_subr/cache-membuffer.c
  (combine_key): use the new hash API

* subversion/tests/libsvn_subr/checksum-test.c
  (test_pseudo_md5): new test case
  (test_funcs): register it

Added:
    subversion/branches/10Gb/subversion/include/private/svn_pseudo_md5.h
    subversion/branches/10Gb/subversion/libsvn_subr/pseudo_md5.c
Modified:
    subversion/branches/10Gb/subversion/libsvn_subr/cache-membuffer.c
    subversion/branches/10Gb/subversion/tests/libsvn_subr/checksum-test.c

Added: subversion/branches/10Gb/subversion/include/private/svn_pseudo_md5.h
URL: http://svn.apache.org/viewvc/subversion/branches/10Gb/subversion/include/private/svn_pseudo_md5.h?rev=1388816&view=auto
==============================================================================
--- subversion/branches/10Gb/subversion/include/private/svn_pseudo_md5.h (added)
+++ subversion/branches/10Gb/subversion/include/private/svn_pseudo_md5.h Sat Sep 22 15:14:25 2012
@@ -0,0 +1,83 @@
+/**
+ * @copyright
+ * ====================================================================
+ *    Licensed to the Apache Software Foundation (ASF) under one
+ *    or more contributor license agreements.  See the NOTICE file
+ *    distributed with this work for additional information
+ *    regarding copyright ownership.  The ASF licenses this file
+ *    to you under the Apache License, Version 2.0 (the
+ *    "License"); you may not use this file except in compliance
+ *    with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *    Unless required by applicable law or agreed to in writing,
+ *    software distributed under the License is distributed on an
+ *    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *    KIND, either express or implied.  See the License for the
+ *    specific language governing permissions and limitations
+ *    under the License.
+ * ====================================================================
+ * @endcopyright
+ *
+ * @file svn_pseudo_md5.h
+ * @brief Subversion hash sum calculation for runtime data (only)
+ */
+
+#ifndef SVN_PSEUDO_MD5_H
+#define SVN_PSEUDO_MD5_H
+
+#include <apr.h>        /* for apr_uint32_t */
+
+#ifdef __cplusplus
+extern "C" {
+#endif /* __cplusplus */
+
+
+/**
+ * Calculates a hash sum for 15 bytes in @a x and returns it in @a digest.
+ * The most significant byte in @a x must be 0 (independent of being on a
+ * little or big endian machine).
+ *
+ * @note Use for runtime data hashing only.
+ *
+ * @note The output is NOT an MD5 digest but has the same basic
+ *       cryptographic properties.  A collision with proper MD5 on the same
+ *       or other input data is as unlikely as any MD5 collision.
+ */
+void svn__pseudo_md5_15(apr_uint32_t digest[4],
+                        const apr_uint32_t x[4]);
+
+/**
+ * Calculates a hash sum for 31 bytes in @a x and returns it in @a digest.
+ * The most significant byte in @a x must be 0 (independent of being on a
+ * little or big endian machine).
+ *
+ * @note Use for runtime data hashing only.
+ *
+ * @note The output is NOT an MD5 digest but has the same basic
+ *       cryptographic properties.  A collision with proper MD5 on the same
+ *       or other input data is as unlikely as any MD5 collision.
+ */
+void svn__pseudo_md5_31(apr_uint32_t digest[4],
+                        const apr_uint32_t x[8]);
+
+/**
+ * Calculates a hash sum for 63 bytes in @a x and returns it in @a digest.
+ * The most significant byte in @a x must be 0 (independent of being on a
+ * little or big endian machine).
+ *
+ * @note Use for runtime data hashing only.
+ *
+ * @note The output is NOT an MD5 digest but has the same basic
+ *       cryptographic properties.  A collision with proper MD5 on the same
+ *       or other input data is as unlikely as any MD5 collision.
+ */
+void svn__pseudo_md5_63(apr_uint32_t digest[4],
+                        const apr_uint32_t x[16]);
+
+#ifdef __cplusplus
+}
+#endif /* __cplusplus */
+
+#endif /* SVN_PSEUDO_MD5_H */

Modified: subversion/branches/10Gb/subversion/libsvn_subr/cache-membuffer.c
URL: http://svn.apache.org/viewvc/subversion/branches/10Gb/subversion/libsvn_subr/cache-membuffer.c?rev=1388816&r1=1388815&r2=1388816&view=diff
==============================================================================
--- subversion/branches/10Gb/subversion/libsvn_subr/cache-membuffer.c (original)
+++ subversion/branches/10Gb/subversion/libsvn_subr/cache-membuffer.c Sat Sep 22 15:14:25 2012
@@ -33,6 +33,7 @@
 #include "svn_string.h"
 #include "private/svn_dep_compat.h"
 #include "private/svn_mutex.h"
+#include "private/svn_pseudo_md5.h"
 
 /*
  * This svn_cache__t implementation actually consists of two parts:
@@ -1713,7 +1714,31 @@ combine_key(svn_membuffer_cache_t *cache
   if (key_len == APR_HASH_KEY_STRING)
     key_len = strlen((const char *) key);
 
-  apr_md5((unsigned char*)cache->combined_key, key, key_len);
+  if (key_len < 16)
+    {
+      apr_uint32_t data[4] = { 0 };
+      memcpy(data, key, key_len);
+
+      svn__pseudo_md5_15((apr_uint32_t *)cache->combined_key, data);
+    }
+  else if (key_len < 32)
+    {
+      apr_uint32_t data[8] = { 0 };
+      memcpy(data, key, key_len);
+
+      svn__pseudo_md5_31((apr_uint32_t *)cache->combined_key, data);
+    }
+  else if (key_len < 64)
+    {
+      apr_uint32_t data[16] = { 0 };
+      memcpy(data, key, key_len);
+
+      svn__pseudo_md5_63((apr_uint32_t *)cache->combined_key, data);
+    }
+  else
+    {
+      apr_md5((unsigned char*)cache->combined_key, key, key_len);
+    }
 
   cache->combined_key[0] ^= cache->prefix[0];
   cache->combined_key[1] ^= cache->prefix[1];

Added: subversion/branches/10Gb/subversion/libsvn_subr/pseudo_md5.c
URL: http://svn.apache.org/viewvc/subversion/branches/10Gb/subversion/libsvn_subr/pseudo_md5.c?rev=1388816&view=auto
==============================================================================
--- subversion/branches/10Gb/subversion/libsvn_subr/pseudo_md5.c (added)
+++ subversion/branches/10Gb/subversion/libsvn_subr/pseudo_md5.c Sat Sep 22 15:14:25 2012
@@ -0,0 +1,422 @@
+/*
+ * This work is derived from material Copyright RSA Data Security, Inc.
+ *
+ * The RSA copyright statement and Licence for that original material is
+ * included below. This is followed by the Apache copyright statement and
+ * licence for the modifications made to that material.
+ */
+
+/* MD5C.C - RSA Data Security, Inc., MD5 message-digest algorithm
+ */
+
+/* Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All
+   rights reserved.
+
+   License to copy and use this software is granted provided that it
+   is identified as the "RSA Data Security, Inc. MD5 Message-Digest
+   Algorithm" in all material mentioning or referencing this software
+   or this function.
+
+   License is also granted to make and use derivative works provided
+   that such works are identified as "derived from the RSA Data
+   Security, Inc. MD5 Message-Digest Algorithm" in all material
+   mentioning or referencing the derived work.
+
+   RSA Data Security, Inc. makes no representations concerning either
+   the merchantability of this software or the suitability of this
+   software for any particular purpose. It is provided "as is"
+   without express or implied warranty of any kind.
+
+   These notices must be retained in any copies of any part of this
+   documentation and/or software.
+ */
+
+/* Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * The apr_md5_encode() routine uses much code obtained from the FreeBSD 3.0
+ * MD5 crypt() function, which is licenced as follows:
+ * ----------------------------------------------------------------------------
+ * "THE BEER-WARE LICENSE" (Revision 42):
+ * <ph...@login.dknet.dk> wrote this file.  As long as you retain this notice you
+ * can do whatever you want with this stuff. If we meet some day, and you think
+ * this stuff is worth it, you can buy me a beer in return.   Poul-Henning Kamp
+ * ----------------------------------------------------------------------------
+ */
+
+/*
+ * pseudo_md5.c:  md5-esque hash sum calculation for short data blocks.
+ *                Code taken and adapted from the APR (see licenses above).
+ */
+#include "svn_checksum.h"
+
+/* Constants for MD5 calculation.
+ */
+
+#define S11 7
+#define S12 12
+#define S13 17
+#define S14 22
+#define S21 5
+#define S22 9
+#define S23 14
+#define S24 20
+#define S31 4
+#define S32 11
+#define S33 16
+#define S34 23
+#define S41 6
+#define S42 10
+#define S43 15
+#define S44 21
+
+/* F, G, H and I are basic MD5 functions.
+ */
+#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
+#define G(x, y, z) (((x) & (z)) | ((y) & (~z)))
+#define H(x, y, z) ((x) ^ (y) ^ (z))
+#define I(x, y, z) ((y) ^ ((x) | (~z)))
+
+/* ROTATE_LEFT rotates x left n bits.
+ */
+#if defined(_MSC_VER) && _MSC_VER >= 1310
+#pragma intrinsic(_rotl)
+#define ROTATE_LEFT(x, n) (_rotl(x,n))
+#else
+#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))
+#endif
+
+/* FF, GG, HH, and II transformations for rounds 1, 2, 3, and 4.
+ * Rotation is separate from addition to prevent recomputation.
+ */
+#define FF(a, b, c, d, x, s, ac) { \
+ (a) += F ((b), (c), (d)) + (x) + (apr_uint32_t)(ac); \
+ (a) = ROTATE_LEFT ((a), (s)); \
+ (a) += (b); \
+  }
+#define GG(a, b, c, d, x, s, ac) { \
+ (a) += G ((b), (c), (d)) + (x) + (apr_uint32_t)(ac); \
+ (a) = ROTATE_LEFT ((a), (s)); \
+ (a) += (b); \
+  }
+#define HH(a, b, c, d, x, s, ac) { \
+ (a) += H ((b), (c), (d)) + (x) + (apr_uint32_t)(ac); \
+ (a) = ROTATE_LEFT ((a), (s)); \
+ (a) += (b); \
+  }
+#define II(a, b, c, d, x, s, ac) { \
+ (a) += I ((b), (c), (d)) + (x) + (apr_uint32_t)(ac); \
+ (a) = ROTATE_LEFT ((a), (s)); \
+ (a) += (b); \
+  }
+
+/* The idea of the functions below is as follows:
+ * 
+ * - The core MD5 algorithm does not assume that the "important" data
+ *   is at the beginning of the encryption block, followed by e.g. 0.
+ *   Instead, all bits are equally relevant.
+ * 
+ * - If some bytes in the input are known to be 0, we may hard-code them.
+ *   With the previous property, it is safe to move them to the upper end
+ *   of the encryption block to maximize the number of steps that can be
+ *   pre-calculated.
+ *
+ * - Variable-length streams will use the upper 8 bytes of the last
+ *   encryption block to store the stream length in bits (to make 0, 00,
+ *   000, ... etc. produce different hash sums).
+ *
+ * - We will hash at most 63 bytes, i.e. 504 bits.  In the standard stream
+ *   implementation, the upper 6 bytes of the last encryption block would
+ *   be 0.  We will put at least one non-zero value in the last 4 bytes.
+ *   Therefore, our input will always be different to a standard MD5 stream
+ *   implementation in either block count, content or both.
+ *
+ * - Our length indicator also varies with the number of bytes in the input.
+ *   Hence, different pseudo-MD5 input length produces different output
+ *   (with "cryptographic probability") even if the content is all 0 or
+ *   otherwise identical.
+ *
+ * - Collisions between pseudo-MD5 and pseudo-MD5 as well as pseudo-MD5
+ *   and standard MD5 are as likely as any other MD5 collision.
+ */
+  
+void svn__pseudo_md5_15(apr_uint32_t digest[4],
+                        const apr_uint32_t x[4])
+{
+    apr_uint32_t a = 0x67452301;
+    apr_uint32_t b = 0xefcdab89;
+    apr_uint32_t c = 0x98badcfe;
+    apr_uint32_t d = 0x10325476;
+
+    /* make sure byte 63 gets the marker independently of BE / LE */
+    apr_uint32_t x3n = x[3] ^ 0xffffffff;
+    
+    /* Round 1 */
+    FF(a, b, c, d, 0,    S11, 0xd76aa478); /* 1 */
+    FF(d, a, b, c, 0,    S12, 0xe8c7b756); /* 2 */
+    FF(c, d, a, b, 0,    S13, 0x242070db); /* 3 */
+    FF(b, c, d, a, 0,    S14, 0xc1bdceee); /* 4 */
+    FF(a, b, c, d, 0,    S11, 0xf57c0faf); /* 5 */
+    FF(d, a, b, c, 0,    S12, 0x4787c62a); /* 6 */
+    FF(c, d, a, b, 0,    S13, 0xa8304613); /* 7 */
+    FF(b, c, d, a, 0,    S14, 0xfd469501); /* 8 */
+    FF(a, b, c, d, 0,    S11, 0x698098d8); /* 9 */
+    FF(d, a, b, c, 0,    S12, 0x8b44f7af); /* 10 */
+    FF(c, d, a, b, 0,    S13, 0xffff5bb1); /* 11 */
+    FF(b, c, d, a, 0,    S14, 0x895cd7be); /* 12 */
+    FF(a, b, c, d, x[0], S11, 0x6b901122); /* 13 */
+    FF(d, a, b, c, x[1], S12, 0xfd987193); /* 14 */
+    FF(c, d, a, b, x[2], S13, 0xa679438e); /* 15 */
+    FF(b, c, d, a, x3n,  S14, 0x49b40821); /* 16 */
+
+    /* Round 2 */
+    GG(a, b, c, d, 0,    S21, 0xf61e2562); /* 17 */
+    GG(d, a, b, c, 0,    S22, 0xc040b340); /* 18 */
+    GG(c, d, a, b, 0,    S23, 0x265e5a51); /* 19 */
+    GG(b, c, d, a, 0,    S24, 0xe9b6c7aa); /* 20 */
+    GG(a, b, c, d, 0,    S21, 0xd62f105d); /* 21 */
+    GG(d, a, b, c, 0,    S22, 0x2441453);  /* 22 */
+    GG(c, d, a, b, x3n,  S23, 0xd8a1e681); /* 23 */
+    GG(b, c, d, a, 0,    S24, 0xe7d3fbc8); /* 24 */
+    GG(a, b, c, d, 0,    S21, 0x21e1cde6); /* 25 */
+    GG(d, a, b, c, x[2], S22, 0xc33707d6); /* 26 */
+    GG(c, d, a, b, 0,    S23, 0xf4d50d87); /* 27 */
+    GG(b, c, d, a, 0,    S24, 0x455a14ed); /* 28 */
+    GG(a, b, c, d, x[1], S21, 0xa9e3e905); /* 29 */
+    GG(d, a, b, c, 0,    S22, 0xfcefa3f8); /* 30 */
+    GG(c, d, a, b, 0,    S23, 0x676f02d9); /* 31 */
+    GG(b, c, d, a, x[0], S24, 0x8d2a4c8a); /* 32 */
+
+    /* Round 3 */
+    HH(a, b, c, d, 0,    S31, 0xfffa3942); /* 33 */
+    HH(d, a, b, c, 0,    S32, 0x8771f681); /* 34 */
+    HH(c, d, a, b, 0,    S33, 0x6d9d6122); /* 35 */
+    HH(b, c, d, a, x[2], S34, 0xfde5380c); /* 36 */
+    HH(a, b, c, d, 0,    S31, 0xa4beea44); /* 37 */
+    HH(d, a, b, c, 0,    S32, 0x4bdecfa9); /* 38 */
+    HH(c, d, a, b, 0,    S33, 0xf6bb4b60); /* 39 */
+    HH(b, c, d, a, 0,    S34, 0xbebfbc70); /* 40 */
+    HH(a, b, c, d, x[1], S31, 0x289b7ec6); /* 41 */
+    HH(d, a, b, c, 0,    S32, 0xeaa127fa); /* 42 */
+    HH(c, d, a, b, 0,    S33, 0xd4ef3085); /* 43 */
+    HH(b, c, d, a, 0,    S34, 0x4881d05);  /* 44 */
+    HH(a, b, c, d, 0,    S31, 0xd9d4d039); /* 45 */
+    HH(d, a, b, c, x[0], S32, 0xe6db99e5); /* 46 */
+    HH(c, d, a, b, x3n,  S33, 0x1fa27cf8); /* 47 */
+    HH(b, c, d, a, 0,    S34, 0xc4ac5665); /* 48 */
+
+    /* Round 4 */
+    II(a, b, c, d, 0,    S41, 0xf4292244); /* 49 */
+    II(d, a, b, c, 0,    S42, 0x432aff97); /* 50 */
+    II(c, d, a, b, x[2], S43, 0xab9423a7); /* 51 */
+    II(b, c, d, a, 0,    S44, 0xfc93a039); /* 52 */
+    II(a, b, c, d, x[0], S41, 0x655b59c3); /* 53 */
+    II(d, a, b, c, 0,    S42, 0x8f0ccc92); /* 54 */
+    II(c, d, a, b, 0,    S43, 0xffeff47d); /* 55 */
+    II(b, c, d, a, 0,    S44, 0x85845dd1); /* 56 */
+    II(a, b, c, d, 0,    S41, 0x6fa87e4f); /* 57 */
+    II(d, a, b, c, x3n,  S42, 0xfe2ce6e0); /* 58 */
+    II(c, d, a, b, 0,    S43, 0xa3014314); /* 59 */
+    II(b, c, d, a, x[1], S44, 0x4e0811a1); /* 60 */
+    II(a, b, c, d, 0,    S41, 0xf7537e82); /* 61 */
+    II(d, a, b, c, 0,    S42, 0xbd3af235); /* 62 */
+    II(c, d, a, b, 0,    S43, 0x2ad7d2bb); /* 63 */
+    II(b, c, d, a, 0,    S44, 0xeb86d391); /* 64 */
+
+    digest[0] = a;
+    digest[1] = b;
+    digest[2] = c;
+    digest[3] = d;
+}
+
+void svn__pseudo_md5_31(apr_uint32_t digest[4],
+                        const apr_uint32_t x[8])
+{
+    apr_uint32_t a = 0x67452301;
+    apr_uint32_t b = 0xefcdab89;
+    apr_uint32_t c = 0x98badcfe;
+    apr_uint32_t d = 0x10325476;
+
+    /* make sure byte 63 gets the marker independently of BE / LE */
+    apr_uint32_t x7n = x[7] ^ 0xfefefefe;
+    
+    /* Round 1 */
+    FF(a, b, c, d, 0,    S11, 0xd76aa478); /* 1 */
+    FF(d, a, b, c, 0,    S12, 0xe8c7b756); /* 2 */
+    FF(c, d, a, b, 0,    S13, 0x242070db); /* 3 */
+    FF(b, c, d, a, 0,    S14, 0xc1bdceee); /* 4 */
+    FF(a, b, c, d, 0,    S11, 0xf57c0faf); /* 5 */
+    FF(d, a, b, c, 0,    S12, 0x4787c62a); /* 6 */
+    FF(c, d, a, b, 0,    S13, 0xa8304613); /* 7 */
+    FF(b, c, d, a, 0,    S14, 0xfd469501); /* 8 */
+    FF(a, b, c, d, x[0], S11, 0x698098d8); /* 9 */
+    FF(d, a, b, c, x[1], S12, 0x8b44f7af); /* 10 */
+    FF(c, d, a, b, x[2], S13, 0xffff5bb1); /* 11 */
+    FF(b, c, d, a, x[3], S14, 0x895cd7be); /* 12 */
+    FF(a, b, c, d, x[4], S11, 0x6b901122); /* 13 */
+    FF(d, a, b, c, x[5], S12, 0xfd987193); /* 14 */
+    FF(c, d, a, b, x[6], S13, 0xa679438e); /* 15 */
+    FF(b, c, d, a, x7n,  S14, 0x49b40821); /* 16 */
+
+    /* Round 2 */
+    GG(a, b, c, d, 0,    S21, 0xf61e2562); /* 17 */
+    GG(d, a, b, c, 0,    S22, 0xc040b340); /* 18 */
+    GG(c, d, a, b, x[3], S23, 0x265e5a51); /* 19 */
+    GG(b, c, d, a, 0,    S24, 0xe9b6c7aa); /* 20 */
+    GG(a, b, c, d, 0,    S21, 0xd62f105d); /* 21 */
+    GG(d, a, b, c, x[2], S22, 0x2441453);  /* 22 */
+    GG(c, d, a, b, x7n,  S23, 0xd8a1e681); /* 23 */
+    GG(b, c, d, a, 0,    S24, 0xe7d3fbc8); /* 24 */
+    GG(a, b, c, d, x[1], S21, 0x21e1cde6); /* 25 */
+    GG(d, a, b, c, x[6], S22, 0xc33707d6); /* 26 */
+    GG(c, d, a, b, 0,    S23, 0xf4d50d87); /* 27 */
+    GG(b, c, d, a, x[0], S24, 0x455a14ed); /* 28 */
+    GG(a, b, c, d, x[5], S21, 0xa9e3e905); /* 29 */
+    GG(d, a, b, c, 0,    S22, 0xfcefa3f8); /* 30 */
+    GG(c, d, a, b, 0,    S23, 0x676f02d9); /* 31 */
+    GG(b, c, d, a, x[4], S24, 0x8d2a4c8a); /* 32 */
+
+    /* Round 3 */
+    HH(a, b, c, d, 0,    S31, 0xfffa3942); /* 33 */
+    HH(d, a, b, c, x[0], S32, 0x8771f681); /* 34 */
+    HH(c, d, a, b, x[3], S33, 0x6d9d6122); /* 35 */
+    HH(b, c, d, a, x[6], S34, 0xfde5380c); /* 36 */
+    HH(a, b, c, d, 0,    S31, 0xa4beea44); /* 37 */
+    HH(d, a, b, c, 0,    S32, 0x4bdecfa9); /* 38 */
+    HH(c, d, a, b, 0,    S33, 0xf6bb4b60); /* 39 */
+    HH(b, c, d, a, x[2], S34, 0xbebfbc70); /* 40 */
+    HH(a, b, c, d, x[5], S31, 0x289b7ec6); /* 41 */
+    HH(d, a, b, c, 0,    S32, 0xeaa127fa); /* 42 */
+    HH(c, d, a, b, 0,    S33, 0xd4ef3085); /* 43 */
+    HH(b, c, d, a, 0,    S34, 0x4881d05);  /* 44 */
+    HH(a, b, c, d, x[1], S31, 0xd9d4d039); /* 45 */
+    HH(d, a, b, c, x[4], S32, 0xe6db99e5); /* 46 */
+    HH(c, d, a, b, x7n,  S33, 0x1fa27cf8); /* 47 */
+    HH(b, c, d, a, 0,    S34, 0xc4ac5665); /* 48 */
+
+    /* Round 4 */
+    II(a, b, c, d, 0,    S41, 0xf4292244); /* 49 */
+    II(d, a, b, c, 0,    S42, 0x432aff97); /* 50 */
+    II(c, d, a, b, x[6], S43, 0xab9423a7); /* 51 */
+    II(b, c, d, a, 0,    S44, 0xfc93a039); /* 52 */
+    II(a, b, c, d, x[4], S41, 0x655b59c3); /* 53 */
+    II(d, a, b, c, 0,    S42, 0x8f0ccc92); /* 54 */
+    II(c, d, a, b, x[2], S43, 0xffeff47d); /* 55 */
+    II(b, c, d, a, 0,    S44, 0x85845dd1); /* 56 */
+    II(a, b, c, d, x[0], S41, 0x6fa87e4f); /* 57 */
+    II(d, a, b, c, x7n,  S42, 0xfe2ce6e0); /* 58 */
+    II(c, d, a, b, 0,    S43, 0xa3014314); /* 59 */
+    II(b, c, d, a, x[5], S44, 0x4e0811a1); /* 60 */
+    II(a, b, c, d, 0,    S41, 0xf7537e82); /* 61 */
+    II(d, a, b, c, x[3], S42, 0xbd3af235); /* 62 */
+    II(c, d, a, b, 0,    S43, 0x2ad7d2bb); /* 63 */
+    II(b, c, d, a, x[1], S44, 0xeb86d391); /* 64 */
+
+    digest[0] = a;
+    digest[1] = b;
+    digest[2] = c;
+    digest[3] = d;
+}
+
+void svn__pseudo_md5_63(apr_uint32_t digest[4],
+                        const apr_uint32_t x[16])
+{
+    apr_uint32_t a = 0x67452301;
+    apr_uint32_t b = 0xefcdab89;
+    apr_uint32_t c = 0x98badcfe;
+    apr_uint32_t d = 0x10325476;
+
+    /* make sure byte 63 gets the marker independently of BE / LE */
+    apr_uint32_t x15n = x[15] ^ 0xfcfcfcfc;
+    
+    /* Round 1 */
+    FF(a, b, c, d, x[0],  S11, 0xd76aa478); /* 1 */
+    FF(d, a, b, c, x[1],  S12, 0xe8c7b756); /* 2 */
+    FF(c, d, a, b, x[2],  S13, 0x242070db); /* 3 */
+    FF(b, c, d, a, x[3],  S14, 0xc1bdceee); /* 4 */
+    FF(a, b, c, d, x[4],  S11, 0xf57c0faf); /* 5 */
+    FF(d, a, b, c, x[5],  S12, 0x4787c62a); /* 6 */
+    FF(c, d, a, b, x[6],  S13, 0xa8304613); /* 7 */
+    FF(b, c, d, a, x[7],  S14, 0xfd469501); /* 8 */
+    FF(a, b, c, d, x[8],  S11, 0x698098d8); /* 9 */
+    FF(d, a, b, c, x[9],  S12, 0x8b44f7af); /* 10 */
+    FF(c, d, a, b, x[10], S13, 0xffff5bb1); /* 11 */
+    FF(b, c, d, a, x[11], S14, 0x895cd7be); /* 12 */
+    FF(a, b, c, d, x[12], S11, 0x6b901122); /* 13 */
+    FF(d, a, b, c, x[13], S12, 0xfd987193); /* 14 */
+    FF(c, d, a, b, x[14], S13, 0xa679438e); /* 15 */
+    FF(b, c, d, a, x15n,  S14, 0x49b40821); /* 16 */
+
+    /* Round 2 */
+    GG(a, b, c, d, x[1],  S21, 0xf61e2562); /* 17 */
+    GG(d, a, b, c, x[6],  S22, 0xc040b340); /* 18 */
+    GG(c, d, a, b, x[11], S23, 0x265e5a51); /* 19 */
+    GG(b, c, d, a, x[0],  S24, 0xe9b6c7aa); /* 20 */
+    GG(a, b, c, d, x[5],  S21, 0xd62f105d); /* 21 */
+    GG(d, a, b, c, x[10], S22, 0x2441453);  /* 22 */
+    GG(c, d, a, b, x15n,  S23, 0xd8a1e681); /* 23 */
+    GG(b, c, d, a, x[4],  S24, 0xe7d3fbc8); /* 24 */
+    GG(a, b, c, d, x[9],  S21, 0x21e1cde6); /* 25 */
+    GG(d, a, b, c, x[14], S22, 0xc33707d6); /* 26 */
+    GG(c, d, a, b, x[3],  S23, 0xf4d50d87); /* 27 */
+    GG(b, c, d, a, x[8],  S24, 0x455a14ed); /* 28 */
+    GG(a, b, c, d, x[13], S21, 0xa9e3e905); /* 29 */
+    GG(d, a, b, c, x[2],  S22, 0xfcefa3f8); /* 30 */
+    GG(c, d, a, b, x[7],  S23, 0x676f02d9); /* 31 */
+    GG(b, c, d, a, x[12], S24, 0x8d2a4c8a); /* 32 */
+
+    /* Round 3 */
+    HH(a, b, c, d, x[5],  S31, 0xfffa3942); /* 33 */
+    HH(d, a, b, c, x[8],  S32, 0x8771f681); /* 34 */
+    HH(c, d, a, b, x[11], S33, 0x6d9d6122); /* 35 */
+    HH(b, c, d, a, x[14], S34, 0xfde5380c); /* 36 */
+    HH(a, b, c, d, x[1],  S31, 0xa4beea44); /* 37 */
+    HH(d, a, b, c, x[4],  S32, 0x4bdecfa9); /* 38 */
+    HH(c, d, a, b, x[7],  S33, 0xf6bb4b60); /* 39 */
+    HH(b, c, d, a, x[10], S34, 0xbebfbc70); /* 40 */
+    HH(a, b, c, d, x[13], S31, 0x289b7ec6); /* 41 */
+    HH(d, a, b, c, x[0],  S32, 0xeaa127fa); /* 42 */
+    HH(c, d, a, b, x[3],  S33, 0xd4ef3085); /* 43 */
+    HH(b, c, d, a, x[6],  S34, 0x4881d05);  /* 44 */
+    HH(a, b, c, d, x[9],  S31, 0xd9d4d039); /* 45 */
+    HH(d, a, b, c, x[12], S32, 0xe6db99e5); /* 46 */
+    HH(c, d, a, b, x15n,  S33, 0x1fa27cf8); /* 47 */
+    HH(b, c, d, a, x[2],  S34, 0xc4ac5665); /* 48 */
+
+    /* Round 4 */
+    II(a, b, c, d, x[0],  S41, 0xf4292244); /* 49 */
+    II(d, a, b, c, x[7],  S42, 0x432aff97); /* 50 */
+    II(c, d, a, b, x[14], S43, 0xab9423a7); /* 51 */
+    II(b, c, d, a, x[5],  S44, 0xfc93a039); /* 52 */
+    II(a, b, c, d, x[12], S41, 0x655b59c3); /* 53 */
+    II(d, a, b, c, x[3],  S42, 0x8f0ccc92); /* 54 */
+    II(c, d, a, b, x[10], S43, 0xffeff47d); /* 55 */
+    II(b, c, d, a, x[1],  S44, 0x85845dd1); /* 56 */
+    II(a, b, c, d, x[8],  S41, 0x6fa87e4f); /* 57 */
+    II(d, a, b, c, x15n,  S42, 0xfe2ce6e0); /* 58 */
+    II(c, d, a, b, x[6],  S43, 0xa3014314); /* 59 */
+    II(b, c, d, a, x[13], S44, 0x4e0811a1); /* 60 */
+    II(a, b, c, d, x[4],  S41, 0xf7537e82); /* 61 */
+    II(d, a, b, c, x[11], S42, 0xbd3af235); /* 62 */
+    II(c, d, a, b, x[2],  S43, 0x2ad7d2bb); /* 63 */
+    II(b, c, d, a, x[9],  S44, 0xeb86d391); /* 64 */
+
+    digest[0] = a;
+    digest[1] = b;
+    digest[2] = c;
+    digest[3] = d;
+}

Modified: subversion/branches/10Gb/subversion/tests/libsvn_subr/checksum-test.c
URL: http://svn.apache.org/viewvc/subversion/branches/10Gb/subversion/tests/libsvn_subr/checksum-test.c?rev=1388816&r1=1388815&r2=1388816&view=diff
==============================================================================
--- subversion/branches/10Gb/subversion/tests/libsvn_subr/checksum-test.c (original)
+++ subversion/branches/10Gb/subversion/tests/libsvn_subr/checksum-test.c Sat Sep 22 15:14:25 2012
@@ -24,6 +24,7 @@
 #include <apr_pools.h>
 
 #include "svn_error.h"
+#include "private/svn_pseudo_md5.h"
 
 #include "../svn_test.h"
 
@@ -80,6 +81,38 @@ test_checksum_empty(apr_pool_t *pool)
   return SVN_NO_ERROR;
 }
 
+static svn_error_t *
+test_pseudo_md5(apr_pool_t *pool)
+{
+  apr_uint32_t input[16] = { 0 };
+  apr_uint32_t digest_15[4] = { 0 };
+  apr_uint32_t digest_31[4] = { 0 };
+  apr_uint32_t digest_63[4] = { 0 };
+  svn_checksum_t *checksum;
+
+  /* input is all 0s but the hash shall be different
+     (due to different input sizes)*/
+  svn__pseudo_md5_15(digest_15, input);
+  svn__pseudo_md5_31(digest_31, input);
+  svn__pseudo_md5_63(digest_63, input);
+
+  SVN_TEST_ASSERT(memcmp(digest_15, digest_31, sizeof(digest_15)));
+  SVN_TEST_ASSERT(memcmp(digest_15, digest_63, sizeof(digest_15)));
+  SVN_TEST_ASSERT(memcmp(digest_31, digest_63, sizeof(digest_15)));
+
+  /* the checksums shall also be different from "proper" MD5 */
+  SVN_ERR(svn_checksum(&checksum, svn_checksum_md5, input, 15, pool));
+  SVN_TEST_ASSERT(memcmp(digest_15, checksum->digest, sizeof(digest_15)));
+  
+  SVN_ERR(svn_checksum(&checksum, svn_checksum_md5, input, 31, pool));
+  SVN_TEST_ASSERT(memcmp(digest_31, checksum->digest, sizeof(digest_15)));
+
+  SVN_ERR(svn_checksum(&checksum, svn_checksum_md5, input, 63, pool));
+  SVN_TEST_ASSERT(memcmp(digest_63, checksum->digest, sizeof(digest_15)));
+
+  return SVN_NO_ERROR;
+}
+
 /* An array of all test functions */
 struct svn_test_descriptor_t test_funcs[] =
   {
@@ -88,5 +121,7 @@ struct svn_test_descriptor_t test_funcs[
                    "checksum parse"),
     SVN_TEST_PASS2(test_checksum_empty,
                    "checksum emptiness"),
+    SVN_TEST_PASS2(test_pseudo_md5,
+                   "pseudo-md5 compatibility"),
     SVN_TEST_NULL
   };



Re: svn commit: r1388816 - in /subversion/branches/10Gb/subversion: include/private/svn_pseudo_md5.h libsvn_subr/cache-membuffer.c libsvn_subr/pseudo_md5.c tests/libsvn_subr/checksum-test.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sun, Sep 23, 2012 at 2:58 PM, Stefan Sperling <st...@elego.de> wrote:

> On Sun, Sep 23, 2012 at 02:49:16PM +0200, Stefan Fuhrmann wrote:
> > I downloaded the original version from some admittedly
> > obscure location in the rather shady part of the interwebs:
> >
> > https://svn.apache.org/repos/asf/apr/apr-util/tags/1.3.12/crypto/apr_md5.c
> >
> > -- Stefan^2.
>
> To comply with the licence of the code you've added you'll need to
> update the NOTICE file on your branch as well:
> https://svn.apache.org/repos/asf//subversion/branches/10Gb/NOTICE
>
> See https://svn.apache.org/repos/asf/apr/apr-util/tags/1.3.12/NOTICE
>

Thanks for noticing. Committed in r1389044.

-- Stefan^2.


Re: svn commit: r1388816 - in /subversion/branches/10Gb/subversion: include/private/svn_pseudo_md5.h libsvn_subr/cache-membuffer.c libsvn_subr/pseudo_md5.c tests/libsvn_subr/checksum-test.c

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Sep 23, 2012 at 02:49:16PM +0200, Stefan Fuhrmann wrote:
> I downloaded the original version from some admittedly
> obscure location in the rather shady part of the interwebs:
> 
> https://svn.apache.org/repos/asf/apr/apr-util/tags/1.3.12/crypto/apr_md5.c
> 
> -- Stefan^2.

To comply with the licence of the code you've added you'll need to
update the NOTICE file on your branch as well:
https://svn.apache.org/repos/asf//subversion/branches/10Gb/NOTICE

See https://svn.apache.org/repos/asf/apr/apr-util/tags/1.3.12/NOTICE

Re: svn commit: r1388816 - in /subversion/branches/10Gb/subversion: include/private/svn_pseudo_md5.h libsvn_subr/cache-membuffer.c libsvn_subr/pseudo_md5.c tests/libsvn_subr/checksum-test.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sat, Sep 22, 2012 at 8:12 PM, Blair Zajac <bl...@orcaware.com> wrote:

> On 09/22/2012 08:14 AM, stefan2@apache.org wrote:
>
>> Author: stefan2
>> Date: Sat Sep 22 15:14:25 2012
>> New Revision: 1388816
>>
>> URL: http://svn.apache.org/viewvc?rev=1388816&view=rev
>> Log:
>> On the 10Gb branch:  Introduce MD5-based hash functions optimized
>> for short input lengths.  Use these to speed up membuffer access.
>>
>
> How much faster is it than a plain MD5?
>

The 16-byte version is twice as fast as the MD5 core
due to the fact that we know much of the input to be 0
and can hard-code it as such. In addition to that, the
APR implementation has a ~100% overhead (setting
up the context etc.) for strings that fit into a single
encoding block. In total, the pseudo-MD5 code is
3..4 times as fast as apr_md5.
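
A minimal sketch of the hard-coding argument, not taken from the commit:
when the message word fed into an MD5 step is known to be 0, its addition
drops out of the step.  The helper names rotl32, md5_f, ff_step and
ff_step_zero are made up for illustration; the step itself mirrors the FF
macro in pseudo_md5.c.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
  assert(n > 0 && n < 32);
  return (v << n) | (v >> (32 - n));
}

/* MD5 round-1 boolean function F. */
static uint32_t md5_f(uint32_t x, uint32_t y, uint32_t z)
{
  return (x & y) | (~x & z);
}

/* General round-1 step: x is a message word taken from the input. */
static uint32_t ff_step(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                        uint32_t x, unsigned s, uint32_t ac)
{
  a += md5_f(b, c, d) + x + ac;
  return rotl32(a, s) + b;
}

/* The same step specialized for x == 0 (hypothetical helper):
   one addition and one memory access fewer. */
static uint32_t ff_step_zero(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                             unsigned s, uint32_t ac)
{
  a += md5_f(b, c, d) + ac;
  return rotl32(a, s) + b;
}
```

In svn__pseudo_md5_15 the first twelve round-1 steps have the shape of
ff_step_zero with constant inputs, so an optimizing compiler may be able to
evaluate them at build time; whether it does depends on compiler and flags.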


> If we only need it for hashing, did you look at using a more well known
> hashing function, e.g. FNV [1] or murmur [2]?
>

FNV-128 would be just as slow as pseudo-MD5 as it
takes one iteration per byte and about 17(?) operations
per iteration.
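
For comparison, the per-byte structure mentioned above looks as follows in
the 32-bit FNV-1a variant.  This is only a sketch: the function name
fnv1a_32 is chosen here for illustration, and the 128-bit variant runs the
same loop with 128-bit arithmetic, which presumably accounts for the extra
operations per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: one XOR and one multiplication per input byte. */
static uint32_t fnv1a_32(const void *data, size_t len)
{
  const unsigned char *p = data;
  uint32_t hash = 2166136261u;      /* FNV-1a 32-bit offset basis */
  size_t i;

  assert(data != NULL || len == 0);

  for (i = 0; i < len; i++)
    {
      hash ^= p[i];
      hash *= 16777619u;            /* FNV 32-bit prime */
    }

  return hash;
}
```

The loop cannot skip input bytes that are known to be 0, so it offers no
counterpart to the pre-calculation used by the pseudo-MD5 functions.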

murmur is not exactly "well-known" as it is quite new.

However, none of that really matters. The key is that we
need cryptographic strength for the hashes we use in
membuffer because they are the *only* identification for
any object stored therein. Basically the same scheme
as the SHA-1 usage in our working copy.
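
To illustrate why collisions matter here, consider a toy cache in which
entries are found by their 128-bit digest alone and the original key is
never stored.  The types and functions below (toy_entry_t, toy_cache_t,
toy_put, toy_get) are hypothetical and far simpler than cache-membuffer.c;
they only show that two keys with the same digest would be
indistinguishable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 64

typedef struct toy_entry_t
{
  uint32_t digest[4];   /* the only identification of the stored item */
  const void *value;
  int used;
} toy_entry_t;

typedef struct toy_cache_t
{
  toy_entry_t slots[TOY_SLOTS];
} toy_cache_t;

/* Store VALUE under DIGEST, overwriting whatever occupied the slot. */
static void toy_put(toy_cache_t *cache, const uint32_t digest[4],
                    const void *value)
{
  toy_entry_t *e;

  assert(cache != NULL);
  e = &cache->slots[digest[0] % TOY_SLOTS];
  memcpy(e->digest, digest, sizeof(e->digest));
  e->value = value;
  e->used = 1;
}

/* Return the value stored under DIGEST, or NULL if there is none.
   Only the digests are compared; the full key is not available. */
static const void *toy_get(const toy_cache_t *cache,
                           const uint32_t digest[4])
{
  const toy_entry_t *e;

  assert(cache != NULL);
  e = &cache->slots[digest[0] % TOY_SLOTS];
  if (e->used && memcmp(e->digest, digest, sizeof(e->digest)) == 0)
    return e->value;

  return NULL;
}
```

With a weak hash, toy_get could return data that belongs to a different
key, and the caller would have no way to notice.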


> Also, can you include URLs where you downloaded the code from in the log
> message and code.
>

So, you did not read the code ;) Simply read the first
65 lines of pseudo_md5.c

I downloaded the original version from some admittedly
obscure location in the rather shady part of the interwebs:

https://svn.apache.org/repos/asf/apr/apr-util/tags/1.3.12/crypto/apr_md5.c

-- Stefan^2.


Re: svn commit: r1388816 - in /subversion/branches/10Gb/subversion: include/private/svn_pseudo_md5.h libsvn_subr/cache-membuffer.c libsvn_subr/pseudo_md5.c tests/libsvn_subr/checksum-test.c

Posted by Blair Zajac <bl...@orcaware.com>.
On 09/22/2012 08:14 AM, stefan2@apache.org wrote:
> Author: stefan2
> Date: Sat Sep 22 15:14:25 2012
> New Revision: 1388816
>
> URL: http://svn.apache.org/viewvc?rev=1388816&view=rev
> Log:
> On the 10Gb branch:  Introduce MD5-based hash functions optimized
> for short input lengths.  Use these to speed up membuffer access.

How much faster is it than a plain MD5?

If we only need it for hashing, did you look at using a more well known 
hashing function, e.g. FNV [1] or murmur [2]?

Also, can you include URLs where you downloaded the code from in the log 
message and code.

Blair

[1] http://isthe.com/chongo/tech/comp/fnv/
[2] https://sites.google.com/site/murmurhash/