You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@apr.apache.org by bo...@apache.org on 2012/01/28 16:54:22 UTC

svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Author: bojan
Date: Sat Jan 28 15:54:22 2012
New Revision: 1237078

URL: http://svn.apache.org/viewvc?rev=1237078&view=rev
Log:
Further improve hash randomisation:
Fix naming of a static function.
Randomise final hash produced by any hash function, using default hash
function and seed.

Modified:
    apr/apr/trunk/tables/apr_hash.c

Modified: apr/apr/trunk/tables/apr_hash.c
URL: http://svn.apache.org/viewvc/apr/apr/trunk/tables/apr_hash.c?rev=1237078&r1=1237077&r2=1237078&view=diff
==============================================================================
--- apr/apr/trunk/tables/apr_hash.c (original)
+++ apr/apr/trunk/tables/apr_hash.c Sat Jan 28 15:54:22 2012
@@ -106,7 +106,7 @@ APR_DECLARE(apr_hash_t *) apr_hash_make(
     ht->seed = (unsigned int)((now >> 32) ^ now ^ (apr_uintptr_t)pool ^
                               (apr_uintptr_t)ht ^ (apr_uintptr_t)&now) - 1;
     ht->array = alloc_array(ht, ht->max);
-    ht->hash_func = NULL;
+    ht->hash_func = apr_hashfunc_default;
 
     return ht;
 }
@@ -207,9 +207,8 @@ static void expand_array(apr_hash_t *ht)
     ht->max = new_max;
 }
 
-static unsigned int apr_hashfunc_default_internal(const char *char_key,
-                                                  apr_ssize_t *klen,
-                                                  unsigned int hash)
+static unsigned int hashfunc_default(const char *char_key, apr_ssize_t *klen,
+                                     unsigned int hash)
 {
     const unsigned char *key = (const unsigned char *)char_key;
     const unsigned char *p;
@@ -271,7 +270,7 @@ static unsigned int apr_hashfunc_default
 APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
                                                       apr_ssize_t *klen)
 {
-    return apr_hashfunc_default_internal(char_key, klen, 0);
+    return hashfunc_default(char_key, klen, 0);
 }
 
 /*
@@ -290,11 +289,10 @@ static apr_hash_entry_t **find_entry(apr
 {
     apr_hash_entry_t **hep, *he;
     unsigned int hash;
+    apr_ssize_t hlen = sizeof(hash);
 
-    if (ht->hash_func)
-        hash = ht->hash_func(key, &klen);
-    else
-        hash = apr_hashfunc_default_internal(key, &klen, ht->seed);
+    hash = ht->hash_func(key, &klen);
+    hash = hashfunc_default((char *)&hash, &hlen, ht->seed);
 
     /* scan linked list */
     for (hep = &ht->array[hash & ht->max], he = *hep;
@@ -435,6 +433,7 @@ APR_DECLARE(apr_hash_t *) apr_hash_merge
     apr_hash_entry_t *iter;
     apr_hash_entry_t *ent;
     unsigned int i, j, k, hash;
+    apr_ssize_t hlen = sizeof(hash);
 
 #if APR_POOL_DEBUG
     /* we don't copy keys and values, so it's necessary that
@@ -484,11 +483,8 @@ APR_DECLARE(apr_hash_t *) apr_hash_merge
 
     for (k = 0; k <= overlay->max; k++) {
         for (iter = overlay->array[k]; iter; iter = iter->next) {
-            if (res->hash_func)
-                hash = res->hash_func(iter->key, &iter->klen);
-            else
-                hash = apr_hashfunc_default_internal(iter->key, &iter->klen,
-                                                     res->seed);
+            hash = res->hash_func(iter->key, &iter->klen);
+            hash = hashfunc_default((char*)&hash, &hlen, res->seed);
             i = hash & res->max;
             for (ent = res->array[i]; ent; ent = ent->next) {
                 if ((ent->klen == iter->klen) &&



Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Rainer Jung <ra...@kippdata.de>.
On 30.01.2012 23:07, Bojan Smojver wrote:
> ------- Original message -------
>> From: Jim Jagielski
>
>> Any intent to T&R apr 1.4.6 to address this issue?
>
> Wasn't really thinking of it, but if it would help, sure. A bit short on
> time today due to work stuff though.

I did run the test suite against httpd trunk build with apr and apr-util 
1.4.x heads and reallyall modules.

No test failures :)

Having a 1.4.6 woud be nice due to some other small fixes and the long 
time we had no release. I didn't check Bugzilla for critical issues or 
low hanging fruit though.

Regards,

Rainer

Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
------- Original message -------
> From: Jim Jagielski

> Any intent to T&R apr 1.4.6 to address this issue?

Wasn't really thinking of it, but if it would help, sure. A bit short on 
time today due to work stuff though.

--
Bojan 

Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Jim Jagielski <ji...@jaguNET.com>.
Any intent to T&R apr 1.4.6 to address this issue?

Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
On Mon, 2012-01-30 at 02:08 +0100, Branko Čibej wrote:
> I'm not even close to being an expert on hashing functions

Yeah, same here.

Anyhow, I think what we have now is good enough. By default, we put in
some randomness, so brute force attacks will be somewhat harder. People
that want to use a different hash function can work on figuring out
their own way of making things better.

-- 
Bojan


Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Branko Čibej <br...@apache.org>.
On 30.01.2012 00:15, Bojan Smojver wrote:
> On Mon, 2012-01-30 at 09:59 +1100, Bojan Smojver wrote:
>> With my latest patch, I get that fixed (i.e. the final hash is
>> perturbed a lot better).
> Hmm, different problems emerge. The hash then has a tendency to produce
> either all odd or even indexes (i.e. lower bits used to address
> buckets).

Just as I said -- the hash function we have is not meant to be perfect,
or even secure, but to allow reasonable insertion and retrieval times.
Running the same result through it again will not make any appreciabble
difference. to the distribution.

You'd have to make the second pass with a different hash function, or at
least a different prime multiplier. I'm still not sure that'd be worth
the extra cycles, however. I'm not even close to being an expert on
hashing functions, but all multiply/modulo hashes apparently tend to
behave similarly.

-- Brane


Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
On Mon, 2012-01-30 at 10:15 +1100, Bojan Smojver wrote:
> I think I'll need to work on a different approach.

Reverting r1237078 is the answer. We should not be hashing the hash,
only the original strings, using the random seed.

Obviously, folks using custom hash functions will have to come up with
their own random seeding within those functions.

-- 
Bojan


Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
On Mon, 2012-01-30 at 09:59 +1100, Bojan Smojver wrote:
> With my latest patch, I get that fixed (i.e. the final hash is
> perturbed a lot better).

Hmm, different problems emerge. The hash then has a tendency to produce
either all odd or even indexes (i.e. lower bits used to address
buckets).

I think this whole idea of hashing the hash is costing us in loss of
randomness. Which makes sense. We just hashed a string and reduced the
number of its bits significantly. Then we took that (which is also
always the same length) and reduced it again. In the process, we
overcooked.

I think I'll need to work on a different approach.

-- 
Bojan


Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
On Mon, 2012-01-30 at 09:37 +1100, Bojan Smojver wrote:
> I'm going to run some tests to see whether I get final hashes
> that have similar upper bits. 

The test reveals that many upper bits are the same without the latest
patch. With my latest patch, I get that fixed (i.e. the final hash is
perturbed a lot better).

So, I think I should apply it.

PS. Because we are starting from the same starting point (the seed), it
is not surprising that it takes a bit more to perturb the final hash
better. The *33 is essentially <<5 + original value. So, it takes a few
more cycles to get all 32 bits (my platform's unsigned int) perturbed.

-- 
Bojan


Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
On Sun, 2012-01-29 at 20:23 +0100, Branko Čibej wrote:
> This is overkill.

Possibly. I'm going to run some tests to see whether I get final hashes
that have similar upper bits.

-- 
Bojan


Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Branko Čibej <br...@apache.org>.
On 29.01.2012 16:34, Bojan Smojver wrote:
> On Mon, 2012-01-30 at 02:22 +1100, Bojan Smojver wrote:
>> we could run the hash value produced by hashing the strings through
>> the hash function twice

This is overkill. If the current hash function isn't good enough,
running stuff through it twice isn't going to help. I posit that the
results are good enough now, since the default implementation isn't
meant to be cryptographically secure, let alone unattackable. Spending
more time grinding on the same set of bits isn't going to make a badly
designed application any more robust.

-- Brane


Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
On Mon, 2012-01-30 at 02:22 +1100, Bojan Smojver wrote:
> we could run the hash value produced by hashing the strings through
> the hash function twice

Like this.

-- 
Bojan

Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
------- Original message -------
> From: Branko Čibej

> A good, seeded cryptographic hash function would scatter the values much
> better than this does.

Just to make the hash more resistant to having the same content in upper bits (because we start from a common random seed), we could run the hash value produced by hashing the strings through the hash function twice. It may make a difference for larger hash tables.

--
Bojan

Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Branko Čibej <br...@apache.org>.
On 29.01.2012 13:34, Ruediger Pluem wrote:
>
> bojan@apache.org wrote:
>> Author: bojan
>> Date: Sat Jan 28 15:54:22 2012
>> New Revision: 1237078
>>
>> URL: http://svn.apache.org/viewvc?rev=1237078&view=rev
>> Log:
>> Further improve hash randomisation:
>> Fix naming of a static function.
>> Randomise final hash produced by any hash function, using default hash
>> function and seed.
>>
>> Modified:
>>     apr/apr/trunk/tables/apr_hash.c
>>
>> Modified: apr/apr/trunk/tables/apr_hash.c
>> URL: http://svn.apache.org/viewvc/apr/apr/trunk/tables/apr_hash.c?rev=1237078&r1=1237077&r2=1237078&view=diff
>> ==============================================================================
>> --- apr/apr/trunk/tables/apr_hash.c (original)
>> +++ apr/apr/trunk/tables/apr_hash.c Sat Jan 28 15:54:22 2012
>> @@ -290,11 +289,10 @@ static apr_hash_entry_t **find_entry(apr
>>  {
>>      apr_hash_entry_t **hep, *he;
>>      unsigned int hash;
>> +    apr_ssize_t hlen = sizeof(hash);
>>  
>> -    if (ht->hash_func)
>> -        hash = ht->hash_func(key, &klen);
>> -    else
>> -        hash = apr_hashfunc_default_internal(key, &klen, ht->seed);
>> +    hash = ht->hash_func(key, &klen);
>> +    hash = hashfunc_default((char *)&hash, &hlen, ht->seed);
> Don't we have the same issue here as with the XOR version of the patch?
> If two different keys (key1, key2) result in the same hash value

"same hash value" is less common than "same hash slot," so no, in
general, that's not the case.
A good, seeded cryptographic hash function would scatter the values much
better than this does. However, I hardly think that the default
implementation should concern itself with that. If someone needs a more
secure hash table, they can always provide their own hash function.

It's not as if our randomization is likely to make these hash tables
suddenly vastly more secure. It just makes the described attack harder
to do, with minimal cost to performance and none to API compatibility.

-- Brane


Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Bojan Smojver <bo...@rexursive.com>.
------- Original message -------
> From: Ruediger Pluem

> Don't we have the same issue here as with the XOR version of the patch?
> If two different keys (key1, key2) result in the same hash value (so 
> ht->hash_func(key1, &klen1) == ht->hash_func(key2,
> &klen2);) doesn't hashfunc_default result in the same hash value for both 
> keys since the input value to hash
> is the same?

For two keys to cause a collision, the modulo (i.e. the index) has to be 
the same, not necessarily the whole hash. So, with xor, we don't gain 
anything by using seed against the final hash (modulo will change, but in 
the same way for both hash values - the collision will move to a different 
bucket). With the hash function, as long as the hash is not completely the 
same, we will most likely get a different modulo and therefore a different 
index into the table.

Make sense?

--
Bojan 

Re: svn commit: r1237078 - /apr/apr/trunk/tables/apr_hash.c

Posted by Ruediger Pluem <rp...@apache.org>.

bojan@apache.org wrote:
> Author: bojan
> Date: Sat Jan 28 15:54:22 2012
> New Revision: 1237078
> 
> URL: http://svn.apache.org/viewvc?rev=1237078&view=rev
> Log:
> Further improve hash randomisation:
> Fix naming of a static function.
> Randomise final hash produced by any hash function, using default hash
> function and seed.
> 
> Modified:
>     apr/apr/trunk/tables/apr_hash.c
> 
> Modified: apr/apr/trunk/tables/apr_hash.c
> URL: http://svn.apache.org/viewvc/apr/apr/trunk/tables/apr_hash.c?rev=1237078&r1=1237077&r2=1237078&view=diff
> ==============================================================================
> --- apr/apr/trunk/tables/apr_hash.c (original)
> +++ apr/apr/trunk/tables/apr_hash.c Sat Jan 28 15:54:22 2012

> @@ -290,11 +289,10 @@ static apr_hash_entry_t **find_entry(apr
>  {
>      apr_hash_entry_t **hep, *he;
>      unsigned int hash;
> +    apr_ssize_t hlen = sizeof(hash);
>  
> -    if (ht->hash_func)
> -        hash = ht->hash_func(key, &klen);
> -    else
> -        hash = apr_hashfunc_default_internal(key, &klen, ht->seed);
> +    hash = ht->hash_func(key, &klen);
> +    hash = hashfunc_default((char *)&hash, &hlen, ht->seed);

Don't we have the same issue here as with the XOR version of the patch?
If two different keys (key1, key2) result in the same hash value (so ht->hash_func(key1, &klen1) == ht->hash_func(key2,
&klen2);) doesn't hashfunc_default result in the same hash value for both keys since the input value to hash
is the same?

>  
>      /* scan linked list */
>      for (hep = &ht->array[hash & ht->max], he = *hep;

Regards

Rüdiger