You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Ivan Zhakov <iv...@visualsvn.com> on 2015/05/13 19:04:26 UTC

[VOTE] Merging 1.9-cache-improvements to /trunk

[adjusting subject to make it valid vote thread]

On 13 May 2015 at 19:23, Stefan Fuhrmann <st...@wandisco.com> wrote:
> Hi there,
>
> Ivan has reviewed my recent membuffer cache
> key handling changes, corrected and backported
> them on the 1.9-cache-improvements branch.
>
> I reviewed it and I'm +1 on merging it to /trunk -
> hoping we may even get it into 1.9. Since this
> touches a sensitive part of the server code, I'd
> like to see 2 more +1s for the branch->/trunk
> merge.
>
+1.

PS: I think detailed log message will be useful for reviewers. I'll
make it tomorrow if you didn't outstrip me.

-- 
Ivan Zhakov

RE: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Ivan Zhakov [mailto:ivan@visualsvn.com]
> Sent: woensdag 13 mei 2015 19:04
> To: Stefan Fuhrmann
> Cc: Subversion Development
> Subject: [VOTE] Merging 1.9-cache-improvements to /trunk
> 
> [adjusting subject to make it valid vote thread]
> 
> On 13 May 2015 at 19:23, Stefan Fuhrmann
> <st...@wandisco.com> wrote:
> > Hi there,
> >
> > Ivan has reviewed my recent membuffer cache key handling changes,
> > corrected and backported them on the 1.9-cache-improvements branch.
> >
> > I reviewed it and I'm +1 on merging it to /trunk - hoping we may even
> > get it into 1.9. Since this touches a sensitive part of the server
> > code, I'd like to see 2 more +1s for the branch->/trunk merge.
> >
> +1.
> 
> PS: I think detailed log message will be useful for reviewers. I'll make it
> tomorrow if you didn't outstrip me.

+1 for merging back to trunk (after initial review)

I'll perform a more careful review later, for the backport to 1.9.0(/.x)

	Bert
> 
> --
> Ivan Zhakov


Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 14 May 2015 at 15:37, Branko Čibej <br...@wandisco.com> wrote:
> On 13.05.2015 19:04, Ivan Zhakov wrote:
>> [adjusting subject to make it valid vote thread]
>>
>> On 13 May 2015 at 19:23, Stefan Fuhrmann <st...@wandisco.com> wrote:
>>> Hi there,
>>>
>>> Ivan has reviewed my recent membuffer cache
>>> key handling changes, corrected and backported
>>> them on the 1.9-cache-improvements branch.
>>>
>>> I reviewed it and I'm +1 on merging it to /trunk -
>>> hoping we may even get it into 1.9. Since this
>>> touches a sensitive part of the server code, I'd
>>> like to see 2 more +1s for the branch->/trunk
>>> merge.
>>>
>> +1.
>>
>> PS: I think detailed log message will be useful for reviewers. I'll
>> make it tomorrow if you didn't outstrip me.
>
> -0 because:
>
> $ make
> .../subversion/libsvn_subr/cache-membuffer.c:2626:59: warning: implicit conversion loses integer
>       precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
>   cache->combined_key.entry_key.key_len = aligned_key_len + prefix_len;
>                                         ~ ~~~~~~~~~~~~~~~~^~~~~~~~~~~~
> .../subversion/libsvn_subr/cache-membuffer.c:2664:58: warning: implicit conversion loses integer
>       precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
>       cache->combined_key.entry_key.key_len = prefix_len + 16;
>                                             ~ ~~~~~~~~~~~^~~~
> .../subversion/libsvn_subr/cache-membuffer.c:3161:37: warning: implicit conversion loses integer
>       precision: 'apr_size_t' (aka 'unsigned long') to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
>   cache->prefix.entry_key.key_len = prefix_len;
>                                   ~ ^~~~~~~~~~
> 3 warnings generated.
>
> +1 if these warnings get fixed before or as part of the merge without
> adding casts.
>

I suggest attached patch to resolve these warnings.

Log message:
[[[
* subversion/libsvn_subr/cache-membuffer.c
  (entry_key_t): Change type of KEY_LEN field to apr_size_t instead of
   apr_uint32_t.
]]]


-- 
Ivan Zhakov

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 17 May 2015 at 17:41, Stefan Fuhrmann <st...@wandisco.com> wrote:
> On Thu, May 14, 2015 at 2:37 PM, Branko Čibej <br...@wandisco.com> wrote:
>>
>> On 13.05.2015 19:04, Ivan Zhakov wrote:
>> > [adjusting subject to make it valid vote thread]
>> >
>> > On 13 May 2015 at 19:23, Stefan Fuhrmann <st...@wandisco.com>
>> > wrote:
>> >> Hi there,
>> >>
>> >> Ivan has reviewed my recent membuffer cache
>> >> key handling changes, corrected and backported
>> >> them on the 1.9-cache-improvements branch.
>> >>
>> >> I reviewed it and I'm +1 on merging it to /trunk -
>> >> hoping we may even get it into 1.9. Since this
>> >> touches a sensitive part of the server code, I'd
>> >> like to see 2 more +1s for the branch->/trunk
>> >> merge.
>> >>
>> > +1.
>> >
>> > PS: I think detailed log message will be useful for reviewers. I'll
>> > make it tomorrow if you didn't outstrip me.
>>
>> -0 because:
>>
>> $ make
>> .../subversion/libsvn_subr/cache-membuffer.c:2626:59: warning: implicit
>> conversion loses integer
>>       precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int')
>> [-Wshorten-64-to-32]
>>   cache->combined_key.entry_key.key_len = aligned_key_len + prefix_len;
>>                                         ~ ~~~~~~~~~~~~~~~~^~~~~~~~~~~~
>> .../subversion/libsvn_subr/cache-membuffer.c:2664:58: warning: implicit
>> conversion loses integer
>>       precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int')
>> [-Wshorten-64-to-32]
>>       cache->combined_key.entry_key.key_len = prefix_len + 16;
>>                                             ~ ~~~~~~~~~~~^~~~
>> .../subversion/libsvn_subr/cache-membuffer.c:3161:37: warning: implicit
>> conversion loses integer
>>       precision: 'apr_size_t' (aka 'unsigned long') to 'apr_uint32_t' (aka
>> 'unsigned int') [-Wshorten-64-to-32]
>>   cache->prefix.entry_key.key_len = prefix_len;
>>                                   ~ ^~~~~~~~~~
>> 3 warnings generated.
>
>
> Ugh!
>
> As of r1679863, all item and key sizes are represented
> as size_t throughout cache-membuffer.c.
>
> The reason why I first tried sticking with u32 was that
> larger entry_t structs mean fewer of them fit into the
> index hash. Per group, there are 10 entries on /trunk,
> 8 on the 1.9-cache-improvements branch and only 7
> for the 1.10 equivalent. So, 1.10 will be able to keep
> only 30% cache entries than 1.9rc1. Still ok-ish but it
> is a tight fit.
AFAIK structures usually aligned to sizeof(size_t), so using
apr_uint32_t doesn't reduce memory usage.

PS: Thanks for docstring update.

-- 
Ivan Zhakov

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Thu, May 14, 2015 at 2:37 PM, Branko Čibej <br...@wandisco.com> wrote:

> On 13.05.2015 19:04, Ivan Zhakov wrote:
> > [adjusting subject to make it valid vote thread]
> >
> > On 13 May 2015 at 19:23, Stefan Fuhrmann <st...@wandisco.com>
> wrote:
> >> Hi there,
> >>
> >> Ivan has reviewed my recent membuffer cache
> >> key handling changes, corrected and backported
> >> them on the 1.9-cache-improvements branch.
> >>
> >> I reviewed it and I'm +1 on merging it to /trunk -
> >> hoping we may even get it into 1.9. Since this
> >> touches a sensitive part of the server code, I'd
> >> like to see 2 more +1s for the branch->/trunk
> >> merge.
> >>
> > +1.
> >
> > PS: I think detailed log message will be useful for reviewers. I'll
> > make it tomorrow if you didn't outstrip me.
>
> -0 because:
>
> $ make
> .../subversion/libsvn_subr/cache-membuffer.c:2626:59: warning: implicit
> conversion loses integer
>       precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int')
> [-Wshorten-64-to-32]
>   cache->combined_key.entry_key.key_len = aligned_key_len + prefix_len;
>                                         ~ ~~~~~~~~~~~~~~~~^~~~~~~~~~~~
> .../subversion/libsvn_subr/cache-membuffer.c:2664:58: warning: implicit
> conversion loses integer
>       precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int')
> [-Wshorten-64-to-32]
>       cache->combined_key.entry_key.key_len = prefix_len + 16;
>                                             ~ ~~~~~~~~~~~^~~~
> .../subversion/libsvn_subr/cache-membuffer.c:3161:37: warning: implicit
> conversion loses integer
>       precision: 'apr_size_t' (aka 'unsigned long') to 'apr_uint32_t' (aka
> 'unsigned int') [-Wshorten-64-to-32]
>   cache->prefix.entry_key.key_len = prefix_len;
>                                   ~ ^~~~~~~~~~
> 3 warnings generated.
>

Ugh!

As of r1679863, all item and key sizes are represented
as size_t throughout cache-membuffer.c.

The reason why I first tried sticking with u32 was that
larger entry_t structs mean fewer of them fit into the
index hash. Per group, there are 10 entries on /trunk,
8 on the 1.9-cache-improvements branch and only 7
for the 1.10 equivalent. So, 1.10 will be able to keep
only 30% cache entries than 1.9rc1. Still ok-ish but it
is a tight fit.


> +1 if these warnings get fixed before or as part of the merge without
> adding casts.
>

Even done away with older casts :)

Going to merge to /trunk now with your and Bert's vote.

-- Stefan^2.

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sat, May 16, 2015 at 7:45 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:

> On 16 May 2015 at 07:48, Stefan Fuhrmann <st...@wandisco.com>
> wrote:
> > On Fri, May 15, 2015 at 7:25 PM, Philip Martin <
> philip.martin@wandisco.com>
> > wrote:
> >> Another issue: find_entry() now calls drop_entry() in more cases and can
> >> now call it when find_empty==FALSE during read operations.  On Unix when
> >> using the read-write lock this means the cache gets modified while only
> >> holding a read lock, not a write lock, and that can corrupt the cache.
> >>
> As far I understand find_entry(find_empty == TRUE) requires write-lock
> for cache, while only read-lock is required for find_empty == FALSE.
> Is correct? It would be nice to document this in docstring.
>

Done in r1679866.

-- Stefan^2.

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 16 May 2015 at 07:48, Stefan Fuhrmann <st...@wandisco.com> wrote:
> On Fri, May 15, 2015 at 7:25 PM, Philip Martin <ph...@wandisco.com>
> wrote:
>>
>> Philip Martin <ph...@wandisco.com> writes:
>>
>> > Philip Martin <ph...@wandisco.com> writes:
>> >
>> >> Prompted by the warnings I think there are some issues to fix.  For
>> >> APR_HASH_KEY_STRING keys there is no protection against abnormally long
>> >> keys.  combined_long_key() will allocate strlen() memory even if it is
>> >> many GB.  The item will not get cached if key+data length is more than
>> >> 4GB but the memory for the key, which could be more than 4GB, will be
>> >> permanently allocated in the cache.  There is also a problem with
>> >> overflow in membuffer_cache_set_internal() when calculating key+data
>> >> length, although in practice a key large enough to trigger this will
>> >> probably fail memory allocation first.
>> >
>> > Another issue:
>> >
>> > static svn_boolean_t
>> > entry_keys_match(const entry_key_t *lhs,
>> >                  const entry_key_t *rhs)
>> > {
>> >   return (lhs->fingerprint[0] == rhs->fingerprint[0])
>> >       && (lhs->fingerprint[1] == rhs->fingerprint[1])
>> >       && (lhs->key_len == rhs->key_len);
>> > }
>> >
>> > I think the key_len comparison is wrong and should be removed.  If two
>> > keys have fingerprints that collide it does not mean the key lengths are
>> > the same.  When attempting to use two keys with colliding fingerprints
>> > the behaviour when the key lengths vary should be same as when the key
>> > lengths are the same and the key data varies.
>
>
> Not necessary. It is true that we could decide to *only* use
> the fingerprint as discriminator. But the implementation here
> includes the key length; it reduces the number of conflicts
> that result in entries that evict each other.
>
> Most fixed-length keys are 16 bytes long and for them, the
> fingerprint is almost identical to the key and conflicts are
> extremely unlikely. So, varible-length keys are the one that
> may have fingerprint conflicts now due to the weakness of
> FNV1. For them, however, the key_len (often derived from
> paths) has a good chance of being different.
>
> Note that if the key_len differs, it *cannot* be the same key.
> Hence, including the key_len does not create false negatives.
>
+1.

>> Another issue: find_entry() now calls drop_entry() in more cases and can
>> now call it when find_empty==FALSE during read operations.  On Unix when
>> using the read-write lock this means the cache gets modified while only
>> holding a read lock, not a write lock, and that can corrupt the cache.
>>
As far I understand find_entry(find_empty == TRUE) requires write-lock
for cache, while only read-lock is required for find_empty == FALSE.
Is correct? It would be nice to document this in docstring.

>> If we use read-write locks then when read detects a fingerprint
>> collision it cannot modify the cache to clear the collision, it will
>> have to return NULL and leave the entry in the cache.
>
>
> You are absolutely right. Fixed in r1679681.
>

PS: Just to make things clear: branch has been changed so my +1
doesn't apply to it.

-- 
Ivan Zhakov

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Fri, May 15, 2015 at 7:25 PM, Philip Martin <ph...@wandisco.com>
wrote:

> Philip Martin <ph...@wandisco.com> writes:
>
> > Philip Martin <ph...@wandisco.com> writes:
> >
> >> Prompted by the warnings I think there are some issues to fix.  For
> >> APR_HASH_KEY_STRING keys there is no protection against abnormally long
> >> keys.  combined_long_key() will allocate strlen() memory even if it is
> >> many GB.  The item will not get cached if key+data length is more than
> >> 4GB but the memory for the key, which could be more than 4GB, will be
> >> permanently allocated in the cache.  There is also a problem with
> >> overflow in membuffer_cache_set_internal() when calculating key+data
> >> length, although in practice a key large enough to trigger this will
> >> probably fail memory allocation first.
> >
> > Another issue:
> >
> > static svn_boolean_t
> > entry_keys_match(const entry_key_t *lhs,
> >                  const entry_key_t *rhs)
> > {
> >   return (lhs->fingerprint[0] == rhs->fingerprint[0])
> >       && (lhs->fingerprint[1] == rhs->fingerprint[1])
> >       && (lhs->key_len == rhs->key_len);
> > }
> >
> > I think the key_len comparison is wrong and should be removed.  If two
> > keys have fingerprints that collide it does not mean the key lengths are
> > the same.  When attempting to use two keys with colliding fingerprints
> > the behaviour when the key lengths vary should be same as when the key
> > lengths are the same and the key data varies.
>

Not necessary. It is true that we could decide to *only* use
the fingerprint as discriminator. But the implementation here
includes the key length; it reduces the number of conflicts
that result in entries that evict each other.

Most fixed-length keys are 16 bytes long and for them, the
fingerprint is almost identical to the key and conflicts are
extremely unlikely. So, varible-length keys are the one that
may have fingerprint conflicts now due to the weakness of
FNV1. For them, however, the key_len (often derived from
paths) has a good chance of being different.

Note that if the key_len differs, it *cannot* be the same key.
Hence, including the key_len does not create false negatives.


> Another issue: find_entry() now calls drop_entry() in more cases and can
> now call it when find_empty==FALSE during read operations.  On Unix when
> using the read-write lock this means the cache gets modified while only
> holding a read lock, not a write lock, and that can corrupt the cache.
>
> If we use read-write locks then when read detects a fingerprint
> collision it cannot modify the cache to clear the collision, it will
> have to return NULL and leave the entry in the cache.
>

You are absolutely right. Fixed in r1679681.

-- Stefan^2.

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> Philip Martin <ph...@wandisco.com> writes:
>
>> Prompted by the warnings I think there are some issues to fix.  For
>> APR_HASH_KEY_STRING keys there is no protection against abnormally long
>> keys.  combined_long_key() will allocate strlen() memory even if it is
>> many GB.  The item will not get cached if key+data length is more than
>> 4GB but the memory for the key, which could be more than 4GB, will be
>> permanently allocated in the cache.  There is also a problem with
>> overflow in membuffer_cache_set_internal() when calculating key+data
>> length, although in practice a key large enough to trigger this will
>> probably fail memory allocation first.
>
> Another issue:
>
> static svn_boolean_t
> entry_keys_match(const entry_key_t *lhs,
>                  const entry_key_t *rhs)
> {
>   return (lhs->fingerprint[0] == rhs->fingerprint[0])
>       && (lhs->fingerprint[1] == rhs->fingerprint[1])
>       && (lhs->key_len == rhs->key_len);
> }
>
> I think the key_len comparison is wrong and should be removed.  If two
> keys have fingerprints that collide it does not mean the key lengths are
> the same.  When attempting to use two keys with colliding fingerprints
> the behaviour when the key lengths vary should be same as when the key
> lengths are the same and the key data varies.

Another issue: find_entry() now calls drop_entry() in more cases and can
now call it when find_empty==FALSE during read operations.  On Unix when
using the read-write lock this means the cache gets modified while only
holding a read lock, not a write lock, and that can corrupt the cache.

If we use read-write locks then when read detects a fingerprint
collision it cannot modify the cache to clear the collision, it will
have to return NULL and leave the entry in the cache.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> Prompted by the warnings I think there are some issues to fix.  For
> APR_HASH_KEY_STRING keys there is no protection against abnormally long
> keys.  combined_long_key() will allocate strlen() memory even if it is
> many GB.  The item will not get cached if key+data length is more than
> 4GB but the memory for the key, which could be more than 4GB, will be
> permanently allocated in the cache.  There is also a problem with
> overflow in membuffer_cache_set_internal() when calculating key+data
> length, although in practice a key large enough to trigger this will
> probably fail memory allocation first.

Another issue:

static svn_boolean_t
entry_keys_match(const entry_key_t *lhs,
                 const entry_key_t *rhs)
{
  return (lhs->fingerprint[0] == rhs->fingerprint[0])
      && (lhs->fingerprint[1] == rhs->fingerprint[1])
      && (lhs->key_len == rhs->key_len);
}

I think the key_len comparison is wrong and should be removed.  If two
keys have fingerprints that collide it does not mean the key lengths are
the same.  When attempting to use two keys with colliding fingerprints
the behaviour when the key lengths vary should be same as when the key
lengths are the same and the key data varies.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Philip Martin <ph...@wandisco.com>.
Philip Martin <ph...@wandisco.com> writes:

> Prompted by the warnings I think there are some issues to fix.  For
> APR_HASH_KEY_STRING keys there is no protection against abnormally long
> keys.  combined_long_key() will allocate strlen() memory even if it is
> many GB.  The item will not get cached if key+data length is more than
> 4GB but the memory for the key, which could be more than 4GB, will be
> permanently allocated in the cache.  There is also a problem with
> overflow in membuffer_cache_set_internal() when calculating key+data
> length, although in practice a key large enough to trigger this will
> probably fail memory allocation first.

If enforce a runtime limit on key size, both fixed and variable, making
it a positive number less than MAX_ITEM_SIZE then even in 32 bits we can
detect overflow with something like this:

    apr_size_t item_size, key_len.
    svn_boolean_t too_big = item_size > MAX_ITEM_SIZE - key_len;

So a patch something like:

Index: subversion/libsvn_subr/cache-membuffer.c
===================================================================
--- subversion/libsvn_subr/cache-membuffer.c	(revision 1679506)
+++ subversion/libsvn_subr/cache-membuffer.c	(working copy)
@@ -197,7 +197,7 @@ typedef struct entry_key_t
 
   /* Length of the full key.  This value is aligned to ITEM_ALIGNMENT to
    * make sure the subsequent item content is properly aligned. */
-  apr_uint32_t key_len;
+  apr_size_t key_len;
 } entry_key_t;
 
 /* A full key, i.e. the combination of the cache's key prefix with some
@@ -2013,14 +2013,19 @@ membuffer_cache_set_internal(svn_membuffer_t *cach
                              apr_pool_t *scratch_pool)
 {
   cache_level_t *level;
-  apr_size_t size = item_size + to_find->entry_key.key_len;
+  apr_size_t size;
+  svn_boolean_t too_big
+    = item_size > MAX_ITEM_SIZE - to_find->entry_key.key_len;
 
   /* first, look for a previous entry for the given key */
   entry_t *entry = find_entry(cache, group_index, to_find, FALSE);
 
+  if (!too_big)
+    size = item_size + to_find->entry_key.key_len;
+
   /* if there is an old version of that entry and the new data fits into
    * the old spot, just re-use that space. */
-  if (entry && ALIGN_VALUE(entry->size) >= size && buffer)
+  if (entry && !too_big && ALIGN_VALUE(entry->size) >= size && buffer)
     {
       /* Careful! We need to cast SIZE to the full width of CACHE->DATA_USED
        * lest we run into trouble with 32 bit underflow *not* treated as a
@@ -2052,8 +2057,9 @@ membuffer_cache_set_internal(svn_membuffer_t *cach
 
   /* if necessary, enlarge the insertion window.
    */
-  level = buffer ? select_level(cache, size, priority) : NULL;
-  if (level)
+  if (!too_big)
+    level = buffer ? select_level(cache, size, priority) : NULL;
+  if (!too_big && level)
     {
       /* Remove old data for this key, if that exists.
        * Get an unused entry for the key and and initialize it with
@@ -2477,10 +2483,12 @@ membuffer_cache_set_partial_internal(svn_membuffer
            */
           if (item_data != orig_data)
             {
+              svn_boolean_t too_big = item_size > MAX_ITEM_SIZE - key_len;
+              
               /* Remove the old entry and try to make space for the new one.
                */
               drop_entry(cache, entry);
-              if (   (cache->max_entry_size >= item_size + key_len)
+              if (!too_big
                   && ensure_data_insertable_l1(cache, item_size + key_len))
                 {
                   /* Write the new entry.
@@ -2598,12 +2606,18 @@ typedef struct svn_membuffer_cache_t
   svn_mutex__t *mutex;
 } svn_membuffer_cache_t;
 
+/* Largest fixed-size key and longest APR_HASH_KEY_STRING key allowed,
+ * must be less than MAX_ITEM_SIZE.  An arbitrary number much higher
+ * than the expected key sizes of couple of hundred bytes at most.
+ */
+#define MAX_KEY_SIZE 50000
+
 /* Basically calculate a hash value for KEY of length KEY_LEN, combine it
  * with the CACHE->PREFIX and write the result in CACHE->COMBINED_KEY.
  * This could replace combine_key() entirely but we actually use it only
  * when the quick path failed.
  */
-static void
+static svn_error_t *
 combine_long_key(svn_membuffer_cache_t *cache,
                  const void *key,
                  apr_ssize_t key_len)
@@ -2615,7 +2629,12 @@ combine_long_key(svn_membuffer_cache_t *cache,
 
   /* handle variable-length keys */
   if (key_len == APR_HASH_KEY_STRING)
-    key_len = strlen((const char *) key);
+    {
+      key_len = strlen((const char *) key);
+      if (key_len > MAX_KEY_SIZE)
+        return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+                                _("Key too big for membuffer-based cache"));
+    }
 
   /* Combine keys. */
   aligned_key_len = ALIGN_VALUE(key_len);
@@ -2636,12 +2655,14 @@ combine_long_key(svn_membuffer_cache_t *cache,
     ^= cache->prefix.entry_key.fingerprint[0];
   cache->combined_key.entry_key.fingerprint[1]
     ^= cache->prefix.entry_key.fingerprint[1];
+
+  return SVN_NO_ERROR;
 }
 
 /* Basically calculate a hash value for KEY of length KEY_LEN, combine it
  * with the CACHE->PREFIX and write the result in CACHE->COMBINED_KEY.
  */
-static void
+static svn_error_t *
 combine_key(svn_membuffer_cache_t *cache,
             const void *key,
             apr_ssize_t key_len)
@@ -2678,8 +2699,13 @@ combine_key(svn_membuffer_cache_t *cache,
   else
     {
       /* longer or variably sized keys */
-      combine_long_key(cache, key, key_len);
+      SVN_ERR(combine_long_key(cache, key, key_len));
     }
+
+  cache->combined_key.entry_key.fingerprint[0] = 0x01010183;
+  cache->combined_key.entry_key.fingerprint[1] = 0x0302030F;
+
+  return SVN_NO_ERROR;
 }
 
 /* Implement svn_cache__vtable_t.get (not thread-safe)
@@ -2707,7 +2733,7 @@ svn_membuffer_cache_get(void **value_p,
   /* construct the full, i.e. globally unique, key by adding
    * this cache instances' prefix
    */
-  combine_key(cache, key, cache->key_len);
+  SVN_ERR(combine_key(cache, key, cache->key_len));
 
   /* Look the item up. */
   SVN_ERR(membuffer_cache_get(cache->membuffer,
@@ -2744,7 +2770,7 @@ svn_membuffer_cache_has_key(svn_boolean_t *found,
   /* construct the full, i.e. globally unique, key by adding
    * this cache instances' prefix
    */
-  combine_key(cache, key, cache->key_len);
+  SVN_ERR(combine_key(cache, key, cache->key_len));
 
   /* Look the item up. */
   SVN_ERR(membuffer_cache_has_key(cache->membuffer,
@@ -2774,7 +2800,7 @@ svn_membuffer_cache_set(void *cache_void,
   /* construct the full, i.e. globally unique, key by adding
    * this cache instances' prefix
    */
-  combine_key(cache, key, cache->key_len);
+  SVN_ERR(combine_key(cache, key, cache->key_len));
 
   /* (probably) add the item to the cache. But there is no real guarantee
    * that the item will actually be cached afterwards.
@@ -2824,7 +2850,7 @@ svn_membuffer_cache_get_partial(void **value_p,
       return SVN_NO_ERROR;
     }
 
-  combine_key(cache, key, cache->key_len);
+  SVN_ERR(combine_key(cache, key, cache->key_len));
   SVN_ERR(membuffer_cache_get_partial(cache->membuffer,
                                       &cache->combined_key,
                                       value_p,
@@ -2852,7 +2878,7 @@ svn_membuffer_cache_set_partial(void *cache_void,
 
   if (key != NULL)
     {
-      combine_key(cache, key, cache->key_len);
+      SVN_ERR(combine_key(cache, key, cache->key_len));
       SVN_ERR(membuffer_cache_set_partial(cache->membuffer,
                                           &cache->combined_key,
                                           func,
@@ -3126,6 +3152,10 @@ svn_cache__create_membuffer_cache(svn_cache__t **c
   svn_cache__t *wrapper = apr_pcalloc(result_pool, sizeof(*wrapper));
   svn_membuffer_cache_t *cache = apr_pcalloc(result_pool, sizeof(*cache));
 
+  if (klen != APR_HASH_KEY_STRING && klen > MAX_KEY_SIZE)
+    return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL,
+                            _("Key too big for membuffer-based cache"));
+
   /* initialize our internal cache header
    */
   cache->membuffer = membuffer;

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Philip Martin <ph...@wandisco.com>.
Branko Čibej <br...@wandisco.com> writes:

> -0 because:
>
> $ make
> .../subversion/libsvn_subr/cache-membuffer.c:2626:59: warning: implicit conversion loses integer
>       precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
>   cache->combined_key.entry_key.key_len = aligned_key_len + prefix_len;
>                                         ~ ~~~~~~~~~~~~~~~~^~~~~~~~~~~~
> .../subversion/libsvn_subr/cache-membuffer.c:2664:58: warning: implicit conversion loses integer
>       precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
>       cache->combined_key.entry_key.key_len = prefix_len + 16;
>                                             ~ ~~~~~~~~~~~^~~~
> .../subversion/libsvn_subr/cache-membuffer.c:3161:37: warning: implicit conversion loses integer
>       precision: 'apr_size_t' (aka 'unsigned long') to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
>   cache->prefix.entry_key.key_len = prefix_len;
>                                   ~ ^~~~~~~~~~
> 3 warnings generated.
>
> +1 if these warnings get fixed before or as part of the merge without
> adding casts.

I'd like to see this code on trunk but I am reluctant to vote because I
don't think a change like this should require a vote.

Prompted by the warnings I think there are some issues to fix.  For
APR_HASH_KEY_STRING keys there is no protection against abnormally long
keys.  combined_long_key() will allocate strlen() memory even if it is
many GB.  The item will not get cached if key+data length is more than
4GB but the memory for the key, which could be more than 4GB, will be
permanently allocated in the cache.  There is also a problem with
overflow in membuffer_cache_set_internal() when calculating key+data
length, although in practice a key large enough to trigger this will
probably fail memory allocation first.

In practice keys are short, a 6GB key probably indicates that a bug has
already occurred and that memory is corrupt. However, the code should be
more defensive.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Branko Čibej <br...@wandisco.com>.
On 13.05.2015 19:04, Ivan Zhakov wrote:
> [adjusting subject to make it valid vote thread]
>
> On 13 May 2015 at 19:23, Stefan Fuhrmann <st...@wandisco.com> wrote:
>> Hi there,
>>
>> Ivan has reviewed my recent membuffer cache
>> key handling changes, corrected and backported
>> them on the 1.9-cache-improvements branch.
>>
>> I reviewed it and I'm +1 on merging it to /trunk -
>> hoping we may even get it into 1.9. Since this
>> touches a sensitive part of the server code, I'd
>> like to see 2 more +1s for the branch->/trunk
>> merge.
>>
> +1.
>
> PS: I think detailed log message will be useful for reviewers. I'll
> make it tomorrow if you didn't outstrip me.

-0 because:

$ make
.../subversion/libsvn_subr/cache-membuffer.c:2626:59: warning: implicit conversion loses integer
      precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
  cache->combined_key.entry_key.key_len = aligned_key_len + prefix_len;
                                        ~ ~~~~~~~~~~~~~~~~^~~~~~~~~~~~
.../subversion/libsvn_subr/cache-membuffer.c:2664:58: warning: implicit conversion loses integer
      precision: 'unsigned long' to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
      cache->combined_key.entry_key.key_len = prefix_len + 16;
                                            ~ ~~~~~~~~~~~^~~~
.../subversion/libsvn_subr/cache-membuffer.c:3161:37: warning: implicit conversion loses integer
      precision: 'apr_size_t' (aka 'unsigned long') to 'apr_uint32_t' (aka 'unsigned int') [-Wshorten-64-to-32]
  cache->prefix.entry_key.key_len = prefix_len;
                                  ~ ^~~~~~~~~~
3 warnings generated.

+1 if these warnings get fixed before or as part of the merge without
adding casts.

-- Brane


Re: [VOTE] Merging 1.9-cache-improvements to /trunk

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Wed, May 13, 2015 at 7:04 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:

> [adjusting subject to make it valid vote thread]
>
> On 13 May 2015 at 19:23, Stefan Fuhrmann <st...@wandisco.com>
> wrote:
> > Hi there,
> >
> > Ivan has reviewed my recent membuffer cache
> > key handling changes, corrected and backported
> > them on the 1.9-cache-improvements branch.
> >
> > I reviewed it and I'm +1 on merging it to /trunk -
> > hoping we may even get it into 1.9. Since this
> > touches a sensitive part of the server code, I'd
> > like to see 2 more +1s for the branch->/trunk
> > merge.
> >
> +1.
>
> PS: I think detailed log message will be useful for reviewers. I'll
> make it tomorrow if you didn't outstrip me.
>

Go for it, I'm still on semi-holiday.

-- Stefan^2,