You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Yann Ylavic <yl...@gmail.com> on 2014/07/10 19:53:42 UTC

Skiplist improvements

Hello,

while trying to play with skiplists and multiple identical values
(doublons, inserted with apr_skiplist_add), I have faced several
issues.

The doublons compare equally but do not necessarily contain the same
object, hence I had hoped that doublons added last would have been
inserted last (this is the case, apr_skiplist_pop() works as
expected), and that apr_skiplist_find() would have returned the first
(oldest, FIFO) or last (newest, LIFO) all the time (this is not the
case).

The issue is that when a doublon is insert_compare()d, it will not
necessarily (and even unlikely) be at the same height as the existing
doublon(s).
Hence a "skip path" may be created between any previous value and the
new doublon just inserted, and that path may be taken by
apr_skiplist_find() to skip the oldest doublon(s).

I think this is not a desirable behaviour (the internals are
probabilistic but the returned values through the API shouldn't be :p
).

Another reason for this to be fixed is that I'd also like to have
apr_skiplist_replace[_compare]() functions that replace any existing
doublon(s) with the new value.
This is similar to apr_table_add() vs apr_table_set(), but since
apr_skiplist_set_compare() already exists (for something else), maybe
the term "replace" can be used instead of "set" here, and we would
have 3 insertion functions (modulo the _compare() versions):
apr_skiplist_insert (keep existing doublon(s) if a new one is
inserted), apr_skiplist_add (always insert), and apr_skiplist_replace
(as described above).

Beyond the names, apr_skiplist_replace() would be hard/costly to
implement if insertions don't follow a deterministic rule. Suppose the
"skip path"s taken by insert_compare() do not lead to the first
doublon, we would have to check both next and prev (which is not an
easy thing in a skiplist's structure) to be sure to remove all the
existing doublons.
Again, using the same height for inserted doublons helps here.

Since all the doublons are at the same height, we can create a link
between them, so that we can walk trough them quickly (a simple
pointer that equals the "next" pointer when the next entry is a
doublon is enough to either go to the last doublon, and/or skip the
doublons as a whole to reach the next non-doublon entry).

Based on this, I created the attached patch that provides :
- apr_skiplist_replace[_compare]() as described above,
- apr_skiplist_find_last[_compare]() which return the last searched
doublon instead of the first one returned by the existing
apr_skiplist_find[_compare](),
- apr_skiplist_remove_(first/last)[_compare]() which remove
respectively the first/last doublon, whereas the existing
apr_skiplist_remove[_compare]() remove them all,
- apr_skiplist_size() and apr_skiplist_height() that return the
current size and height of the skiplist (cheap and good to have, the
patch also fixes a double increment of the size in the
insert_compare() function so that the returned value is valid, though
not used internally),
- apr_skiplist_alloc_raw() which uses malloc (or apr_palloc) instead
of the zeroing calloc (or apr_pcalloc), is also used internally where
a double initialization is sub-optimal,
- apr_skiplist_clear() which does what apr_skiplist_destroy() was
doing, by changing apr_skiplist_destroy() so that is free()s the given
skiplist pointer if it was malloc()ed (the current implementation with
no pool allocates the skiplist's struct in init() but does not free it
in destroy(), I find this error/leak prone),
- reentrant iterators (in + out) for
apr_skiplist_find[_last][_compare]() so that one can use this
functions to iterate sequentially or over the doublons by using the
last outputed iterator as input (the new
apr_skiplist_find_last[_compare]() also take a prev iterator as
argument so that one can use with the previous node of a last doublon,
eg. to continue/remove from there),
-apr_skiplist_top() which returns the top of the skiplist (useful to
initialize iterators or else),
-apr_skiplist_element() which returns the element of the given node
(useful to dereference an iterator),
- some debug/printf traces #if SKIPLIST_DEBUG_OUT,
- a bunch of tests in testskiplist.c to validate all that (maybe still
incomplete).

This is a all in one patch I came to by adding missing things on the fly.
I can stage it if needed, but for now just wanted to propose the ideas
for comments/review.

Thoughs?

Regards,
Yann.

Re: Skiplist improvements

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Jul 10, 2014, at 1:53 PM, Yann Ylavic <yl...@gmail.com> wrote:

> Hello,
> 
> while trying to play with skiplists and multiple identical values
> (doublons, inserted with apr_skiplist_add), I have faced several
> issues.
> 
> The doublons compare equally but do not necessarily contain the same
> object, hence I had hoped that doublons added last would have been
> inserted last (this is the case, apr_skiplist_pop() works as
> expected), and that apr_skiplist_find() would have returned the first
> (oldest, FIFO) or last (newest, LIFO) all the time (this is not the
> case).
> 
> The issue is that when a doublon is insert_compare()d, it will not
> necessarily (and even unlikely) be at the same height as the existing
> doublon(s).
> Hence a "skip path" may be created between any previous value and the
> new doublon just inserted, and that path may be taken by
> apr_skiplist_find() to skip the oldest doublon(s).
> 
> I think this is not a desirable behaviour (the internals are
> probabilistic but the returned values through the API shouldn't be :p
> ).
> 

What does the canonical description of skiplists say?
Is that desirable or expected behavior? What do other
skiplist implementations do?


Re: Skiplist improvements

Posted by Yann Ylavic <yl...@gmail.com>.
On Thu, Jul 10, 2014 at 7:53 PM, Yann Ylavic <yl...@gmail.com> wrote:
> - a bunch of tests in testskiplist.c to validate all that (maybe still
> incomplete).

Attached is the (gziped) output of testskiplist when run with
-DSKIPLIST_DEBUG_OUT compiled skiplist.

Re: Skiplist improvements

Posted by Jim Jagielski <ji...@jaguNET.com>.
+1
On Jul 11, 2014, at 4:31 PM, Yann Ylavic <yl...@gmail.com> wrote:

> On Fri, Jul 11, 2014 at 10:22 PM, Yann Ylavic <yl...@gmail.com> wrote:
>> There also seem to be some doubtful code in apr_skiplist_remove_all(),
>> addressed by the following patch :
>> 
>> Index: tables/apr_skiplist.c
>> ===================================================================
>> --- tables/apr_skiplist.c    (revision 1609655)
>> +++ tables/apr_skiplist.c    (working copy)
>> @@ -591,16 +591,17 @@ APR_DECLARE(void) apr_skiplist_remove_all(apr_skip
>>     m = sl->bottom;
>>     while (m) {
>>         p = m->next;
>> -        if (p && myfree && p->data)
>> -            myfree(p->data);
>> +        if (myfree && m->data)
>> +            myfree(m->data);
>>         while (m) {
>>             u = m->up;
>> -            apr_skiplist_free(sl, p);
>> +            apr_skiplist_free(sl, m);
>>             m = u;
>>         }
>>         m = p;
>>     }
>>     sl->top = sl->bottom = NULL;
>> +    sl->topend = sl->bottomend = NULL;
>>     sl->height = 0;
>>     sl->size = 0;
>> }
>> 
>> Do you agree?
> 
> Hmm, no, the initial myfree(p->data) is on purpose, bottom is a fake
> node, the real first data are on bottom->next.
> 
> Rather this patch :
> 
> Index: tables/apr_skiplist.c
> ===================================================================
> --- tables/apr_skiplist.c    (revision 1609655)
> +++ tables/apr_skiplist.c    (working copy)
> @@ -595,12 +595,13 @@ APR_DECLARE(void) apr_skiplist_remove_all(apr_skip
>             myfree(p->data);
>         while (m) {
>             u = m->up;
> -            apr_skiplist_free(sl, p);
> +            apr_skiplist_free(sl, m);
>             m = u;
>         }
>         m = p;
>     }
>     sl->top = sl->bottom = NULL;
> +    sl->topend = sl->bottomend = NULL;
>     sl->height = 0;
>     sl->size = 0;
> }
> 
> to avoid multiple free() on the same node.
> (NULLing topend and bottomend is a safety guard since they may be used
> elsewhere without top or botton being checked).


Re: Skiplist improvements

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 11, 2014 at 10:22 PM, Yann Ylavic <yl...@gmail.com> wrote:
> There also seem to be some doubtful code in apr_skiplist_remove_all(),
> addressed by the following patch :
>
> Index: tables/apr_skiplist.c
> ===================================================================
> --- tables/apr_skiplist.c    (revision 1609655)
> +++ tables/apr_skiplist.c    (working copy)
> @@ -591,16 +591,17 @@ APR_DECLARE(void) apr_skiplist_remove_all(apr_skip
>      m = sl->bottom;
>      while (m) {
>          p = m->next;
> -        if (p && myfree && p->data)
> -            myfree(p->data);
> +        if (myfree && m->data)
> +            myfree(m->data);
>          while (m) {
>              u = m->up;
> -            apr_skiplist_free(sl, p);
> +            apr_skiplist_free(sl, m);
>              m = u;
>          }
>          m = p;
>      }
>      sl->top = sl->bottom = NULL;
> +    sl->topend = sl->bottomend = NULL;
>      sl->height = 0;
>      sl->size = 0;
>  }
>
> Do you agree?

Hmm, no, the initial myfree(p->data) is on purpose, bottom is a fake
node, the real first data are on bottom->next.

Rather this patch :

 Index: tables/apr_skiplist.c
===================================================================
--- tables/apr_skiplist.c    (revision 1609655)
+++ tables/apr_skiplist.c    (working copy)
@@ -595,12 +595,13 @@ APR_DECLARE(void) apr_skiplist_remove_all(apr_skip
             myfree(p->data);
         while (m) {
             u = m->up;
-            apr_skiplist_free(sl, p);
+            apr_skiplist_free(sl, m);
             m = u;
         }
         m = p;
     }
     sl->top = sl->bottom = NULL;
+    sl->topend = sl->bottomend = NULL;
     sl->height = 0;
     sl->size = 0;
 }

to avoid multiple free() on the same node.
(NULLing topend and bottomend is a safety guard since they may be used
elsewhere without top or botton being checked).

Re: Skiplist improvements

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 11, 2014 at 9:58 PM, Yann Ylavic <yl...@gmail.com> wrote:
> On Fri, Jul 11, 2014 at 9:57 PM, Yann Ylavic <yl...@gmail.com> wrote:
>> On Fri, Jul 11, 2014 at 9:38 PM, Jim Jagielski <ji...@jagunet.com> wrote:
>>>
>>> On Jul 11, 2014, at 12:29 PM, Yann Ylavic <yl...@gmail.com> wrote:
>>>
>>>>>
>>>>> Maybe we could also leave skiplist as is and introduce a new type
>>>>> (skipmap?) that would be ordered and that would take both key and
>>>>> value as arguments (in the relevant functions)...
>>>>
>>>
>>> Yeah... good idea.
>>
>> OK, will propose that.
>>
>> I'll commit skiplist's size computation bugfix included in the current
>> patch, and maybe the stack reuse (to avoid malloc()s for each
>> insert()).
>
> And the tests.

There also seem to be some doubtful code in apr_skiplist_remove_all(),
addressed by the following patch :

Index: tables/apr_skiplist.c
===================================================================
--- tables/apr_skiplist.c    (revision 1609655)
+++ tables/apr_skiplist.c    (working copy)
@@ -591,16 +591,17 @@ APR_DECLARE(void) apr_skiplist_remove_all(apr_skip
     m = sl->bottom;
     while (m) {
         p = m->next;
-        if (p && myfree && p->data)
-            myfree(p->data);
+        if (myfree && m->data)
+            myfree(m->data);
         while (m) {
             u = m->up;
-            apr_skiplist_free(sl, p);
+            apr_skiplist_free(sl, m);
             m = u;
         }
         m = p;
     }
     sl->top = sl->bottom = NULL;
+    sl->topend = sl->bottomend = NULL;
     sl->height = 0;
     sl->size = 0;
 }

Do you agree?

Re: Skiplist improvements

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 11, 2014 at 9:57 PM, Yann Ylavic <yl...@gmail.com> wrote:
> On Fri, Jul 11, 2014 at 9:38 PM, Jim Jagielski <ji...@jagunet.com> wrote:
>>
>> On Jul 11, 2014, at 12:29 PM, Yann Ylavic <yl...@gmail.com> wrote:
>>
>>>>
>>>> Maybe we could also leave skiplist as is and introduce a new type
>>>> (skipmap?) that would be ordered and that would take both key and
>>>> value as arguments (in the relevant functions)...
>>>
>>
>> Yeah... good idea.
>
> OK, will propose that.
>
> I'll commit skiplist's size computation bugfix included in the current
> patch, and maybe the stack reuse (to avoid malloc()s for each
> insert()).

And the tests.

>
> Regarding apr_skiplist_destroy(), don't you think we should free() the
> given pointer when the skiplist was malloc()ated, and have
> apr_skiplist_clear() which does not?
>
> Last but not least, apr_skiplist_remove() to remove one (as is) or all
> the matching elements?

Re: Skiplist improvements

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 11, 2014 at 9:38 PM, Jim Jagielski <ji...@jagunet.com> wrote:
>
> On Jul 11, 2014, at 12:29 PM, Yann Ylavic <yl...@gmail.com> wrote:
>
>>>
>>> Maybe we could also leave skiplist as is and introduce a new type
>>> (skipmap?) that would be ordered and that would take both key and
>>> value as arguments (in the relevant functions)...
>>
>
> Yeah... good idea.

OK, will propose that.

I'll commit skiplist's size computation bugfix included in the current
patch, and maybe the stack reuse (to avoid malloc()s for each
insert()).

Regarding apr_skiplist_destroy(), don't you think we should free() the
given pointer when the skiplist was malloc()ated, and have
apr_skiplist_clear() which does not?

Last but not least, apr_skiplist_remove() to remove one (as is) or all
the matching elements?

Re: Skiplist improvements

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Jul 11, 2014, at 12:29 PM, Yann Ylavic <yl...@gmail.com> wrote:

>> 
>> Maybe we could also leave skiplist as is and introduce a new type
>> (skipmap?) that would be ordered and that would take both key and
>> value as arguments (in the relevant functions)...
> 

Yeah... good idea.

Re: Skiplist improvements

Posted by Yann Ylavic <yl...@gmail.com>.
Grr, off the list...

On Fri, Jul 11, 2014 at 6:29 PM, Yann Ylavic <yl...@gmail.com> wrote:
> On Fri, Jul 11, 2014 at 6:24 PM, Yann Ylavic <yl...@gmail.com> wrote:
>> On Fri, Jul 11, 2014 at 4:28 PM, Jim Jagielski <ji...@jagunet.com> wrote:
>>>
>>> On Jul 10, 2014, at 8:07 PM, Yann Ylavic <yl...@gmail.com> wrote:
>>>
>>>>
>>>> I propose to garantee the ordering presumably at low coding and
>>>> performance (if any) cost, so that one can choose between apr_table or
>>>> apr_skiplist to do similar things (iterate over all the values of a
>>>> key is one such thing, in a deterministic order is an other, both not
>>>> feasible with the current skiplist implementation).
>>>>
>>>
>>> ++1... For me, performance is key. Or, at least, some sort
>>> of option to allow for a skiplist to be created that
>>> guarantees order or not, depending on whether performance
>>> or ordering is more important.
>>>
>>> After all, the intent of skiplist *is* in performance,
>>> and not ordering, so sacrificing performance for ordering
>>> seems kinda wonky, unless one can turn that off :)
>>
>> I think I can add this option, but in "performance mode", what would
>> be the behaviour of :
>> 1. apr_skiplist_remove(),
>> 2. apr_skiplist_remove_first(), apr_skiplist_remove_last()
>> 3. apr_skiplist_find_last()
>>
>> 2. and 3. are introduced by this patch so the can return NULL (or
>> ENOTIMPL by changing the declaration) because it makes no sense to use
>> them without ordering.
>>
>> 1. is more tricky, should it remove one single value or all doublons?
>> Doublons are a new 1.6 feature, in 1.5 apr_skiplist_remove() had not
>> this to handle. Now we have to rule.
>> In the case apr_skiplist_remove() should act on a single value, the
>> caller would have to loop to remove all the doublons (restarting from
>> the top each time).
>> In the case apr_skiplist_remove() should act on all doublons, I don't
>> think there is an efficient way to do it without ordering (once a
>> value is reached, this is not necessarily the first, and we'd have to
>> compare both up and down->next until none remain).
>>
>> In both case we are stuck with non-ordered-remove performances outside
>> the insert/add()/pop() scheme.
>> Performance for a use case can't apply to all the others (one may not
>> need ordering but still remove to be performant).
>
> Maybe we could also leave skiplist as is and introduce a new type
> (skipmap?) that would be ordered and that would take both key and
> value as arguments (in the relevant functions)...

Re: Skiplist improvements

Posted by Yann Ylavic <yl...@gmail.com>.
On Fri, Jul 11, 2014 at 4:28 PM, Jim Jagielski <ji...@jagunet.com> wrote:
>
> On Jul 10, 2014, at 8:07 PM, Yann Ylavic <yl...@gmail.com> wrote:
>
>>
>> I propose to garantee the ordering presumably at low coding and
>> performance (if any) cost, so that one can choose between apr_table or
>> apr_skiplist to do similar things (iterate over all the values of a
>> key is one such thing, in a deterministic order is an other, both not
>> feasible with the current skiplist implementation).
>>
>
> ++1... For me, performance is key. Or, at least, some sort
> of option to allow for a skiplist to be created that
> guarantees order or not, depending on whether performance
> or ordering is more important.
>
> After all, the intent of skiplist *is* in performance,
> and not ordering, so sacrificing performance for ordering
> seems kinda wonky, unless one can turn that off :)

I think I can add this option, but in "performance mode", what would
be the behaviour of :
1. apr_skiplist_remove(),
2. apr_skiplist_remove_first(), apr_skiplist_remove_last()
3. apr_skiplist_find_last()

2. and 3. are introduced by this patch so the can return NULL (or
ENOTIMPL by changing the declaration) because it makes no sense to use
them without ordering.

1. is more tricky, should it remove one single value or all doublons?
Doublons are a new 1.6 feature, in 1.5 apr_skiplist_remove() had not
this to handle. Now we have to rule.
In the case apr_skiplist_remove() should act on a single value, the
caller would have to loop to remove all the doublons (restarting from
the top each time).
In the case apr_skiplist_remove() should act on all doublons, I don't
think there is an efficient way to do it without ordering (once a
value is reached, this is not necessarily the first, and we'd have to
compare both up and down->next until none remain).

In both case we are stuck with non-ordered-remove performances outside
the insert/add()/pop() scheme.
Performance for a use case can't apply to all the others (one may not
need ordering but still remove to be performant).

Re: Skiplist improvements

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Jul 10, 2014, at 8:07 PM, Yann Ylavic <yl...@gmail.com> wrote:

> 
> I propose to garantee the ordering presumably at low coding and
> performance (if any) cost, so that one can choose between apr_table or
> apr_skiplist to do similar things (iterate over all the values of a
> key is one such thing, in a deterministic order is an other, both not
> feasible with the current skiplist implementation).
> 

++1... For me, performance is key. Or, at least, some sort
of option to allow for a skiplist to be created that
guarantees order or not, depending on whether performance
or ordering is more important.

After all, the intent of skiplist *is* in performance,
and not ordering, so sacrificing performance for ordering
seems kinda wonky, unless one can turn that off :)


Re: Skiplist improvements

Posted by Yann Ylavic <yl...@gmail.com>.
On Thu, Jul 10, 2014 at 9:55 PM, Jim Jagielski <ji...@jagunet.com> wrote:
>
> On Jul 10, 2014, at 1:53 PM, Yann Ylavic <yl...@gmail.com> wrote:
>
>>
>> The issue is that when a doublon is insert_compare()d, it will not
>> necessarily (and even unlikely) be at the same height as the existing
>> doublon(s).
>> Hence a "skip path" may be created between any previous value and the
>> new doublon just inserted, and that path may be taken by
>> apr_skiplist_find() to skip the oldest doublon(s).
>>
>> I think this is not a desirable behaviour (the internals are
>> probabilistic but the returned values through the API shouldn't be :p
>> ).
>>
>
> What does the canonical description of skiplists say?
> Is that desirable or expected behavior? What do other
> skiplist implementations do?

I could not find such description about multiple (equals) values in
skiplists, only implementations.

The current APR's implementation is clearly not the only one to have
no garantee about order, but for what I can see (not a very extensive
search though), those don't provide an iterator (node) and/or the
next/previous() functions and/or the find*() functions that
advantageously take an update-able iterator as argument (even if it's
an ouput argument only now, I find the input/ouput choice
interesting).

Some provide an ordering garantee though, eg QMultiMap which (seems
to) use a skip-list structure (and provides upper/lower_bound()
functions), or std::multimap which is (usually) a rb-tree (a
comparable structure).

I saw implementations with elements of type list to handle doublons (I
first thought about having a specific chaining when modifying the
code, but worried about non-pointer iterators requirement), but found
none with the proposed implementation (didn't loot at QMultiMap's
one).

I propose to garantee the ordering presumably at low coding and
performance (if any) cost, so that one can choose between apr_table or
apr_skiplist to do similar things (iterate over all the values of a
key is one such thing, in a deterministic order is an other, both not
feasible with the current skiplist implementation).

The proposal does not change the structure (the new node's link
member, sibling = next_is_doublon ? next : NULL, is itself a doublon),
takes few cycles (only the empty loop while sibling++; to skip the
doublons when needed and to avoid more costly compare() function), and
has no memory overhead.

It also has the advantage to allow a simple replace() implementation
(most/all other implementations have add() and set() semantics, not
insert() as defined in APR), by removing existing element(s) in order
(and in the same loop) before inserting the new one.

Finally, I find the current API quite limited to httpd's event's use,
ie. insert() then pop() oldest values, hence this proposal.

PS: while having a look at other implementations, I found the idea to
configure the min/max height quite interesting, no limit is applied
currently in the APR's implementation.

Re: Skiplist improvements

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Jul 10, 2014, at 1:53 PM, Yann Ylavic <yl...@gmail.com> wrote:

> Hello,
> 
> while trying to play with skiplists and multiple identical values
> (doublons, inserted with apr_skiplist_add), I have faced several
> issues.
> 
> The doublons compare equally but do not necessarily contain the same
> object, hence I had hoped that doublons added last would have been
> inserted last (this is the case, apr_skiplist_pop() works as
> expected), and that apr_skiplist_find() would have returned the first
> (oldest, FIFO) or last (newest, LIFO) all the time (this is not the
> case).
> 
> The issue is that when a doublon is insert_compare()d, it will not
> necessarily (and even unlikely) be at the same height as the existing
> doublon(s).
> Hence a "skip path" may be created between any previous value and the
> new doublon just inserted, and that path may be taken by
> apr_skiplist_find() to skip the oldest doublon(s).
> 
> I think this is not a desirable behaviour (the internals are
> probabilistic but the returned values through the API shouldn't be :p
> ).
> 

What does the canonical description of skiplists say?
Is that desirable or expected behavior? What do other
skiplist implementations do?