Posted to dev@apr.apache.org by Yann Ylavic <yl...@gmail.com> on 2014/07/10 19:53:42 UTC

Skiplist improvements

Hello,

while trying to play with skiplists and multiple identical values
(duplicates, called "doublons" below, inserted with apr_skiplist_add),
I ran into several issues.

The doublons compare equal but do not necessarily contain the same
object, hence I had hoped that doublons added last would be inserted
last (this is the case, apr_skiplist_pop() works as expected), and
that apr_skiplist_find() would always return either the first (oldest,
FIFO) or the last (newest, LIFO) doublon (this is not the case).

The issue is that when a doublon is insert_compare()d, it will not
necessarily (and is in fact unlikely to) be at the same height as the
existing doublon(s).
Hence a "skip path" may be created between any previous value and the
new doublon just inserted, and that path may be taken by
apr_skiplist_find() to skip the oldest doublon(s).
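
To make the problem concrete, here is a minimal sketch with a hand-built
two-level toy skiplist. It is not APR's code: the struct, the toy_* names
and the one-character tag used to tell the doublons apart are all
assumptions made for illustration. The newer doublon (20b) happens to be
promoted to the upper level while the older one (20a) is not, so a
top-down search reaches 20b first.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HEIGHT 2

/* Toy node: next[0] is the full sorted list, next[1] the "skip path". */
struct node {
    int key;
    char tag;                        /* 'a' = older doublon, 'b' = newer */
    struct node *next[MAX_HEIGHT];
};

/* Classic top-down search that stops at the first matching key it meets. */
const struct node *toy_find(const struct node *head, int key)
{
    const struct node *cur = head;
    for (int lvl = MAX_HEIGHT - 1; lvl >= 0; lvl--) {
        while (cur->next[lvl] && cur->next[lvl]->key < key)
            cur = cur->next[lvl];
        if (cur->next[lvl] && cur->next[lvl]->key == key)
            return cur->next[lvl];   /* may be reached through an upper level */
    }
    return NULL;
}

/* Level 0: 10 -> 20a -> 20b -> 30.  Level 1: head -> 20b only. */
char toy_find_duplicate_demo(void)
{
    struct node n30  = { 30, '-', { NULL, NULL } };
    struct node n20b = { 20, 'b', { &n30, NULL } };
    struct node n20a = { 20, 'a', { &n20b, NULL } };
    struct node n10  = { 10, '-', { &n20a, NULL } };
    struct node head = { 0,  '-', { &n10, &n20b } };   /* sentinel, key unused */

    const struct node *hit = toy_find(&head, 20);
    assert(hit != NULL);
    return hit->tag;
}
```

In this layout toy_find_duplicate_demo() returns 'b': the upper-level
link from the head jumps straight over 20a. With a different random
height the same search would return 'a'.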

I think this is not a desirable behaviour (the internals are
probabilistic but the returned values through the API shouldn't be :p
).

Another reason for this to be fixed is that I'd also like to have
apr_skiplist_replace[_compare]() functions that replace any existing
doublon(s) with the new value.
This is similar to apr_table_add() vs apr_table_set(), but since
apr_skiplist_set_compare() already exists (for something else), maybe
the term "replace" can be used instead of "set" here, and we would
have 3 insertion functions (modulo the _compare() versions):
apr_skiplist_insert (keep existing doublon(s) if a new one is
inserted), apr_skiplist_add (always insert), and apr_skiplist_replace
(as described above).
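
The three policies can be sketched on a much simpler container, a small
sorted array, just to pin down the intended semantics. Everything below
(struct toy_set, the toy_* functions, the fixed capacity) is hypothetical
and only mirrors the behaviour described above; it says nothing about how
the skiplist functions are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAP 16

struct pair { int key; char val; };
struct toy_set { struct pair items[TOY_CAP]; size_t len; };

/* Index of the first item whose key is >= key. */
static size_t toy_lower(const struct toy_set *s, int key)
{
    size_t i = 0;
    while (i < s->len && s->items[i].key < key) i++;
    return i;
}

/* Index of the first item whose key is > key. */
static size_t toy_upper(const struct toy_set *s, int key)
{
    size_t i = toy_lower(s, key);
    while (i < s->len && s->items[i].key == key) i++;
    return i;
}

/* "add": always insert, after any existing doublons. Returns 0 when full. */
int toy_add(struct toy_set *s, int key, char val)
{
    assert(s != NULL);
    if (s->len == TOY_CAP) return 0;
    size_t at = toy_upper(s, key);
    memmove(&s->items[at + 1], &s->items[at],
            (s->len - at) * sizeof s->items[0]);
    s->items[at].key = key;
    s->items[at].val = val;
    s->len++;
    return 1;
}

/* "insert": keep the existing doublon(s), do nothing if the key is present. */
int toy_insert(struct toy_set *s, int key, char val)
{
    if (toy_lower(s, key) != toy_upper(s, key)) return 0;
    return toy_add(s, key, val);
}

/* "replace": drop every existing doublon, then insert the new value. */
int toy_replace(struct toy_set *s, int key, char val)
{
    size_t lo = toy_lower(s, key), hi = toy_upper(s, key);
    memmove(&s->items[lo], &s->items[hi],
            (s->len - hi) * sizeof s->items[0]);
    s->len -= hi - lo;
    return toy_add(s, key, val);
}
```

Note that toy_replace() relies on all doublons being contiguous and
reachable from the first one, which is the property the skiplist needs
to guarantee for the real function to be cheap.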

Beyond the names, apr_skiplist_replace() would be hard/costly to
implement if insertions don't follow a deterministic rule. If the
"skip path"s taken by insert_compare() do not lead to the first
doublon, we would have to check both next and prev (which is not an
easy thing in a skiplist's structure) to be sure to remove all the
existing doublons.
Again, using the same height for inserted doublons helps here.

Since all the doublons are at the same height, we can create a link
between them, so that we can walk through them quickly (a simple
pointer that equals the "next" pointer when the next entry is a
doublon is enough to either go to the last doublon, and/or skip the
doublons as a whole to reach the next non-doublon entry).
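
A rough sketch of that extra pointer, reduced to the bottom level of the
list. The struct entry layout and the entry_* helper names are
assumptions for illustration, not the fields or functions used in the
patch.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
    int key;
    struct entry *next;   /* next entry in sorted order */
    struct entry *dup;    /* equals next when next is a doublon, else NULL */
};

/* Links b after a and keeps a's doublon pointer consistent. */
void entry_link(struct entry *a, struct entry *b)
{
    assert(a != NULL);
    a->next = b;
    a->dup = (b != NULL && b->key == a->key) ? b : NULL;
}

/* Follows the doublon pointers to the last doublon of e's run. */
struct entry *entry_last_dup(struct entry *e)
{
    while (e->dup != NULL)
        e = e->dup;
    return e;
}

/* Skips the whole run of doublons and returns the next distinct entry. */
struct entry *entry_skip_dups(struct entry *e)
{
    return entry_last_dup(e)->next;
}
```

With entries 1, 2, 2, 2, 3 linked in that order, entry_last_dup() on the
first 2 yields the third 2, and entry_skip_dups() on it yields 3.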

Based on this, I created the attached patch that provides:
- apr_skiplist_replace[_compare]() as described above,
- apr_skiplist_find_last[_compare]() which return the last searched
doublon instead of the first one returned by the existing
apr_skiplist_find[_compare](),
- apr_skiplist_remove_(first/last)[_compare]() which remove
respectively the first/last doublon, whereas the existing
apr_skiplist_remove[_compare]() remove them all,
- apr_skiplist_size() and apr_skiplist_height() that return the
current size and height of the skiplist (cheap and good to have; the
patch also fixes a double increment of the size in the
insert_compare() function so that the returned value is valid, though
the size is not used internally),
- apr_skiplist_alloc_raw() which uses malloc (or apr_palloc) instead
of the zeroing calloc (or apr_pcalloc), and which is also used
internally where a double initialization is sub-optimal,
- apr_skiplist_clear() which does what apr_skiplist_destroy() was
doing, while apr_skiplist_destroy() is changed so that it free()s the
given skiplist pointer if it was malloc()ed (the current
implementation with no pool allocates the skiplist's struct in init()
but does not free it in destroy(), which I find error/leak prone),
- reentrant iterators (in + out) for
apr_skiplist_find[_last][_compare]() so that one can use these
functions to iterate sequentially or over the doublons by using the
last output iterator as input (the new
apr_skiplist_find_last[_compare]() also take a prev iterator as
argument so that one can work with the previous node of a last
doublon, e.g. to continue/remove from there),
- apr_skiplist_top() which returns the top of the skiplist (useful to
initialize iterators, among other things),
- apr_skiplist_element() which returns the element of the given node
(useful to dereference an iterator),
- some debug/printf traces #if SKIPLIST_DEBUG_OUT,
- a bunch of tests in testskiplist.c to validate all that (maybe still
incomplete).
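
The clear()/destroy() split from the list above can be sketched on a toy
list as follows. The toy_list type, its heap_allocated flag and the
toy_list_* functions are hypothetical and only illustrate the ownership
rule (destroy() frees the struct only when init() malloc()ed it); the
patch may track this differently.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_item { struct toy_item *next; };

struct toy_list {
    struct toy_item *head;
    size_t size;
    int heap_allocated;    /* set when toy_list_init() malloc()ed the struct */
};

/* Allocates and initialises a list on the heap; returns NULL on failure. */
struct toy_list *toy_list_init(void)
{
    struct toy_list *l = malloc(sizeof *l);
    if (l != NULL) {
        l->head = NULL;
        l->size = 0;
        l->heap_allocated = 1;
    }
    return l;
}

/* Pushes one empty item at the front; returns 0 on allocation failure. */
int toy_list_push(struct toy_list *l)
{
    struct toy_item *it = malloc(sizeof *it);
    assert(l != NULL);
    if (it == NULL) return 0;
    it->next = l->head;
    l->head = it;
    l->size++;
    return 1;
}

/* Frees every item but leaves the list struct valid and reusable. */
void toy_list_clear(struct toy_list *l)
{
    while (l->head != NULL) {
        struct toy_item *it = l->head;
        l->head = it->next;
        free(it);
    }
    l->size = 0;
}

/* Clears the list and, when the list owns its struct, frees that too. */
void toy_list_destroy(struct toy_list *l)
{
    toy_list_clear(l);
    if (l->heap_allocated)
        free(l);
}
```

After toy_list_clear() the list can be filled again, whereas the pointer
must not be used after toy_list_destroy().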

This is an all-in-one patch I arrived at by adding missing things on the fly.
I can stage it if needed, but for now just wanted to propose the ideas
for comments/review.

Thoughts?

Regards,
Yann.