You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@apache.org> on 2009/07/10 01:23:24 UTC
AVL vs Array, and DIRSERVER-1377

Hi guys,

I've been working with Kiran for 3 weeks trying to fix DIRSERVER-1377. I 
think we have eliminated many differentt erros, and added some needed 
synchronization code.

Right now, I can say that I was able to run the server using a 
multi-threaded client creating and deleting the same entry (on enetry 
per thread) without problem (I have done 2 000 000 updates, with 20 
threads).

But the oneLevel index is broken soon after the test has started, and I 
can reproduce this behavior quite easily, after only a few thousands of 
loops.

So i'm still digging, and now, my perception is that we have a problem 
in the current AVL tree implementation. I'm not very proud to admit that 
I have spent more than one week trying to implement my own version of 
those tree, using a damn smart algorithm, all iterative, with a hell of 
optimisation and "pointers" all over the nodes. What a *fool* !

Not only it was useless, but the code I ended with, more than 4500 lines 
of code, is not less likely to be bug free than the existing code. NIH 
syndrom, I guess.


Anyway, yesterday, I had a kind of illumination in the dark (and smelly 
code I'm gently floating in ...). Do we need to use AVLTrees ?

Let's see the pros and cons.

pros :
o From the academic POV, a very interesting data structure. A cool one 
to evaluate your students, as soon as they don't ask you to write the code
o From the theorical POV : search in 1.44 O( log( N ) ), add and delete 
in O ( Log( N ) ) too : quite efficient
o From the ASF POV : it could be promoted to commons-collection

cons :
o A hell to implement correctly
o A hell to test thoroughly
o 4500 line sof code : a hell to maintain
o Costly memory consumption (4 references to other nodes : left and 
right, next and previou, a balance flag, a reference to the value)
o Complicated addition, even more complicated deletion
o Complicated serialization
o Almost impossible to enter in the code without a solid understanding 
of the algorithm, a few existing decent web references.

So should we continue to use them ? Is there an alternative ?


No. And *Yes*.

And a damn good alternative : a simple array of sorted elements.

Why do we immediatly think about complex data structure when simple one 
are offering all what we need ?

Ok, what do we need ? A place to store a set of ordered values. Thiose 
values can be almost any kind of objects, and we may have millions of 
them. Let's consider the case we just store 512 objects max, as we will 
use another data structure on disk when we are above thise number.

A Object[] is all what we need, assuming it's sorted. And sorting such 
an array is easy, as we just have to find the place where we will insert 
the new value, copy some part of the array to the right, and insert the 
data. O(Log(N)) for the search, plus a copy of all the values above the 
given one.
Removing an element : same thing : find it, move all the elements on the 
left one slot, and we are done Searching : O(log(N)), as the array is 
sorted (use a dichotomic alogorithm. Quite easy)

Memory ? We just store an array of references to Objects, plus some 
extra size. Let's say we start with a 16 slots array. Not to much space 
is wasted. If we compare that with an AvlNode, when having more than 3 
nodes, we save memory using an array even slightly  too big.

Serialization is *way* easier, so is deserialization.

The whole code should be no more than a few hundred of commented and 
javadoced lines of code.

Last, not least, it's likely to be *faster* than the AvlTree for the 
basic search operation.

I'm almost done with this new implementation, just wanted to inform you 
about those ideas. And I also think that it will fix DIRSERVER-1377.

Last, not least, I will be MIA for 5 days - unless I find some cheap 
internet connection).

Thanks !

PS : Any thoughts ?

-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org