You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@apache.org> on 2009/07/19 12:04:04 UTC
heads up about DIRSERVER-1377/1369
Hi,
so I think the issue is now fixed (crossing fingers). I write this mail
in order to give some info about what has been done.
First, I would like to thank Kiran who seconded me during the long
process of trying to get the bug fixed. I would like to thank Sumit
Goyal and Murali Krishna who found the bug and submitted a test to
exhibit the problem. It was incredibly useful.
It took me a month - even if I didn't worked only on this problem, as I
also had to deal with some client at the same time -, but I can tell
that it was the most complicated problem I have ever had to work on. I'm
not 100% sure what exactly was the problem, I'm afraid that I just put
it aside when I removed the AvlTree implentation to replace it with some
simpler data structure.
When the issue was first created, I thought it was a problem with MINA,
as we had a concurrency issue with version 2.0.0-M4. Then I bumped up
the version to 2.0.0-M5, and closed the initial issue. Sadly, after some
more tests, another issue showed up, with a different error message, and
the JIRA was reopened.
A test was provided, which was the most useful thing possible, as it
helped me to reproduce the error on my computer. We started some more
debugging session, and the problem occurred quite immediately.
But first, we have seen another problem : an index was growing without
bound (SubLevel). We debugged the server with Kiran, and found the
reason. At least, it was easy to fix, even if it wasn't related to the
initial issue. Lesson learned : when something is wrong, it's very
likely that while investigating the problem, you find some other
problems around :). The bad news is that many parts of the server need
more thorough tests :/
While reviewing the index handling, as the problem seemed to be in this
area (some few stackTrace helped to show that, at least : sometime old
school debugging with printf come to the rescue... ), we also cleaned
many small things, like useless deletion of non existing elements in index.
Then we started to think about the way the server stores data on disk.
Basically, you have N threads updating many index and the master table,
and all those data read and write is done in a thread-safe part of the
server (Jdbm backend). As every access to JDBM is synchronized, we
discarded the idea that JDBM was buggy at this point. It narrowed the
issue to what is in between the interceptor chain and JDBM.
Then we saw that the place we massage index was not thread safe. In
other words, as we have to update around 10 index when updating an
entry, it must be done in a way that guarantee that the index can't be
fooled by another thread. This was not the case. We then thought we
found the reason of the concurrent problem. We fixed that by adding a
hell of synchronized all over this part of the code.
Great success ! When the multithreaded test was launched, instead of
failing after a few hundreds of updates, it was now running well for
thousands of updates (I went up to 700 000 updates before getting a
breakage).
That was the good news. The bad news was that the problem was still
around, but deeper. At least, we know for sure that we removed a real
concurrent issue in the code. So what could be the problem ?
When we are storing data in indexes, we store one key associated to many
values (for instance, an ObjectClass can be related to many entries). In
order to manage this, we use a ordered data structure to store those
values. Namely, depending on the number of values, either an AvlTree or
a BTree. This AvlTree has to be serialized and deserialized in order to
be stored on disk. This is where I thought we could have an issue : the
de/serialization was done outside of the synchronised part, which is not
a problem per se, but we send and get back a byte[] from JDBM. What is
fishy here is that JDBM caches some data, and returns a *reference* to
the data . Getting a reference on a byte[] means that some other thread
can perfectly modify this byte[] at the same time, with a potential
impact on the expected result. It sounded like a good catch, so I
modified JDBM to get it return a copy of the byte[]. It didn't changed
anything, I still get the error after a few hundred thousands updates.
Next step was to start suspecting the AvlTree implementation. I decided
that rewriting this part was a sane idea. What a fool ... AvlTree is a
complex data structure, and writing an iterative implementation needs
time, and a hell of tests. Basically, as I was stuck with no other
ideas, I followed this idea, even if it was crazy. (some little voice in
my head was telling me I'm not any more a student, and this was a lost
of time, but the other side of my personality told me that I was smart
enough to do better than what we have ...). The bad thing about Avl
trees is that documentation about it is *very* rough on the internet.
There are some C implementations, but they use pointers, something we
don't have in Java, making it quite complex to implement. We also have
some specific needs, like we want to be able to move backward and
forward in the tree, adding more constraints.
After around 2 weeks and a half, having a partially working AVLTree
impelmentation, I just had a better idea. In the mean time, Kiran has
implemented an in-jdbm serialization, removing the previous potential
problem with byte[] copies.
So the new idea was that we don't need AvlTree, we just need a sorted
array, which was less costly when searching data into it (O(N) for a
sorted array, compared to a 1.44O(N) for an AvlTree). Sure, insertion
and removal was more costly, as it needs an array copy, but as we don't
do a lot of addition or removal, that was a good trade. More important,
it save a lot of memory, and serialization was way easier (memory shring
by 4, for references to an entry, as we have less pointers (left and
right child, previous and next leaf, and the balance)). Less memory
consumption means more cache, less garbage collection, less pointer
update, faster read and write on disk.
I ended by writing this ArrayTree implementation, which took me three
days, with all the necessary tests, and a bit of problems with the
current Cursor interface (there are some semantic I really don't like in
this area). Then I just had to substitute AvlTree with ArrayTree, and voilĂ .
Tests went up to 2 000 000 updates, without any problem. Twice. So I
closed the issue.
My understanding is that the previous AvlTree had a problem in the way
it managed next/previous index. I didn't spent time in analysing the
code, it was too complex. Last, not least, I added a lot of debug logs
in the server, and tried to see what the logs gave. That led me to
grep/sed a few gigabytes of logs, with little success. It just sucked
time, with no result except to eliminate other options.
Last, not least, what made me thing that the bug was a problem in
AvlTree in some very specific cases, is that we always had a NPE in the
very same line, and that the analyze of the byte[] taken from the JDBM
backend shown that there were some missing elements in the serialized
tree. As the tree was correctly serialized (I added some check in the
serialize() method : the byte[] was deserialized before being stored in
JDBM to be sure that the serialization was ok, and I had no problem in
it), that mean something was broken while rebuilding the AvlTree.
It now seems to work, but I'm not 100% sure the problem is really fixed.
I think I just pushed some dust in the trash bin, expecting that it was
not pushing them under the rug...
In any case, this part of the server needs a deep check, as I saw many
potential problems that needs to be addressed. We probably have to run
this multi-threaded test on a more powerful computer, to see if it is
really fixed.
--
--
cordialement, regards,
Emmanuel LĂ©charny
www.iktek.com
directory.apache.org