Posted to dev@directory.apache.org by Emmanuel Lécharny <el...@gmail.com> on 2013/08/01 20:17:04 UTC

Performances : some other potential improvements

Hi guys,

I did some more profiling today. Here are some thoughts about what I have
found, and places where we can improve the code.

1) Comparators

While processing a search, we are comparing keys with what we have in
the trees over and over. If the BTree has N levels and M elements per page,
then in order to find an entry we will do N*log2(M) comparisons. In my test,
for 200 000 searches, we are doing around 4 x 5 comparisons per search,
so around 4 million calls to the compare() method. In fact, way more
than 4 million :

44 881 820  99384.1    5.9 
org.apache.mavibot.btree.AbstractPage:compare(Ljava/lang/Object;Ljava/lang/Object;)I

So the comparator is called plenty of times. Let's have a look at where
it gets called :

o DefaultSearchEngine.computeResult() calls the db.getEntryId( baseDn )
method
This is to check that the baseDn exists. getEntryId() will use the
RdnIndex to check that. The problem is that this index can only handle
RDNs, not full DNs, so depending on how deep the baseDn is (ie, how many
RDNs it has), we will have to do many searches in this index.

In any case, we will do many comparisons to find the right Dn, as
sketched below.
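
Conceptually, the walk looks like this (just a sketch with made-up names,
the real code in the partition and the RdnIndex types are different) :

    import java.util.UUID;

    /**
     * Conceptual sketch only : resolving the entryUUID of a baseDn means
     * walking the RdnIndex one RDN at a time, from the suffix down to the
     * leaf. A baseDn with 5 RDNs means 5 index lookups, each of them costing
     * log2(M) comparisons per BTree level.
     */
    public class GetEntryIdSketch
    {
        public interface RdnIndexSketch
        {
            /** Finds the child of parentId whose RDN is rdn, or null if absent */
            UUID lookup( UUID parentId, String rdn );
        }

        public UUID getEntryId( RdnIndexSketch rdnIndex, UUID rootId, String[] rdnsFromSuffixToLeaf )
        {
            UUID current = rootId;

            for ( String rdn : rdnsFromSuffixToLeaf )
            {
                current = rdnIndex.lookup( current, rdn );

                if ( current == null )
                {
                    // The baseDn does not exist
                    return null;
                }
            }

            return current;
        }
    }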

o The DefaultOptimizer.annotate() method calls the idx.count() method
Here, we are trying to know how many candidates we will get using one
index. If the AttributeType is singleValued, we will get either 1 (if
the index has the ExprNode value) or 0 (if the index does not have the
ExprNode value). If the AttributeType is multiValued, we will get the
number of associated values for the given key. This is for an
equalityScan ; for a <= or >=, the count is computed differently.

Again, we have many comparisons to do here (and this is multiplied by
the number of indexes we have in the filter).

At some point, the cost of doing all those comparisons largely exceeds
the cost of fetching useless candidates from the MasterTable (we are
spending 8 times more time fetching data from the indexes than grabbing
the entries from the MasterTable).

How can we improve this ?
-------------------------

o The first step is trying to retrieve the entryUUID associated with the
baseDN. It's likely to be done many times, so we could benefit from
using a <Dn, entryUUID> cache, to avoid those costly operations (keep in
mind that we not only do comparisons, we also potentially fetch pages
from the disk if they are not present in memory). An MRU cache will save
us a lot of time here.
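
Something like the following would do (just a sketch : the class name, the
size and the synchronization are made up, and the real cache would probably
live next to the DnFactory) :

    import java.util.LinkedHashMap;
    import java.util.Map;

    /**
     * Sketch of a <Dn, entryUUID> cache. The key is the normalized Dn as a
     * String, the value is the entryUUID as a String. An access-ordered
     * LinkedHashMap evicts the least recently used Dn once we go over
     * maxSize, so the most recently used Dns stay in memory.
     */
    public class DnUuidCache
    {
        private final int maxSize;
        private final Map<String, String> cache;

        public DnUuidCache( int maxSize )
        {
            this.maxSize = maxSize;

            this.cache = new LinkedHashMap<String, String>( 16, 0.75f, true )
            {
                @Override
                protected boolean removeEldestEntry( Map.Entry<String, String> eldest )
                {
                    return size() > DnUuidCache.this.maxSize;
                }
            };
        }

        /** Returns the cached entryUUID, or null if we have to walk the RdnIndex */
        public synchronized String getEntryUuid( String normalizedDn )
        {
            return cache.get( normalizedDn );
        }

        /** Caches the entryUUID once it has been resolved from the RdnIndex */
        public synchronized void put( String normalizedDn, String entryUuid )
        {
            cache.put( normalizedDn, entryUuid );
        }
    }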

o At this point, we compare normalized values with normalized values.
That means we don't have to normalize the values again. Sadly, this is
what we do :

    PrepareString.insignifiantSpacesString(String, boolean) line: 4803   
    PrepareString.normalize(String, PrepareString$StringType) line: 244   
    DeepTrimToLowerNormalizer.normalize(String) line: 103   
    CachingNormalizer.normalize(String) line: 124   
   
DeepTrimToLowerCachingNormalizingComparator(NormalizingComparator).compare(String,
String) line: 76   
    DeepTrimToLowerCachingNormalizingComparator.compare(String, String)
line: 32   
   
DeepTrimToLowerCachingNormalizingComparator(NormalizingComparator).compare(Object,
Object) line: 36   
    SerializableComparator<E>.compare(E, E) line: 86   
    Node<K,V>(AbstractPage<K,V>).compare(K, K) line: 254   
    Node<K,V>(AbstractPage<K,V>).findPos(K) line: 189   
    Node<K,V>.getValues(K) line: 858   
    BTree<K,V>.getValues(K) line: 962   
    MavibotTable<K,V>.count(K) line: 526   
    MavibotIndex<K,V>.count(K) line: 260   
    DefaultOptimizer<E>.getEqualityScan(SimpleNode<V>) line: 304   
    DefaultOptimizer<E>.annotate(ExprNode) line: 148   
    DefaultOptimizer<E>.getConjunctionScan(BranchNode) line: 245   
    DefaultOptimizer<E>.annotate(ExprNode) line: 185   
    DefaultSearchEngine.computeResult(SchemaManager,
SearchOperationContext) line: 220   
   
MavibotPartition(AbstractBTreePartition).search(SearchOperationContext)
line: 1032   

As we can see, we are using the AT comparator (here, for 'cn'),
which normalizes the value even though it has already been normalized by the
NormalizerInterceptor (for the filter and for the stored value). There
is no need to normalize either the key we are looking for or the
stored key. For the record, PrepareString.insignifiantSpacesString()
is extremely voracious :

12046499 303 480   18% 
org.apache.directory.api.ldap.model.schema.PrepareString:insignifiantSpacesString(Ljava/lang/String;Z)Ljava/lang/String;

The first number is the number of times the method is called, the second
is the time it takes to execute, and the third is the percentage of
global CPU it takes.

Now, it's a bit hard to get rid of this computation : the comparator is
associated with the AttributeType, and we don't have one comparator which
does not normalize the value for the server, and another one that normalizes
the values for the client (keep in mind that the SchemaManager is used on
both the client and the server). So how do we distinguish the use case we are
in ? That's the key ; if we are in the server, we should not normalize,
and if we are on the client, we must normalize...

Atm, I have no idea how to do that. The only thing I'm sure of is that
it's critical to avoid this extra computation.
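
Just to make the idea concrete, this is roughly what a server-side,
non-normalizing comparator could look like (purely hypothetical : the class
does not exist, and how it would be wired to the AttributeType instead of
the NormalizingComparator is exactly the open question) :

    import java.util.Comparator;

    /**
     * Hypothetical sketch : a comparator for server-side index lookups, where
     * both the filter value and the stored key have already been normalized
     * by the NormalizerInterceptor. It delegates to the underlying comparator
     * without calling the Normalizer again.
     */
    public class PreNormalizedStringComparator implements Comparator<String>
    {
        /** The raw comparator, e.g. a plain String comparison */
        private final Comparator<String> delegate;

        public PreNormalizedStringComparator( Comparator<String> delegate )
        {
            this.delegate = delegate;
        }

        @Override
        public int compare( String key, String storedKey )
        {
            // No call to the Normalizer here : both values are assumed to be
            // already normalized
            return delegate.compare( key, storedKey );
        }
    }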

2) Annotate() and build() methods

Those two methods are called by the computeResult() method. They are
approximately as expensive as each other. They mainly do the same thing :
 - find the number of candidates
 - construct a set of entryUUIDs based on the smallest index.

In the test I'm running, the filter is like (cn=entryNNN) where NNN is a
random number. Basically, we are spending as much time finding the
number of candidates (1) in the annotate() method as fetching it in
the build() method.


How can we improve this ?
-------------------------

It's not simple. One can think that an equality filter will return only a
few values, or even 1 or 0. This is true for the kind of filter I'm
using in this test, but it's not true for a filter like
(ObjectClass=person), which can select thousands of candidates.

One possible way would be to get rid of the count() method, and to fetch
the candidates immediately instead, up to a given number of candidates
(100 ?). The fetched candidates would be stored with the annotated node,
and could be used immediately by the build() method, instead of being
fetched a second time.

For a single-valued AttributeType, it's even simpler : either the index
contains the value, or not, but in any case the count will be zero or
one. In this case, it's obvious that we must fetch the entryUUID
immediately and store it with the ExprNode, as it will be selected as
the smallest index.

For a multi-valued AttributeType, as the values are stored in a sub-btree,
we can decide to fetch the set of candidates if this sub-btree contains
fewer than a given number of candidates, or otherwise just store the
number of candidates. A rough sketch of the idea follows.
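
Roughly something like this (again a sketch : none of these names exist in
the optimizer, they are only here to show the shape of the change) :

    import java.util.ArrayList;
    import java.util.List;

    /**
     * Hypothetical sketch of the "fetch while counting" idea : if the index
     * holds at most CANDIDATE_LIMIT values for the key, grab them now and
     * stash them on the annotated node, so that build() does not have to
     * walk the BTree a second time.
     */
    public class EqualityScanSketch
    {
        private static final int CANDIDATE_LIMIT = 100;

        /** Stand-in for what getEqualityScan()/annotate() could do for one index */
        public <K> long annotateEqualityNode( IndexSketch<K> index, K key, NodeSketch node )
        {
            long count = index.count( key );

            if ( count <= CANDIDATE_LIMIT )
            {
                // Few candidates : fetch them immediately and keep them with
                // the node, so build() can reuse them
                List<String> candidates = new ArrayList<>( index.fetchUuids( key, CANDIDATE_LIMIT ) );
                node.setPrefetchedCandidates( candidates );
            }

            node.setCount( count );

            return count;
        }

        /** Minimal interfaces, just enough to make the sketch self-contained */
        public interface IndexSketch<K>
        {
            long count( K key );

            List<String> fetchUuids( K key, int limit );
        }

        public interface NodeSketch
        {
            void setCount( long count );

            void setPrefetchedCandidates( List<String> candidates );
        }
    }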

Implementing those two improvements is not a big deal, and the gain will
be huge.



Conclusion
----------

I'm not done with the analysis of the profiling session, but if we can
work on the two items I have mentioned in this mail, we would already get
a great deal of improvement. Once done, we can run another
profiling session to find which parts need to be worked on.

thoughts ?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 



Re: Performances : some other potential improvements

Posted by Emmanuel Lécharny <el...@gmail.com>.
Le 8/1/13 9:21 PM, Kiran Ayyagari a écrit :
> On Thu, Aug 1, 2013 at 11:47 PM, Emmanuel Lécharny <el...@gmail.com>wrote:
>
>> How can we improve this ?
>> -------------------------
>>
>> o The first step is trying to retrieve the entryUUID associated with the
>> baseDN. It's likely to be done many times, so we could benefit from
>> using a <Dn, entryUUID> cache, to avoid those costly operations (keep in
>> mind that we not only do comparisons, we also potentially fetch pages
>> from the disk if they are not present in memory). An MRU cache will save
>> us a lot of time here.

> we have a DN cache (DnFactory) but we are not using it effectively

This is a bit different, but we could change it to store the entry's
UUID and leverage it. Sounds like a good idea.
>
> Now, it's a bit hard to get rid of this computation : the comparator is
> associated with the AttributeType, and we don't have one comparator which
> does not normalize the value for the server, and another one that normalizes
> the values for the client (keep in mind that the SchemaManager is used on
> both the client and the server). So how do we distinguish the use case we are
> in ? That's the key ; if we are in the server, we should not normalize,
> and if we are on the client, we must normalize...
>
> Atm, I have no idea how to do that. The only thing I'm sure of is that
> it's critical to avoid this extra computation.
>
> thinking about it I realized that a flag-based check to avoid normalization
> is dangerous and will break an embedded server (where the user uses the
> CoreSession API)

Well, the normalization occurs after the coreSession, so we should be safe.


>
> How can we improve this ?
> -------------------------
>
> It's not simple. One can think that an equality filter will return only a
> few values, or even 1 or 0. This is true for the kind of filter I'm
> using in this test, but it's not true for a filter like
> (ObjectClass=person), which can select thousands of candidates.
>
> One possible way would be to get rid of the count() method, and to fetch
> the candidates immediately instead, up to a given number of candidates
> (100 ?). The fetched candidates would be stored with the annotated node,
> and could be used immediately by the build() method, instead of being
> fetched a second time.
>
> The count() method in Mavibot is efficient and accurate (because it stores
> the number of elements present in the tree), so I would still lean towards
> using count()

This is true if you are considering the counter at the top level of a
tree. But the question is not really about using this counter or not,
but about how to avoid looking up the candidates twice.

As soon as we can grab the count of candidates, it's almost free to get
the candidates and to store them somewhere within the ExprNode, to reuse
them in the build() method, without having to go down the tree to the
key again.
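
Something like this on the build() side (same made-up names as the sketch in
my first mail, just to show the shape) :

    import java.util.List;

    /** Hypothetical sketch : build() consuming the candidates stored by annotate() */
    public class BuildSketch
    {
        public List<String> candidatesFor( NodeSketch node, IndexWalker walker )
        {
            List<String> prefetched = node.getPrefetchedCandidates();

            if ( prefetched != null )
            {
                // annotate() already fetched them : no second descent into the BTree
                return prefetched;
            }

            // Otherwise fall back to the current behaviour and walk the index
            return walker.fetchCandidates( node );
        }

        public interface NodeSketch
        {
            List<String> getPrefetchedCandidates();
        }

        public interface IndexWalker
        {
            List<String> fetchCandidates( NodeSketch node );
        }
    }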

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 


Re: Performances : some other potential improvements

Posted by Kiran Ayyagari <ka...@apache.org>.
On Thu, Aug 1, 2013 at 11:47 PM, Emmanuel Lécharny <el...@gmail.com>wrote:

> Hi guys,
>
> I did some more profiling today. Here are some thoughts about what I have
> found, and places where we can improve the code.
>
> 1) Comparators
>
> While processing a search, we are comparing keys with what we have in
> the trees over and over. If the BTree has N levels and M elements per page,
> then in order to find an entry we will do N*log2(M) comparisons. In my test,
> for 200 000 searches, we are doing around 4 x 5 comparisons per search,
> so around 4 million calls to the compare() method. In fact, way more
> than 4 million :
>
> 44 881 820  99384.1    5.9
>
> org.apache.mavibot.btree.AbstractPage:compare(Ljava/lang/Object;Ljava/lang/Object;)I
>
> So the comparator is called plenty of times. Let's have a look at where
> it gets called :
>
> o DefaultSearchEngine.computeResult() calls the db.getEntryId( baseDn )
> method
> This is to check that the baseDn exists. getEntryId() will use the
> RdnIndex to check that. The problem is that this index can only handle
> RDNs, not full DNs, so depending on how deep the baseDn is (ie, how many
> RDNs it has), we will have to do many searches in this index.
>
> In any case, we will do many comparisons to find the right Dn
>
> o The DefaultOptimizer.annotate() method calls the idx.count() method
> Here, we are trying to know how many candidates we will get using one
> index. If the AttributeType is singleValued, we will get either 1 (if
> the index has the ExprNode value) or 0 (if the index does not have the
> ExprNode value). If the AttributeType is multiValued, we will get the
> number of associated values for the given key. This is for an
> equalityScan ; for a <= or >=, the count is computed differently.
>
> Again, we have many comparisons to do here (and this is multiplied by
> the number of indexes we have in the filter).
>
> At some point, the cost of doing all those comparisons largely exceeds
> the cost of fetching useless candidates from the MasterTable (we are
> spending 8 times more time fetching data from the indexes than grabbing
> the entries from the MasterTable).
>
> How can we improve this ?
> -------------------------
>
> o The first step is trying to retrieve the entryUUID associated with the
> baseDN. It's likely to be done many times, so we could benefit from
> using a <Dn, entryUUID> cache, to avoid those costly operations (keep in
> mind that we not only do comparisons, we also potentially fetch pages
> from the disk if they are not present in memory). An MRU cache will save
> us a lot of time here.
>
we have a DN cache (DnFactory) but we are not using it effectively

> o At this point, we compare normalized values with normalized values.
> That means we don't have to normalize the values again. Sadly, this is
> what we do :
>
>     PrepareString.insignifiantSpacesString(String, boolean) line: 4803
>     PrepareString.normalize(String, PrepareString$StringType) line: 244
>     DeepTrimToLowerNormalizer.normalize(String) line: 103
>     CachingNormalizer.normalize(String) line: 124
>
>
> DeepTrimToLowerCachingNormalizingComparator(NormalizingComparator).compare(String,
> String) line: 76
>     DeepTrimToLowerCachingNormalizingComparator.compare(String, String)
> line: 32
>
>
> DeepTrimToLowerCachingNormalizingComparator(NormalizingComparator).compare(Object,
> Object) line: 36
>     SerializableComparator<E>.compare(E, E) line: 86
>     Node<K,V>(AbstractPage<K,V>).compare(K, K) line: 254
>     Node<K,V>(AbstractPage<K,V>).findPos(K) line: 189
>     Node<K,V>.getValues(K) line: 858
>     BTree<K,V>.getValues(K) line: 962
>     MavibotTable<K,V>.count(K) line: 526
>     MavibotIndex<K,V>.count(K) line: 260
>     DefaultOptimizer<E>.getEqualityScan(SimpleNode<V>) line: 304
>     DefaultOptimizer<E>.annotate(ExprNode) line: 148
>     DefaultOptimizer<E>.getConjunctionScan(BranchNode) line: 245
>     DefaultOptimizer<E>.annotate(ExprNode) line: 185
>     DefaultSearchEngine.computeResult(SchemaManager,
> SearchOperationContext) line: 220
>
> MavibotPartition(AbstractBTreePartition).search(SearchOperationContext)
> line: 1032
>
> As we can see, we are using the AT comparator (here, for 'cn'),
> which normalizes the value even though it has already been normalized by the
> NormalizerInterceptor (for the filter and for the stored value). There
> is no need to normalize either the key we are looking for or the
> stored key. For the record, PrepareString.insignifiantSpacesString()
> is extremely voracious :
>
> 12046499 303 480   18%
>
> org.apache.directory.api.ldap.model.schema.PrepareString:insignifiantSpacesString(Ljava/lang/String;Z)Ljava/lang/String;
>
> The first number is the number of times the method is called, the second
> is the time it takes to execute, and the third is the percentage of
> global CPU it takes.
>
> Now, it's a bit hard to get rid of this computation : the comparator is
> associated with the AttributeType, and we don't have one comparator which
> does not normalize the value for the server, and another one that normalizes
> the values for the client (keep in mind that the SchemaManager is used on
> both the client and the server). So how do we distinguish the use case we are
> in ? That's the key ; if we are in the server, we should not normalize,
> and if we are on the client, we must normalize...
>
> Atm, I have no idea how to do that. The only thing I'm sure of is that
> it's critical to avoid this extra computation.
>
thinking about it I realized that a flag-based check to avoid normalization
is dangerous and will break an embedded server (where the user uses the
CoreSession API)

> 2) Annotate() and build() methods
>
> Those two methods are called by the computeResult() method. They are
> approximately as expensive as each other. They mainly do the same thing :
>  - find the number of candidates
>  - construct a set of entryUUIDs based on the smallest index.
>
> In the test I'm running, the filter is like (cn=entryNNN) where NNN is a
> random number. Basically, we are spending as much time finding the
> number of candidates (1) in the annotate() method as fetching it in
> the build() method.
>
>
> How can we improve this ?
> -------------------------
>
> It's not simple. One can think that an equality filter will return only a
> few values, or even 1 or 0. This is true for the kind of filter I'm
> using in this test, but it's not true for a filter like
> (ObjectClass=person), which can select thousands of candidates.
>
> One possible way would be to get rid of the count() method, and to fetch
> the candidates immediately instead, up to a given number of candidates
> (100 ?). The fetched candidates would be stored with the annotated node,
> and could be used immediately by the build() method, instead of being
> fetched a second time.
>
The count() method in Mavibot is efficient and accurate (because it stores
the number of elements present in the tree), so I would still lean towards
using count()

> For a single-valued AttributeType, it's even simpler : either the index
> contains the value, or not, but in any case the count will be zero or
> one. In this case, it's obvious that we must fetch the entryUUID
> immediately and store it with the ExprNode, as it will be selected as
> the smallest index.
>
> For a multi-valued AttributeType, as the values are stored in a sub-btree,
> we can decide to fetch the set of candidates if this sub-btree contains
> fewer than a given number of candidates, or otherwise just store the
> number of candidates.
>
> Implementing those two improvements is not a big deal, and the gain will
> be huge.
>
>
>
> Conclusion
> ----------
>
> I'm not done with the analysis of the profiling session, but if we can
> work on the two items I have mentioned in this mail, we would already get
> a great deal of improvement. Once done, we can run another
> profiling session to find which parts need to be worked on.
>
> thoughts ?
>
I can join you tomorrow in experimenting with these.

thanks for the excellent analysis

> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>
>


-- 
Kiran Ayyagari
http://keydap.com