You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by "Martin Alderson (JIRA)" <ji...@apache.org> on 2007/05/16 17:52:16 UTC

[jira] Created: (DIRSERVER-933) Slow searches using a non-indexed attribute in a filter

Slow searches using a non-indexed attribute in a filter
-------------------------------------------------------

                 Key: DIRSERVER-933
                 URL: https://issues.apache.org/jira/browse/DIRSERVER-933
             Project: Directory ApacheDS
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.5.1
            Reporter: Martin Alderson


When searching for entries in a specific container with a filter such as (cn=*) and the cn attribute is not indexed, the server has to test each entry in the partition even when the search has been restricted to a container.

As an example of how bad this could be - if a partition contains millions of entries and the user does a search in that partition within a container that only contains 1 of those entries, every entry in the partition is checked in turn even though the server knows there is only one entry within the specified container.

This is due to the search optimizer which annotates each part of the filter with the number of entries that match where it can.  For those it can't (such as with attributes that are not indexed) this 'count' will default to Long.MAX_VALUE - to indicate that it is the worst case.  (See org.apache.directory.server.core.partition.impl.btree.DefaultOptimizer).

When these count annotations are checked to decide which part of the filter to use first they are dropped down to integers which means the items with the worst case value of Long.MAX_VALUE become -1 -- effectively making them the best case.  (See org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj).

Disclaimer: I have not done any performance testing on this.  I just noticed the problem while stepping through the code with a debugger.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DIRSERVER-933) Slow searches using a non-indexed attribute in a filter

Posted by "Emmanuel Lecharny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DIRSERVER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496353 ] 

Emmanuel Lecharny commented on DIRSERVER-933:
---------------------------------------------

Interesting...

If you do a search starting at some point of the tree (dc=example,dc=org) with a filter mapped on a non-indexed attribute, then this is just normal that you have to search through all the values. 

*BUT*, what should be done is to first count the number of elements starting at "dc=example,dc=org" point. We have to check that the counter associated with the hierarchical index we have is used or not (100$ bet on *not*, as you got a full scan ;)



> Slow searches using a non-indexed attribute in a filter
> -------------------------------------------------------
>
>                 Key: DIRSERVER-933
>                 URL: https://issues.apache.org/jira/browse/DIRSERVER-933
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.5.0
>            Reporter: Martin Alderson
>            Priority: Critical
>             Fix For: 1.5.1
>
>
> When searching for entries in a specific container with a filter such as (cn=*) and the cn attribute is not indexed, the server has to test each entry in the partition even when the search has been restricted to a container.
> As an example of how bad this could be - if a partition contains millions of entries and the user does a search in that partition within a container that only contains 1 of those entries, every entry in the partition is checked in turn even though the server knows there is only one entry within the specified container.
> This is due to the search optimizer which annotates each part of the filter with the number of entries that match where it can.  For those it can't (such as with attributes that are not indexed) this 'count' will default to Long.MAX_VALUE - to indicate that it is the worst case.  (See org.apache.directory.server.core.partition.impl.btree.DefaultOptimizer).
> When these count annotations are checked to decide which part of the filter to use first they are dropped down to integers which means the items with the worst case value of Long.MAX_VALUE become -1 -- effectively making them the best case.  (See org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj).
> Disclaimer: I have not done any performance testing on this.  I just noticed the problem while stepping through the code with a debugger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DIRSERVER-933) Slow searches using a non-indexed attribute in a filter

Posted by "Alex Karasulu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DIRSERVER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Karasulu updated DIRSERVER-933:
------------------------------------

             Priority: Critical  (was: Major)
    Affects Version/s:     (was: 1.5.1)
                       1.5.0
        Fix Version/s: 1.5.1

This is incredibly insightful Martin and a very good catch.  I will raise the priority of this issue and make sure we get a fix for it in 1.5.1.  If you are interested in working with us to improve the performance of the JDBM partition feel free to supply a patch.  We would love to have more sharp contributors like yourself become committers.  Thanks again for catching this bug and describing it so well in this JIRA.

NOTE: Fixing this issue will result in properly leveraging the system indices, namely the normalized name index to take DN and scope into account.  This will dramatically improve search performance when no attribute in the search filter is indexed.

> Slow searches using a non-indexed attribute in a filter
> -------------------------------------------------------
>
>                 Key: DIRSERVER-933
>                 URL: https://issues.apache.org/jira/browse/DIRSERVER-933
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.5.0
>            Reporter: Martin Alderson
>            Priority: Critical
>             Fix For: 1.5.1
>
>
> When searching for entries in a specific container with a filter such as (cn=*) and the cn attribute is not indexed, the server has to test each entry in the partition even when the search has been restricted to a container.
> As an example of how bad this could be - if a partition contains millions of entries and the user does a search in that partition within a container that only contains 1 of those entries, every entry in the partition is checked in turn even though the server knows there is only one entry within the specified container.
> This is due to the search optimizer which annotates each part of the filter with the number of entries that match where it can.  For those it can't (such as with attributes that are not indexed) this 'count' will default to Long.MAX_VALUE - to indicate that it is the worst case.  (See org.apache.directory.server.core.partition.impl.btree.DefaultOptimizer).
> When these count annotations are checked to decide which part of the filter to use first they are dropped down to integers which means the items with the worst case value of Long.MAX_VALUE become -1 -- effectively making them the best case.  (See org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj).
> Disclaimer: I have not done any performance testing on this.  I just noticed the problem while stepping through the code with a debugger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DIRSERVER-933) Slow searches using a non-indexed attribute in a filter

Posted by "Emmanuel Lecharny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DIRSERVER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496354 ] 

Emmanuel Lecharny commented on DIRSERVER-933:
---------------------------------------------

Alex, I think it's just a matter of using hierarchyIdx to get the count, no ? As far as I understand the backend, of course ;)

> Slow searches using a non-indexed attribute in a filter
> -------------------------------------------------------
>
>                 Key: DIRSERVER-933
>                 URL: https://issues.apache.org/jira/browse/DIRSERVER-933
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.5.0
>            Reporter: Martin Alderson
>            Priority: Critical
>             Fix For: 1.5.1
>
>
> When searching for entries in a specific container with a filter such as (cn=*) and the cn attribute is not indexed, the server has to test each entry in the partition even when the search has been restricted to a container.
> As an example of how bad this could be - if a partition contains millions of entries and the user does a search in that partition within a container that only contains 1 of those entries, every entry in the partition is checked in turn even though the server knows there is only one entry within the specified container.
> This is due to the search optimizer which annotates each part of the filter with the number of entries that match where it can.  For those it can't (such as with attributes that are not indexed) this 'count' will default to Long.MAX_VALUE - to indicate that it is the worst case.  (See org.apache.directory.server.core.partition.impl.btree.DefaultOptimizer).
> When these count annotations are checked to decide which part of the filter to use first they are dropped down to integers which means the items with the worst case value of Long.MAX_VALUE become -1 -- effectively making them the best case.  (See org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj).
> Disclaimer: I have not done any performance testing on this.  I just noticed the problem while stepping through the code with a debugger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (DIRSERVER-933) Slow searches using a non-indexed attribute in a filter

Posted by "Martin Alderson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DIRSERVER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Alderson closed DIRSERVER-933.
-------------------------------------


> Slow searches using a non-indexed attribute in a filter
> -------------------------------------------------------
>
>                 Key: DIRSERVER-933
>                 URL: https://issues.apache.org/jira/browse/DIRSERVER-933
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.5.0
>            Reporter: Martin Alderson
>            Priority: Critical
>             Fix For: 1.5.1
>
>
> When searching for entries in a specific container with a filter such as (cn=*) and the cn attribute is not indexed, the server has to test each entry in the partition even when the search has been restricted to a container.
> As an example of how bad this could be - if a partition contains millions of entries and the user does a search in that partition within a container that only contains 1 of those entries, every entry in the partition is checked in turn even though the server knows there is only one entry within the specified container.
> This is due to the search optimizer which annotates each part of the filter with the number of entries that match where it can.  For those it can't (such as with attributes that are not indexed) this 'count' will default to Long.MAX_VALUE - to indicate that it is the worst case.  (See org.apache.directory.server.core.partition.impl.btree.DefaultOptimizer).
> When these count annotations are checked to decide which part of the filter to use first they are dropped down to integers which means the items with the worst case value of Long.MAX_VALUE become -1 -- effectively making them the best case.  (See org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj).
> Disclaimer: I have not done any performance testing on this.  I just noticed the problem while stepping through the code with a debugger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DIRSERVER-933) Slow searches using a non-indexed attribute in a filter

Posted by "Emmanuel Lecharny (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DIRSERVER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496373 ] 

Emmanuel Lecharny commented on DIRSERVER-933:
---------------------------------------------

Ok, I have fixed the integer overflow :
http://svn.apache.org/viewvc?view=rev&rev=538684

Can you test the new server from tunks to see if it work faster with this fix ?
Thanks !

> Slow searches using a non-indexed attribute in a filter
> -------------------------------------------------------
>
>                 Key: DIRSERVER-933
>                 URL: https://issues.apache.org/jira/browse/DIRSERVER-933
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.5.0
>            Reporter: Martin Alderson
>            Priority: Critical
>             Fix For: 1.5.1
>
>
> When searching for entries in a specific container with a filter such as (cn=*) and the cn attribute is not indexed, the server has to test each entry in the partition even when the search has been restricted to a container.
> As an example of how bad this could be - if a partition contains millions of entries and the user does a search in that partition within a container that only contains 1 of those entries, every entry in the partition is checked in turn even though the server knows there is only one entry within the specified container.
> This is due to the search optimizer which annotates each part of the filter with the number of entries that match where it can.  For those it can't (such as with attributes that are not indexed) this 'count' will default to Long.MAX_VALUE - to indicate that it is the worst case.  (See org.apache.directory.server.core.partition.impl.btree.DefaultOptimizer).
> When these count annotations are checked to decide which part of the filter to use first they are dropped down to integers which means the items with the worst case value of Long.MAX_VALUE become -1 -- effectively making them the best case.  (See org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj).
> Disclaimer: I have not done any performance testing on this.  I just noticed the problem while stepping through the code with a debugger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (DIRSERVER-933) Slow searches using a non-indexed attribute in a filter

Posted by "Emmanuel Lecharny (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DIRSERVER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Lecharny resolved DIRSERVER-933.
-----------------------------------------

    Resolution: Fixed

Thanks Martin !!!

You have pretty sharp eyes :) I have totally missed this potential overflow. It has been fixed in http://svn.apache.org/viewvc?view=rev&rev=539036

I'm really impressed by the fact you went so deep into the server and found this nasty bug ;)

I mark this bug a resolved, as it seems that the previous fix solved your performance pb, just repopen it if it's not teh case.

Thanks !

> Slow searches using a non-indexed attribute in a filter
> -------------------------------------------------------
>
>                 Key: DIRSERVER-933
>                 URL: https://issues.apache.org/jira/browse/DIRSERVER-933
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.5.0
>            Reporter: Martin Alderson
>            Priority: Critical
>             Fix For: 1.5.1
>
>
> When searching for entries in a specific container with a filter such as (cn=*) and the cn attribute is not indexed, the server has to test each entry in the partition even when the search has been restricted to a container.
> As an example of how bad this could be - if a partition contains millions of entries and the user does a search in that partition within a container that only contains 1 of those entries, every entry in the partition is checked in turn even though the server knows there is only one entry within the specified container.
> This is due to the search optimizer which annotates each part of the filter with the number of entries that match where it can.  For those it can't (such as with attributes that are not indexed) this 'count' will default to Long.MAX_VALUE - to indicate that it is the worst case.  (See org.apache.directory.server.core.partition.impl.btree.DefaultOptimizer).
> When these count annotations are checked to decide which part of the filter to use first they are dropped down to integers which means the items with the worst case value of Long.MAX_VALUE become -1 -- effectively making them the best case.  (See org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj).
> Disclaimer: I have not done any performance testing on this.  I just noticed the problem while stepping through the code with a debugger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DIRSERVER-933) Slow searches using a non-indexed attribute in a filter

Posted by "Martin Alderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DIRSERVER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496522 ] 

Martin Alderson commented on DIRSERVER-933:
-------------------------------------------

Hi guys, thanks for the prompt response.

We need one extra change on top of what Emmanuel has done.  In org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj at line 250 we now have:
  value = ( ( Long ) child.get( "count" ) ).intValue();
This needs to be 
  value = ( ( Long ) child.get( "count" ) ).longValue();
to avoid the integer overflow.

I have tested with this extra change and it works well (i.e. the scope is taken into account before matching with the non-indexed attributes in the search filter).

> Slow searches using a non-indexed attribute in a filter
> -------------------------------------------------------
>
>                 Key: DIRSERVER-933
>                 URL: https://issues.apache.org/jira/browse/DIRSERVER-933
>             Project: Directory ApacheDS
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.5.0
>            Reporter: Martin Alderson
>            Priority: Critical
>             Fix For: 1.5.1
>
>
> When searching for entries in a specific container with a filter such as (cn=*) and the cn attribute is not indexed, the server has to test each entry in the partition even when the search has been restricted to a container.
> As an example of how bad this could be - if a partition contains millions of entries and the user does a search in that partition within a container that only contains 1 of those entries, every entry in the partition is checked in turn even though the server knows there is only one entry within the specified container.
> This is due to the search optimizer which annotates each part of the filter with the number of entries that match where it can.  For those it can't (such as with attributes that are not indexed) this 'count' will default to Long.MAX_VALUE - to indicate that it is the worst case.  (See org.apache.directory.server.core.partition.impl.btree.DefaultOptimizer).
> When these count annotations are checked to decide which part of the filter to use first they are dropped down to integers which means the items with the worst case value of Long.MAX_VALUE become -1 -- effectively making them the best case.  (See org.apache.directory.server.core.partition.impl.btree.ExpressionEnumerator.enumConj).
> Disclaimer: I have not done any performance testing on this.  I just noticed the problem while stepping through the code with a debugger.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.