You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2016/10/05 07:03:20 UTC
[jira] [Comment Edited] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

    [ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547892#comment-15547892 ] 

Thomas Mueller edited comment on OAK-4887 at 10/5/16 7:02 AM:
--------------------------------------------------------------

Yes, the cost of ordering is not accounted for within the query engine (org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan). For small results that easily fit in memory, let's say 1000 rows, the cost of ordering itself is very low, almost negligible. But one problem is that with "order by propertyName", the whole result needs to be read in memory before the first row can be returned. Without "order by", the first row can be returned much faster.

If there are two indexes with similar cost, one that returns rows in sorted order, and one that returns rows unsorted, then it would be better to use the sorted index. How much better is hard to say, it depends a lot on the number of rows that we expected to be read, and that number is not known.

Maybe we should support a way to specify the number of rows that the query engine should optimize for; that is, the number of rows that are _expected_ to be read. This would be similar to the "fastfirstrow" option / "option (fast <n>)" in MS SQL Server: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/09a6060a-1f72-4438-a3b2-209c240ee4d6/fastfirstrow?forum=transactsql . Currently, we support the "limit" option, but maybe a user needs to read more than <n> rows in some cases, so the limit is not known.

By the way, currently, if the chosen index supports ordering, then ordering is not done afterwards. So this part is working fine.





was (Author: tmueller):
Yes, the cost of ordering is not accounted for within the query engine (org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan). For small results that easily fit in memory, let's say 1000 rows, the cost of ordering itself is very low, almost negligible. But one problem is that with "order by propertyName", the whole result needs to be read in memory before the first row can be returned. Without "order by", the first row can be returned much faster.

If there are two indexes with similar cost, one that returns rows in sorted order, and one that returns rows unsorted, then it would be better to use that index. How much better is hard to say, it depends a lot on the number of rows that we expected to be read, and that number is not known.

Maybe we should support a way to specify the number of rows that the query engine should optimize for; that is, the number of rows that are _expected_ to be read. This would be similar to the "fastfirstrow" option / "option (fast <n>)" in MS SQL Server: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/09a6060a-1f72-4438-a3b2-209c240ee4d6/fastfirstrow?forum=transactsql . Currently, we support the "limit" option, but maybe a user needs to read more than <n> rows in some cases, so the limit is not known.

By the way, currently, if the chosen index supports ordering, then ordering is not done afterwards. So this part is working fine.




> Query cost estimation: ordering by an unindexed property not reflected
> ----------------------------------------------------------------------
>
>                 Key: OAK-4887
>                 URL: https://issues.apache.org/jira/browse/OAK-4887
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.4.2
>            Reporter: Alexander Klimetschek
>            Assignee: Thomas Mueller
>             Fix For: 1.6
>
>
> A query that orders by an unindexed property seems to have no effect on the cost estimation, compared to the same query without the order by, although it has a big impact on the execution performance for larger results/indexes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)