Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2013/05/08 13:35:15 UTC

[jira] [Commented] (OAK-622) Improve QueryIndex interface

    [ https://issues.apache.org/jira/browse/OAK-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651804#comment-13651804 ] 

Thomas Mueller commented on OAK-622:
------------------------------------

In a discussion with Alex Parvulescu and Tommaso Teofili, we decided not to change the interface for now, but instead to improve the documentation for those methods that were not fully clear. This is completed in revision 1480226.

We also discussed adding a marker interface, FulltextQueryIndex, which (if implemented) flags that the given index may support more than just the minimal fulltext query syntax. If such an index is used, the query engine is supposed to *not* verify the fulltext constraint(s) for the given selector.
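
To illustrate the marker-interface idea, here is a minimal sketch. Only the name FulltextQueryIndex comes from the discussion; SimpleQueryIndex, the two index classes and engineMustVerifyFulltext are hypothetical stand-ins, not the actual Oak API, and the real interface may well be declared differently.

```java
// Hypothetical, simplified stand-in for the query index interface (not the real Oak type).
interface SimpleQueryIndex {
    String getIndexName();
}

// Marker interface as discussed: it declares no methods and only flags that
// the index may support more than the minimal fulltext query syntax.
interface FulltextQueryIndex {
}

// Hypothetical index without extended fulltext support.
final class PropertyIndexSketch implements SimpleQueryIndex {
    public String getIndexName() {
        return "property";
    }
}

// Hypothetical index that opts in by implementing the marker interface.
final class FulltextIndexSketch implements SimpleQueryIndex, FulltextQueryIndex {
    public String getIndexName() {
        return "fulltext";
    }
}

final class FulltextCheck {
    // Assumed engine-side rule: fulltext constraints are verified by the
    // query engine unless the chosen index carries the marker.
    static boolean engineMustVerifyFulltext(SimpleQueryIndex index) {
        return !(index instanceof FulltextQueryIndex);
    }

    public static void main(String[] args) {
        SimpleQueryIndex[] indexes = { new PropertyIndexSketch(), new FulltextIndexSketch() };
        for (SimpleQueryIndex index : indexes) {
            System.out.println(index.getIndexName()
                    + ": engine verifies fulltext = " + engineMustVerifyFulltext(index));
        }
    }
}
```

The appeal of a marker is that existing index implementations stay unchanged; only an index that wants to own the fulltext semantics has to declare it.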

We need to support the "rep:excerpt()" feature of Jackrabbit 2.x. One idea is to add this property to the filter (without an actual restriction) if the query contains this column. That way the index can detect that "rep:excerpt()" is needed. The excerpt is then retrieved in the regular way (Cursor.next() and then IndexRow.getValue("rep:excerpt")).
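
A rough sketch of how this could look from the index side follows. All types here (FilterSketch, IndexRowSketch, ExcerptIndexSketch) are simplified, hypothetical stand-ins for the real Filter, Cursor and IndexRow; in particular the real getValue does not return a plain String, and the excerpt text below is only a placeholder.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the filter: it only tracks which properties are mentioned.
final class FilterSketch {
    private final Set<String> properties = new HashSet<>();

    // Registers a property without any actual restriction on its value.
    void addUnrestrictedProperty(String name) {
        properties.add(name);
    }

    boolean mentions(String name) {
        return properties.contains(name);
    }
}

// Hypothetical stand-in for an index row.
interface IndexRowSketch {
    String getPath();

    // Returns null if the index does not provide the requested column.
    String getValue(String columnName);
}

final class ExcerptIndexSketch {
    static final String EXCERPT = "rep:excerpt";

    // The Iterator plays the role of the cursor; matchingPaths stands in for
    // whatever the index found for the query.
    Iterator<IndexRowSketch> query(FilterSketch filter, List<String> matchingPaths) {
        // The index detects that an excerpt is wanted from the filter alone.
        final boolean wantExcerpt = filter.mentions(EXCERPT);
        List<IndexRowSketch> rows = new ArrayList<>();
        for (final String path : matchingPaths) {
            rows.add(new IndexRowSketch() {
                public String getPath() {
                    return path;
                }

                public String getValue(String columnName) {
                    if (wantExcerpt && EXCERPT.equals(columnName)) {
                        return "placeholder excerpt for " + path;
                    }
                    return null;
                }
            });
        }
        return rows.iterator();
    }
}
```

The query engine would add "rep:excerpt" to the filter only when the query selects that column, so an index pays the cost of building excerpts only when they are actually requested.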

Later on, we may still want to change the query index interface, but for now it seems the extensive changes originally proposed above are not yet needed.

                
> Improve QueryIndex interface
> ----------------------------
>
>                 Key: OAK-622
>                 URL: https://issues.apache.org/jira/browse/OAK-622
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Minor
>
> The current QueryIndex interface is quite simple, but doesn't address some of the required features and more advanced optimizations that are possible:
> - For fulltext queries, it doesn't address the case where the index implementation has a different understanding of the fulltext condition than what is described in the JCR spec (the basic features).
> - For queries with "order by" it would be good to know if the index supports returning the data in sorted order, and if yes, how much slower that would be (if it is slower). So an index might have multiple strategies with different costs.
> - It's quite easy to misunderstand what getCost is supposed to do exactly. The new API should have a clearer solution here.
> - Even if the query doesn't have "order by", the index might return the data in a sorted way, which might help improve query performance (using a merge join).
> - The cost is currently a single value. It might be better to estimate the number of nodes, the cost to run a query, and the cost per node. That way we could optimize to quickly return the first few nodes (versus optimizing for throughput).
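
The last point in the description above (splitting the single cost value into three estimates) could be sketched roughly as follows. The class names, field names and the linear cost formula are assumptions made for illustration; none of them are part of the existing QueryIndex interface.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical three-part cost estimate for one index.
final class CostEstimateSketch {
    final String indexName;
    final long estimatedNodeCount;   // how many nodes the index expects to return
    final double costPerExecution;   // fixed cost to run the query at all
    final double costPerNode;        // cost to read each returned node

    CostEstimateSketch(String indexName, long estimatedNodeCount,
            double costPerExecution, double costPerNode) {
        this.indexName = indexName;
        this.estimatedNodeCount = estimatedNodeCount;
        this.costPerExecution = costPerExecution;
        this.costPerNode = costPerNode;
    }

    // Assumed linear model: fixed cost plus per-node cost for the nodes actually read.
    double costToRead(long nodesWanted) {
        long nodes = Math.min(nodesWanted, estimatedNodeCount);
        return costPerExecution + nodes * costPerNode;
    }
}

final class PlanChooser {
    // Picks the candidate that is cheapest for reading the given number of nodes.
    // Use a small nodesWanted to favor fast first results, Long.MAX_VALUE for throughput.
    static CostEstimateSketch cheapest(List<CostEstimateSketch> candidates, long nodesWanted) {
        CostEstimateSketch best = null;
        for (CostEstimateSketch candidate : candidates) {
            if (best == null || candidate.costToRead(nodesWanted) < best.costToRead(nodesWanted)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<CostEstimateSketch> candidates = Arrays.asList(
                new CostEstimateSketch("slowStart", 1000, 100.0, 0.5),
                new CostEstimateSketch("fastStart", 1000, 1.0, 1.0));
        System.out.println("first 10 nodes: " + cheapest(candidates, 10).indexName);
        System.out.println("all nodes: " + cheapest(candidates, Long.MAX_VALUE).indexName);
    }
}
```

With a single cost value the two candidates in main could not be told apart by goal; with three estimates, the index that is cheap to start wins when only a few nodes are needed, and the index that is cheap per node wins for a full scan.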

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira