Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2014/09/25 14:37:33 UTC

[jira] [Commented] (OAK-2134) Lucene: not using the path restriction can speed up queries

    [ https://issues.apache.org/jira/browse/OAK-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147692#comment-14147692 ] 

Thomas Mueller commented on OAK-2134:
-------------------------------------

Some more details on why using the path restriction is slow: Lucene first expands the prefix restriction, that is, it fetches all terms from the index that match the given path prefix. If many paths match (in my case, potentially millions), this expansion is slow. Lucene then searches every combination of the other full-text condition(s) with each matching path. Lucene is simply not made for this use case.
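
To illustrate the expansion cost, here is a toy model in plain Java. It is not Lucene code: the sorted term dictionary is stood in for by a TreeSet, and the class and method names (PrefixExpansion, expand) are made up for this sketch. It only shows that a prefix restriction has to enumerate every matching term, so the work grows with the number of paths below the prefix.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixExpansion {

    // Toy stand-in for prefix expansion: returns every term in the sorted
    // dictionary that starts with the given prefix. Assumes terms do not
    // contain the character U+FFFF.
    public static SortedSet<String> expand(TreeSet<String> terms, String prefix) {
        // All strings with this prefix sort in [prefix, prefix + U+FFFF).
        return terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        // Hypothetical paths, only for illustration.
        for (int i = 0; i < 100_000; i++) {
            terms.add("/path/prefix/node" + i);
        }
        terms.add("/other/node");

        // The prefix restriction expands to one term per matching path.
        SortedSet<String> expanded = expand(terms, "/path/prefix/");
        System.out.println("terms to combine with the full-text condition: " + expanded.size());
    }
}
```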

Two solutions to investigate are:

* (a) only index the path prefix (for example, the first 20 characters of the path, or only the first 5 path entries), and

* (b) also index all parent paths of a node (not store those in the document; just index them); when querying, use an exact match for the parent.

Solution (a) will reduce the index size in most cases, but is not guaranteed to solve the problem: there may still be many nodes with distinct short paths (for example, a counter node near the root node).

Solution (b) will increase the index size (by how much needs to be tested), but would solve the problem.
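
A minimal sketch of the values each solution would index, in plain Java and without any Lucene or Oak API. The class and method names (PathTerms, pathPrefix, ancestorPaths) are hypothetical, and the code assumes absolute paths without a trailing slash. For (a), the path is cut after a fixed number of entries; for (b), every ancestor path is produced, so a query for descendants of /path/prefix could use an exact match on the term "/path/prefix" instead of a prefix expansion.

```java
import java.util.ArrayList;
import java.util.List;

public class PathTerms {

    // Solution (a): keep only the first maxEntries path entries.
    public static String pathPrefix(String path, int maxEntries) {
        if (maxEntries <= 0) {
            return "/";
        }
        int count = 0;
        // Start at 1 to skip the leading slash.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == '/' && ++count == maxEntries) {
                return path.substring(0, i);
            }
        }
        // The path has maxEntries entries or fewer: keep it whole.
        return path;
    }

    // Solution (b): all ancestor paths of a node, root first.
    // These would be indexed (not stored) alongside the document.
    public static List<String> ancestorPaths(String path) {
        List<String> result = new ArrayList<>();
        if (path.equals("/")) {
            return result; // the root node has no ancestors
        }
        result.add("/");
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == '/') {
                result.add(path.substring(0, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String path = "/path/prefix/some/node"; // hypothetical node path
        System.out.println("(a) indexed prefix: " + pathPrefix(path, 2));
        System.out.println("(b) indexed ancestors: " + ancestorPaths(path));
    }
}
```

The sketch also makes the size trade-off visible: (a) indexes one short term per node, while (b) indexes one extra term per path level.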

> Lucene: not using the path restriction can speed up queries
> -----------------------------------------------------------
>
>                 Key: OAK-2134
>                 URL: https://issues.apache.org/jira/browse/OAK-2134
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.1, 1.0.7
>
>
> Currently, the Oak Lucene index uses the path restriction in the hope that queries can be faster. However, I found that not using the path restriction is better (much better) in many cases. The following queries were run:
> {noformat}
> :fulltext:test
> +:fulltext:test +:path:/path/prefix/*
> {noformat}
> A workaround is to change the query by removing the path restriction and adding a 'like' condition, as follows (for XPath):
> {noformat}
> ... and jcr:like(@jcr:path, '/path/prefix/%')
> {noformat}


