Posted to dev@lucene.apache.org by "Luca Cavanna (JIRA)" <ji...@apache.org> on 2014/05/30 18:03:02 UTC

[jira] [Created] (LUCENE-5718) More flexible compound queries (containing mtq) support in postings highlighter

Luca Cavanna created LUCENE-5718:
------------------------------------

             Summary: More flexible compound queries (containing mtq) support in postings highlighter
                 Key: LUCENE-5718
                 URL: https://issues.apache.org/jira/browse/LUCENE-5718
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/highlighter
    Affects Versions: 4.8.1
            Reporter: Luca Cavanna


The postings highlighter currently pulls the automata from multi-term queries directly, so highlighting works without calling rewrite. To do so, it also needs to check whether the query is a compound one and, if so, extract its subqueries. This is currently done in the MultiTermHighlighting class and works well, but it has two potential problems:

1) not all possible compound queries are necessarily supported, since we need to handle each of them one by one (see LUCENE-5717), and this requires keeping the "switch" up to date whenever new queries get added to Lucene
2) it doesn't support custom compound queries, only the set of queries available out of the box

I've been thinking about how this can be improved, and one of the ideas I came up with is to introduce a generic way to retrieve the subqueries from compound queries: for instance, a new abstract base class with a getLeaves or getSubQueries method that all the compound queries extend. This method would return a flat array of all the leaf queries that the compound query is made of.
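
To make the idea above more concrete, here is a minimal, self-contained sketch of what such a base class could look like. It deliberately does not use the real Lucene classes: SketchQuery, LeafQuery, CompoundQuery and CustomCompoundQuery are all hypothetical names made up for illustration, and the sketch returns a List rather than an array. It only shows how a single getLeaves method could flatten arbitrarily nested compound queries, including custom ones, without a per-type switch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a query base class; NOT org.apache.lucene.search.Query.
abstract class SketchQuery {
}

// Hypothetical leaf query (think: a term query or a multi-term query).
final class LeafQuery extends SketchQuery {
    private final String description;

    LeafQuery(String description) {
        this.description = description;
    }

    @Override
    public String toString() {
        return description;
    }
}

// Hypothetical base class that every compound query would extend.
abstract class CompoundQuery extends SketchQuery {

    // Each compound query only has to expose its direct children.
    protected abstract List<SketchQuery> getSubQueries();

    // Returns a flat list of all leaf queries, descending into nested compound queries.
    public final List<SketchQuery> getLeaves() {
        List<SketchQuery> leaves = new ArrayList<>();
        collectLeaves(this, leaves);
        return leaves;
    }

    private static void collectLeaves(SketchQuery query, List<SketchQuery> out) {
        if (query instanceof CompoundQuery) {
            for (SketchQuery sub : ((CompoundQuery) query).getSubQueries()) {
                collectLeaves(sub, out);
            }
        } else {
            out.add(query);
        }
    }
}

// Hypothetical custom compound query that the highlighter knows nothing about.
final class CustomCompoundQuery extends CompoundQuery {
    private final List<SketchQuery> clauses;

    CustomCompoundQuery(SketchQuery... clauses) {
        this.clauses = Collections.unmodifiableList(Arrays.asList(clauses));
    }

    @Override
    protected List<SketchQuery> getSubQueries() {
        return clauses;
    }
}

class GetLeavesDemo {
    public static void main(String[] args) {
        CompoundQuery nested = new CustomCompoundQuery(
                new LeafQuery("title:luc*"),
                new CustomCompoundQuery(
                        new LeafQuery("body:high*"),
                        new LeafQuery("body:postings")));
        // prints [title:luc*, body:high*, body:postings]
        System.out.println(nested.getLeaves());
    }
}
```

With something along these lines, the highlighter would only need to check whether a query is a CompoundQuery and then look for multi-term queries among its leaves; whether the real change should be a base class, an interface, or something else is exactly the open question here.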

Not sure whether this would be needed in other places in Lucene, but it doesn't seem like a small change, and it would definitely affect (or benefit?) more than just the postings highlighter's support for multi-term queries.

In particular, the second problem (custom queries) seems hard to solve without a way to expose this info directly from the query, unless we want to make the MultiTermHighlighting#extractAutomata method extensible in some way.

I'd like to hear what people think, and to work on this as soon as we have identified which direction we want to take.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
