
Posted to java-user@lucene.apache.org by Leonid Bolshinsky <le...@gmail.com> on 2013/08/26 13:15:00 UTC

Boosting potential phrases when using QueryParser

We are using QueryParser.parse(userEnteredQuery) to get a programmatic
Query object.
We would like to boost documents that contain some of the query terms as
"mini phrases".
For example, when the user searches for *professional development leader*,
we would like to get back all documents that contain all three terms,
but rank documents higher when some of the terms appear next to each other,
such as "*professional development*", "*development leader*" or
"*professional development leader*".
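The "mini phrases" in this example are simply the runs of two or more consecutive query words. A minimal plain-Java sketch that enumerates them, assuming the query is split on whitespace (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MiniPhrases {

    /**
     * Returns every run of two or more consecutive words in the query,
     * in order of start position, then length.
     * Splitting on whitespace is an assumption, not QueryParser behavior.
     */
    public static List<String> contiguousRuns(String query) {
        String[] words = query.trim().split("\\s+");
        List<String> runs = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            // end is exclusive, so end = start + 2 is the shortest (two-word) run
            for (int end = start + 2; end <= words.length; end++) {
                runs.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(contiguousRuns("professional development leader"));
    }
}
```

For an n-word query this produces n(n-1)/2 runs, so the three-word example yields exactly the three phrases listed above.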
We want to keep using QueryParser and avoid dealing with its syntax.
Therefore, if the user query text contains special QueryParser characters,
we prefer to give up on adding the phrase boost.
For example, for the user query *professional^0.5 -development title:leader*
we wouldn't use the phrase boost.
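One way to implement the "give up" rule, sketched in plain Java: flag the query if it contains any character that the classic QueryParser documentation lists as special (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /) or one of the upper-case operators AND, OR, NOT. The class and method names are hypothetical. With Lucene on the classpath, checking whether QueryParser.escape(query) differs from the query would be another way to cover the character part.

```java
public class QuerySyntaxCheck {

    // Characters special to Lucene's classic QueryParser. A lone '&' or '|'
    // is flagged too, which is deliberately conservative (only && and || are operators).
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    /** Returns true if the query appears to use QueryParser syntax. */
    public static boolean usesQuerySyntax(String userQuery) {
        for (int i = 0; i < userQuery.length(); i++) {
            if (SPECIAL.indexOf(userQuery.charAt(i)) >= 0) {
                return true;
            }
        }
        // Upper-case AND, OR and NOT are boolean operators in the classic syntax.
        for (String word : userQuery.trim().split("\\s+")) {
            if (word.equals("AND") || word.equals("OR") || word.equals("NOT")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(usesQuerySyntax("professional development leader"));
        System.out.println(usesQuerySyntax("professional^0.5 -development title:leader"));
    }
}
```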
We assume that's something that other people would need as well.
Is there any standard solution?

The naive approach would be to manually check whether the user query
contains any Lucene syntax characters, such as + - ~ ^ ( ) etc.
If it doesn't, split the user query into terms on whitespace, create phrase
queries from the combinations of adjacent terms, and add them as SHOULD
clauses (optionally boosted) to a BooleanQuery in which the original query
is a MUST clause.
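A sketch of that approach, assuming a recent Lucene release: it uses BooleanQuery.Builder, BoostQuery and org.apache.lucene.util.QueryBuilder, whereas in the 4.x releases a BooleanQuery was built with new BooleanQuery() and boosts were set with setBoost(). It also assumes that the caller has already ruled out QueryParser syntax, that the default operator should be AND so all terms are required, and a boost of 2.0; the class name, field name and boost value are hypothetical. Rather than using the raw whitespace-split words as terms, it passes each run of words through QueryBuilder.createPhraseQuery, so the phrase terms go through the same analyzer as the indexed text.

```java
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

public class PhraseBoostedQuery {

    // Illustrative value only; tune against real relevance judgments.
    public static final float PHRASE_BOOST = 2.0f;

    /**
     * Original parsed query as MUST, plus one boosted SHOULD phrase clause per
     * run of 2+ consecutive words. Assumes the query has no QueryParser syntax.
     */
    public static Query build(String userQuery, String field, Analyzer analyzer)
            throws ParseException {
        QueryParser parser = new QueryParser(field, analyzer);
        // Require every term: "documents that contain all the terms".
        parser.setDefaultOperator(QueryParser.Operator.AND);
        Query original = parser.parse(userQuery);

        String[] words = userQuery.trim().split("\\s+");
        if (words.length < 2) {
            return original; // nothing to combine into phrases
        }

        BooleanQuery.Builder combined = new BooleanQuery.Builder();
        combined.add(original, BooleanClause.Occur.MUST);

        // QueryBuilder runs the analyzer, so phrase terms match the indexed form.
        QueryBuilder phraseBuilder = new QueryBuilder(analyzer);
        for (int start = 0; start < words.length - 1; start++) {
            for (int end = start + 2; end <= words.length; end++) {
                String text = String.join(" ", Arrays.copyOfRange(words, start, end));
                Query phrase = phraseBuilder.createPhraseQuery(field, text);
                if (phrase != null) { // null when analysis yields no tokens
                    combined.add(new BoostQuery(phrase, PHRASE_BOOST),
                            BooleanClause.Occur.SHOULD);
                }
            }
        }
        return combined.build();
    }
}
```

With a StandardAnalyzer and a field named "body", build("professional development leader", "body", analyzer) should return one MUST clause plus three boosted SHOULD phrase clauses. Documents still have to match the original query; the phrase clauses only add to the score of documents that also contain the phrases.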

Any other ideas or known solutions?
And what about the performance implications of the proposed naive solution?
How is adding a significant number of additional SHOULD phrase queries
likely to affect search-time performance?
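One part of the performance question is easy to quantify: how many phrase clauses the naive scheme adds. Each phrase query has to read term positions, so it costs more than a plain term query, and the number of runs grows quadratically with query length. A small plain-Java calculation (hypothetical names) comparing all runs with a bigrams-only cap:

```java
public class PhraseClauseCount {

    /** Number of runs of 2..maxLen consecutive words in an n-word query. */
    public static int count(int n, int maxLen) {
        int total = 0;
        for (int k = 2; k <= Math.min(maxLen, n); k++) {
            total += n - k + 1; // a run of length k can start at n-k+1 positions
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 10, 20}) {
            System.out.printf("n=%d  all runs=%d  bigrams only=%d%n",
                    n, count(n, n), count(n, 2));
        }
    }
}
```

For a 10-word query that is 45 extra phrase clauses with all runs versus 9 with bigrams only, which is one way to bound the cost if long queries turn out to be slow.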