Posted to solr-user@lucene.apache.org by Preetam Rao <bl...@gmail.com> on 2008/07/15 10:45:17 UTC

Dismax request handler and sub phrase matches... suggestion for another handler..

Hi,

Apologies if you are receiving this a second time; I am having a tough time
with the mail server.

I take a user-entered query as is and run it through the dismax request
handler. The document fields have been filled from structured data, where
different fields hold different attributes such as number of beds, number of
baths, city name, etc. A sample user query would look like "3 bed homes in new
york". I would like this to match against city:"new york" and beds:3. When I
use the dismax handler with boosts and the tie parameter, I do not always get
the most relevant top 10 results. There seem to be many factors in play: one
is that dismax cannot recognize the presence of sub-phrases, and another is
that it cannot ignore unwanted matches in unwanted fields.
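To make "sub-phrase matching" concrete, here is a minimal plain-Java sketch (no Lucene involved). It tries every contiguous run of query words, longest first, and returns the first run that appears word for word in a field value. The class and method names are hypothetical, and tokenizing by lower-casing and splitting on whitespace (no stemming, no stop words) is an assumption made for brevity.

```java
import java.util.Arrays;

// Sketch only: longest contiguous sub-phrase of a query that also occurs,
// word for word, inside a field value. Lower-cases and splits on whitespace;
// no stemming or stop-word handling (an assumption for brevity).
class SubPhrases {

    static String[] tokens(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // True if `needle` appears as a contiguous run of tokens inside `hay`.
    static boolean containsRun(String[] hay, String[] needle) {
        for (int i = 0; i + needle.length <= hay.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(hay, i, i + needle.length), needle)) {
                return true;
            }
        }
        return false;
    }

    // Tries query sub-phrases from longest to shortest; returns the first hit, or null.
    static String longestMatch(String query, String fieldValue) {
        String[] q = tokens(query);
        String[] f = tokens(fieldValue);
        for (int len = q.length; len >= 1; len--) {
            for (int start = 0; start + len <= q.length; start++) {
                String[] sub = Arrays.copyOfRange(q, start, start + len);
                if (containsRun(f, sub)) {
                    return String.join(" ", sub);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String query = "3 bed homes in new york";
        System.out.println(longestMatch(query, "New York"));    // new york
        System.out.println(longestMatch(query, "3 beds"));      // 3
        System.out.println(longestMatch(query, "New Orleans")); // new
        System.out.println(longestMatch(query, "Boston"));      // null
    }
}
```

"3 beds" only matches on "3" because nothing is stemmed, and "New Orleans" matches on "new" alone. That is the kind of stray single-word match described above.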

What are your thoughts on having another request handler like dismax, but
one that uses a sub-phrase query instead of a dismax query?
It would also provide the parameters below, on a per-field basis, to help
customize the behavior of the request handler and give more flexibility in
different scenarios.

phraseBoost - how much better a 3-word sub-phrase match is than a 2-word
sub-phrase match
useOnlyMaxMatch - if many sub-phrases match in the field, only the best
score is used.
ignoreDuplicates - if a field has duplicate matches, only one match is used
for scoring.
matchOnlyOneField - if a match is found in the first field, remove the
matched terms before querying the other fields. For example, for me a city
match is more important than a match in other fields, so I do not want the
"new" in "new york" to match all the other fields and skew the results, which
is what I am seeing with dismax despite the high boosts.
ignoreSomeLuceneScorefactors - ignore the Lucene tf, idf, query norm, or any
other such criteria that are not needed for this field. If I want exact
matches only, they are really not important, and they also seem to play a
big role in my not being able to get the most relevant top 10 results.
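One possible reading of these parameters, written as a minimal plain-Java sketch rather than as a Lucene Query. The following are assumptions, not part of the proposal: phraseBoost multiplies the score once per extra word (a 3-word match scores phraseBoost^2, a 1-word match scores 1); fields are visited in a fixed priority order; and "ignoring Lucene score factors" is modelled by leaving tf, idf and norms out entirely. FieldOpts and SubPhraseScorer are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-field options named after the parameters proposed above.
class FieldOpts {
    final double boost;              // field weight
    final double phraseBoost;        // score multiplier per extra word in a match
    final boolean useOnlyMaxMatch;   // keep only the best match score in the field
    final boolean ignoreDuplicates;  // a phrase repeated in the field counts once
    final boolean matchOnlyOneField; // matched words are removed for later fields

    FieldOpts(double boost, double phraseBoost, boolean useOnlyMaxMatch,
              boolean ignoreDuplicates, boolean matchOnlyOneField) {
        this.boost = boost;
        this.phraseBoost = phraseBoost;
        this.useOnlyMaxMatch = useOnlyMaxMatch;
        this.ignoreDuplicates = ignoreDuplicates;
        this.matchOnlyOneField = matchOnlyOneField;
    }
}

class SubPhraseScorer {
    static List<String> tokens(String s) {
        return Arrays.asList(s.toLowerCase().trim().split("\\s+"));
    }

    // How many times `run` occurs as contiguous words in `hay`.
    static int countRun(List<String> hay, List<String> run) {
        int n = 0;
        for (int i = 0; i + run.size() <= hay.size(); i++) {
            if (hay.subList(i, i + run.size()).equals(run)) n++;
        }
        return n;
    }

    static boolean anyUsed(boolean[] used, int from, int to) {
        for (int i = from; i < to; i++) if (used[i]) return true;
        return false;
    }

    // Scores one document; fields are visited in the map's iteration order.
    // No tf, idf or length norms: only match length and match counts matter.
    static double score(String query, Map<String, String> doc, Map<String, FieldOpts> fields) {
        List<String> remaining = new ArrayList<>(tokens(query));
        double total = 0;
        for (Map.Entry<String, FieldOpts> e : fields.entrySet()) {
            String value = doc.get(e.getKey());
            if (value == null) continue;
            FieldOpts o = e.getValue();
            List<String> hay = tokens(value);
            boolean[] used = new boolean[remaining.size()];
            double fieldScore = 0;
            // Longest sub-phrases first; each query word joins at most one match.
            for (int len = remaining.size(); len >= 1; len--) {
                for (int start = 0; start + len <= remaining.size(); start++) {
                    if (anyUsed(used, start, start + len)) continue;
                    int hits = countRun(hay, remaining.subList(start, start + len));
                    if (hits == 0) continue;
                    if (o.ignoreDuplicates) hits = 1;
                    double s = hits * Math.pow(o.phraseBoost, len - 1);
                    fieldScore = o.useOnlyMaxMatch ? Math.max(fieldScore, s) : fieldScore + s;
                    Arrays.fill(used, start, start + len, true);
                }
            }
            total += o.boost * fieldScore;
            if (o.matchOnlyOneField) {
                // Later fields only see the query words this field did not match.
                List<String> next = new ArrayList<>();
                for (int i = 0; i < remaining.size(); i++) if (!used[i]) next.add(remaining.get(i));
                remaining = next;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, FieldOpts> fields = new LinkedHashMap<>();
        fields.put("city", new FieldOpts(10, 2, true, true, true));
        fields.put("description", new FieldOpts(1, 2, false, false, false));
        String q = "3 bed homes in new york";
        System.out.println(score(q, Map.of("city", "New York",
                "description", "new listing near new park"), fields)); // 20.0
        System.out.println(score(q, Map.of("city", "New Orleans",
                "description", "3 bed homes"), fields));               // 14.0
    }
}
```

With matchOnlyOneField on for city, the "new" in "new york" is consumed by the city match and never reaches description, so the "New York" document scores 20.0. Turning it off for city raises that score to 22.0, because description then counts both occurrences of "new".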

I see this handler being useful in the following use cases:
a) the data is mostly exact, that is, I am not searching free text such as
mails, reviews, articles, web pages, etc.
b) numbers and their binding are important
c) exact phrase or sub-phrase matches are more important than rankings
derived from tf, idf, query norm, etc.
d) in some cases certain fields need to affect the scoring and in others they
must not. I found this the most difficult task: separating the noise matches
from the ones my use case requires.

Your thoughts and suggestions on alternatives are welcome.

I have also posted a question on sub-phrase matching to lucene-user; that
one is about Lucene itself and is not related to having a Solr handler with
additional features like sub-phrase matching for user-entered queries.

Thanks
Preetam

Re: Dismax request handler and sub phrase matches... suggestion for another handler..

Posted by Preetam Rao <bl...@gmail.com>.
I agree. If we do decide to implement another kind of request handler, it
should be done through the StandardRequestHandler's defType parameter, which
selects the registered QParser that generates the appropriate queries for
Lucene.
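A toy, plain-Java model of the design being agreed on here: one handler, several registered parsers, and a defType-style request parameter choosing between them per request. ToyHandler and its string-returning "parsers" are hypothetical stand-ins. In Solr itself the extension point is a QParserPlugin subclass registered in solrconfig.xml, which this sketch does not use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model: one request handler, many named query parsers, chosen per
// request by a defType-style parameter. These are NOT Solr classes.
class ToyHandler {
    private final Map<String, Function<String, String>> parsers = new HashMap<>();
    private final String defaultType;

    ToyHandler(String defaultType) {
        this.defaultType = defaultType;
    }

    // Registering a parser plays the role of declaring a query parser plugin.
    void register(String name, Function<String, String> parser) {
        parsers.put(name, parser);
    }

    // Looks up the parser named by "defType" (or the default) and applies it to "q".
    String handle(Map<String, String> params) {
        String type = params.getOrDefault("defType", defaultType);
        Function<String, String> parser = parsers.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("unknown defType: " + type);
        }
        return parser.apply(params.get("q"));
    }

    public static void main(String[] args) {
        ToyHandler handler = new ToyHandler("lucene");
        handler.register("lucene", q -> "lucene(" + q + ")");
        handler.register("dismax", q -> "dismax(" + q + ")");
        handler.register("subphrase", q -> "subphrase(" + q + ")"); // the proposed parser
        // lucene(3 bed homes in new york)
        System.out.println(handler.handle(Map.of("q", "3 bed homes in new york")));
        // subphrase(3 bed homes in new york)
        System.out.println(handler.handle(Map.of("q", "3 bed homes in new york", "defType", "subphrase")));
    }
}
```

Adding the sub-phrase behaviour then means registering one more parser. The handler itself does not change.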

------------
Preetam

On Tue, Jul 15, 2008 at 3:59 PM, Erik Hatcher <er...@ehatchersolutions.com>
wrote:

>
> On Jul 15, 2008, at 4:45 AM, Preetam Rao wrote:
>
>> What are your thoughts on having one more request handler like dismax, but
>> which uses a sub-phrase query instead of dismax query ?
>>
>
> It'd be better to just implement a QParser(Plugin) such that the
> StandardRequestHandler can use it (&defType=dismax, for example).
>
> No need to have additional actual request handlers just to swap out query
> parsing logic anymore.
>
>        Erik
>
>

Re: Dismax request handler and sub phrase matches... suggestion for another handler..

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 15, 2008, at 4:45 AM, Preetam Rao wrote:
> What are your thoughts on having one more request handler like dismax, but
> which uses a sub-phrase query instead of dismax query ?

It'd be better to just implement a QParser(Plugin) such that the
StandardRequestHandler can use it (&defType=dismax, for example).

No need to have additional actual request handlers just to swap out
query parsing logic anymore.

	Erik