Posted to solr-user@lucene.apache.org by cyang2010 <ys...@hotmail.com> on 2011/01/31 23:22:19 UTC

phrase, individual term, prefix, fuzzy and stemming search

My current project needs to support searches where the user enters any
number of terms, matched across a few indexed fields (movie title, actor,
director).

To maximize the number of results, I plan to support all the search types
listed in the subject: phrase, individual term, prefix, fuzzy and stemming.
Of course, ranking the results in the right relevance order is also
important.

I have considered using the dismax query parser.  However, it does not
support prefix queries.  I am not sure whether it supports fuzzy queries; my
guess is that it does not.

Therefore, I still need to use the standard query parser.  For example, if
someone searches for "deim moer" (a typo for "demi moore"), I compare the
phrase and each term against each searchable field (title, actor, director):


title_display: "deim moer"~30    <-- OR
actors: "deim moer"~30
directors: "deim moer"~30

title_display: deim    <-- OR
actors: deim 
directors: deim 

title_display: deim*   <-- OR
actors: deim* 
directors: deim* 

title_display: deim~0.6   <-- OR
actors: deim~0.6 
directors: deim~0.6 

title_display: moer    <-- OR
actors: moer 
directors: moer 

title_display: moer*   <-- OR
actors: moer* 
directors: moer* 

title_display: moer~0.6    <-- OR
actors: moer~0.6 
directors: moer~0.6
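
A minimal sketch of how the clause list above could be generated from the
user's input.  The function name build_query and its parameters are
hypothetical, not part of Solr; the field names, slop of 30 and fuzzy value
of 0.6 are taken from the example above.  Escaping of special query
characters in the input is left out for brevity.

```python
FIELDS = ["title_display", "actors", "directors"]


def build_query(user_input, fields=FIELDS, slop=30, fuzzy=0.6):
    """Build one OR-joined query string covering phrase, term,
    prefix and fuzzy clauses for every field (hypothetical helper)."""
    terms = user_input.split()
    clauses = []

    # A sloppy phrase clause per field only makes sense for 2+ terms.
    if len(terms) > 1:
        phrase = " ".join(terms)
        clauses += [f'{field}:"{phrase}"~{slop}' for field in fields]

    # For each term: exact, prefix and fuzzy clauses, one per field.
    for term in terms:
        for pattern in ("{f}:{t}", "{f}:{t}*", "{f}:{t}~{z}"):
            for field in fields:
                clauses.append(pattern.format(f=field, t=term, z=fuzzy))

    return " OR ".join(clauses)


if __name__ == "__main__":
    print(build_query("deim moer"))
```

With three fields and two terms this produces 3 phrase clauses plus 18
per-term clauses, which is the size concern raised below.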

The Solr relevance score is the sum over all the OR clauses that match, so I
can make sure the relevance scores come out in the right order.  For
example, a document that exactly matches the input ("deim moer") matches the
phrase, term, prefix and fuzzy queries all at the same time.  Therefore, it
will score higher than a document that matches only the term, prefix or
fuzzy query.  At the same time, I can apply a boost to a particular search
field if the requirements call for it.


Does this sound right to you?  Are there better ways to achieve the same
thing?  My concern is that the query will not perform well, since it tries
to do too much.  But isn't that what people want (maximum results) when they
just type in a few search words?

Another question: can I combine the results of two queries?  For example,
first I query for phrase and term matches, then I query for prefix matches.
Can I just append the prefix-match results to the phrase/term results?  I
thought the two queries have different queryNorm values, so their scores are
not comparable and the results cannot be merged by score.  Is that correct?
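
To make the "append" idea concrete, here is a small sketch that ignores the
scores entirely: it keeps the phrase/term results first and appends any
prefix-only results after them, dropping duplicates.  The function name
append_results and the "id" key are assumptions for illustration; each
result is assumed to be a dict that carries a unique document id.

```python
def append_results(primary, secondary, key="id"):
    """Return primary results followed by secondary results that were
    not already present, preserving the order within each list."""
    seen = set()
    merged = []
    for doc in list(primary) + list(secondary):
        if doc[key] not in seen:
            seen.add(doc[key])
            merged.append(doc)
    return merged


if __name__ == "__main__":
    phrase_term_hits = [{"id": "1"}, {"id": "2"}]
    prefix_hits = [{"id": "2"}, {"id": "3"}]
    print(append_results(phrase_term_hits, prefix_hits))
```

Because the ordering comes from which query a document matched rather than
from its score, this avoids comparing scores across the two queries.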


Thanks.  I would love to hear your thoughts.


-- 
View this message in context: http://lucene.472066.n3.nabble.com/phrase-inidividual-term-prefix-fuzzy-and-stemming-search-tp2391111p2391111.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: phrase, individual term, prefix, fuzzy and stemming search

Posted by Jay Hill <ja...@gmail.com>.
You mentioned that dismax does not support wildcards, but edismax does. Not
sure if dismax would have solved your other problems, or whether you just
had to shift gears because of the wildcard issue, but you might want to have
a look at edismax.
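
As a rough sketch only, request parameters for an edismax query over the
three fields from the original post might look like the following.  defType,
q, qf, pf and ps are standard dismax/edismax parameters; the function name
edismax_params and the ^2 boost on title_display are assumptions made for
illustration.

```python
from urllib.parse import urlencode


def edismax_params(user_input, slop=30):
    """Build request parameters for an edismax search (hypothetical
    helper; the field boosts are illustrative, not recommendations)."""
    return {
        "defType": "edismax",
        "q": user_input,
        # Fields to search, with an assumed boost on the title field.
        "qf": "title_display^2 actors directors",
        # Fields in which to boost documents matching the whole phrase.
        "pf": "title_display actors directors",
        # Slop applied to the phrase boost.
        "ps": str(slop),
    }


if __name__ == "__main__":
    print(urlencode(edismax_params("deim* moer~0.6")))
```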

-Jay
http://www.lucidimagination.com



Re: phrase, individual term, prefix, fuzzy and stemming search

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

I'll admit I didn't read your email closely, but the first part makes me think
that ngrams, which I don't think you mentioned, might be handy for you here,
allowing for misspellings without the implementation complexity.
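
To illustrate the idea outside of Solr, here is a small sketch of character
ngram overlap between a misspelled term and the intended one.  The helper
names char_ngrams and ngram_overlap are hypothetical, and Jaccard overlap is
used only as a simple stand-in for how shared ngrams let a misspelling still
match; in Solr the ngrams would be produced by the analysis chain at index
and query time.

```python
def char_ngrams(word, n=2):
    """Return the set of character ngrams of length n in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_overlap(a, b, n=2):
    """Jaccard overlap of the ngram sets of a and b (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    # The typo and the intended word share some bigrams, so a partial
    # match is still possible without a fuzzy query.
    print(ngram_overlap("deim", "demi"))
    print(ngram_overlap("moer", "moore"))
```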

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


