Posted to solr-user@lucene.apache.org by Vinci <vi...@polyu.edu.hk> on 2008/04/06 14:06:21 UTC

Phrase matching question

Hi all,

While I am dealing with the headache of Unicode normalization (it drives
people crazy ;) ), I would like to ask how Solr deals with phrase matching.
If I set all the symbols as stop words, do I run the risk of getting an
erroneous match?
e.g. A field contains:
This is a Unicode, normalization drive people into crazy. 5.0 is too far
away. 
=> (stopword removal) This is a Unicode   normalization drive people into
crazy   5.0 is too far away.
Would the phrase "crazy 5.0" or "Unicode normalization" then match this
field?

Thank you,
Vinci
-- 
View this message in context: http://www.nabble.com/Phrase-matching-question-tp16523669p16523669.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Phrase matching question

Posted by Vinci <vi...@polyu.edu.hk>.

Hi,

Thanks, hossman. You got it right :)
It seems the only solution is to do stopword removal at query time but not at
index time... that may be a problem if the index is very large, but it is okay
for me (it also lets users keep the phrase "to be or not to be" intact).


Re: Phrase matching question

Posted by Chris Hostetter <ho...@fucit.org>.

I'm not sure if I understand your question ... "unicode
normalization" seems to be coming up in both your message and your
example; plus your example suggests stop words are being removed, even
though none of the words are removed from your original "field contains"
string.

I *think* what you are asking is "if I remove stop words during analysis,
will the words on either side of a stop word match as a phrase?"
Example:
original text: The Quick Brown Fox Jumped A Dog
after analysis: quick brown fox jumped dog

Will that match a phrase query for "jumped dog"?

At the moment, the answer is "yes" because stop word removal doesn't
reserve a position gap.

There is an open issue to add an option so you can turn position gaps on
when stop words are removed. When that gets committed, the answer will be
"maybe", depending on whether you allow any slop on your phrase query.
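
To make the position-gap point concrete, here is a minimal Python sketch. It is
not Solr or Lucene code, only a toy model of token positions. The names
analyze, phrase_matches, keep_gaps and the STOP_WORDS set are hypothetical, and
the slop check is a simplification (in-order terms only, counting the extra
positions spanned) rather than Lucene's actual slop algorithm. The query phrase
is assumed to contain no stop words.

```python
# Toy model of stop word removal and phrase matching (assumption: not real Solr code).
STOP_WORDS = {"a", "the"}  # hypothetical stop word list


def analyze(text, keep_gaps):
    """Lowercase, split on whitespace, drop stop words.

    Returns a list of (term, position) pairs. With keep_gaps=True a removed
    stop word still consumes a position; with keep_gaps=False it does not.
    """
    tokens = []
    position = -1
    for word in text.lower().split():
        if word in STOP_WORDS:
            if keep_gaps:
                position += 1  # leave a hole where the stop word was
            continue
        position += 1
        tokens.append((word, position))
    return tokens


def phrase_matches(tokens, phrase, slop=0):
    """Simplified phrase check: terms must appear in order, and the number of
    extra positions between the first and last term must not exceed slop."""
    terms = phrase.lower().split()
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, []).append(pos)

    def search(index, prev_pos, first_pos):
        if index == len(terms):
            extra = prev_pos - first_pos - (len(terms) - 1)
            return extra <= slop
        for pos in positions.get(terms[index], []):
            if index == 0:
                if search(1, pos, pos):
                    return True
            elif pos > prev_pos and search(index + 1, pos, first_pos):
                return True
        return False

    return search(0, -1, -1)


text = "The Quick Brown Fox Jumped A Dog"
no_gaps = analyze(text, keep_gaps=False)
with_gaps = analyze(text, keep_gaps=True)

print(phrase_matches(no_gaps, "jumped dog"))            # True: no gap reserved
print(phrase_matches(with_gaps, "jumped dog"))          # False: gap, no slop
print(phrase_matches(with_gaps, "jumped dog", slop=1))  # True: slop covers the gap
```

Without gaps, "jumped" and "dog" end up at adjacent positions, so the exact
phrase matches. With gaps, the removed "A" leaves a hole between them, and the
phrase only matches once the query allows a slop of at least 1.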

-Hoss