Posted to general@lucene.apache.org by stevef-pcbi <st...@pcbi.upenn.edu> on 2008/05/12 18:39:25 UTC

words close together - like google

hi, i am a newbie to text search, but need to evaluate lucene.

my question is this:  in a google query such as "prune scotch broom" it has
always seemed to me that the closer together the three words are found the
better the rank of the document.

(1) is that true?

(2) in the FAQ (http://wiki.apache.org/lucene-java/LuceneFAQ) it says this:
Does the position of the matches in the text affect the scoring?

No, the position of matches within a field does not affect ranking. 

does that mean that lucene does not support what i imagine google is doing?

(3) the lucene querying language described here
http://lucene.apache.org/java/2_3_2/queryparsersyntax.html seems very fancy. 
but, i don't understand why, in the most common use cases, i need it.  in
google, i just type some words, and it figures the rest out.  for example:
   - the more words i hit the better.  i don't need to specify AND or OR
   - the closer they are together the better.  i don't need to specify
distance requirements

thanks very much for explaining this, and please pardon any ignorance on my
part

steve


Re: words close together - like google

Posted by Chris Hostetter <ho...@fucit.org>.
: Does the position of the matches in the text affect the scoring?
: 
: No, the position of matches within a field does not affect ranking. 
: 
: does that mean that lucene does not support what i imagine google is doing?

no .. that comment is in regard to basic term queries.  if you want the 
proximity of terms (to each other) to affect the scoring this can be done 
with a PhraseQuery or a SpanNearQuery.
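
to make the idea of proximity-sensitive scoring concrete, here is a toy sketch in plain Java.  it is NOT Lucene code and not how PhraseQuery or SpanNearQuery are implemented; the class and method names (ProximityToy, minSpan, proximityWeight) are made up for illustration.  it finds the smallest window of token positions that contains every query term and turns the gaps in that window into a weight of 1 / (gaps + 1), which, to my understanding, is the same general falloff shape that Lucene's default similarity applies to sloppy phrase matches.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: this is not Lucene code.
class ProximityToy {

    /**
     * Smallest distance between the first and last position of a window that
     * contains at least one occurrence of every term.
     * Returns -1 if there are no terms or some term never occurs.
     */
    static int minSpan(List<List<Integer>> positionsPerTerm) {
        if (positionsPerTerm.isEmpty()) return -1;
        List<int[]> hits = new ArrayList<>(); // each entry: {position, termIndex}
        for (int t = 0; t < positionsPerTerm.size(); t++) {
            if (positionsPerTerm.get(t).isEmpty()) return -1;
            for (int p : positionsPerTerm.get(t)) hits.add(new int[] {p, t});
        }
        hits.sort((a, b) -> Integer.compare(a[0], b[0]));

        int[] count = new int[positionsPerTerm.size()]; // occurrences of each term in the window
        int covered = 0, best = Integer.MAX_VALUE, left = 0;
        for (int right = 0; right < hits.size(); right++) {
            if (count[hits.get(right)[1]]++ == 0) covered++;
            // Shrink from the left while the window still contains every term.
            while (covered == count.length) {
                best = Math.min(best, hits.get(right)[0] - hits.get(left)[0]);
                if (--count[hits.get(left)[1]] == 0) covered--;
                left++;
            }
        }
        return best;
    }

    /** Weight is 1.0 when the terms are adjacent and shrinks as gaps open up. */
    static float proximityWeight(int span, int termCount) {
        int gaps = Math.max(0, span - (termCount - 1));
        return 1.0f / (gaps + 1);
    }
}
```

a document where "prune", "scotch" and "broom" sit next to each other gets weight 1.0 here, while one where they are ten tokens apart gets a much smaller weight.  in real Lucene the slop on a PhraseQuery (or on a SpanNearQuery) controls how far apart the terms may be and still match.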

: (3) the lucene querying language described here
: http://lucene.apache.org/java/2_3_2/queryparsersyntax.html seems very fancy. 
: but, i don't understand why, in the most common use cases, i need it.  in
: google, i just type some words, and it figures the rest out.  for example:
:    - the more words i hit the better.  i don't need to specify AND or OR
:    - the closer they are together the better.  i don't need to specify
: distance requirements

you don't have to use the QueryParser ... it's just there for convenience.  
you're free to parse your query strings into Query objects any way you 
want.
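
as one possible way to get the "just type some words" behaviour, here is a small sketch that pre-processes raw user input in plain Java.  it assumes you then hand the result to the query parser with its default OR operator, so documents matching more of the words tend to score higher; it also appends the same words as a proximity search using the documented "..."~N syntax, so documents where the words sit close together get an extra contribution.  the class and method names (SimpleQueryRewriter, rewrite) are hypothetical, and the same query could instead be assembled directly from TermQuery and PhraseQuery objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: turns plain words into query parser syntax.
class SimpleQueryRewriter {

    /**
     * Returns the cleaned-up words followed by a sloppy phrase over the same words,
     * e.g. prune scotch broom "prune scotch broom"~10
     */
    static String rewrite(String userInput, int slop) {
        List<String> words = new ArrayList<>();
        for (String raw : userInput.trim().split("\\s+")) {
            // Drop punctuation so query syntax characters cannot leak through, and
            // lowercase so words like AND / OR are not read as operators.
            // (Assumption: the analyzer in use lowercases terms anyway.)
            String w = raw.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
            if (!w.isEmpty()) words.add(w);
        }
        String bare = String.join(" ", words);
        if (words.size() < 2) return bare; // a phrase needs at least two words
        return bare + " \"" + bare + "\"~" + slop;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("prune scotch broom", 10));
    }
}
```

the slop value is a tuning knob: a larger value lets words that are further apart still match the phrase part, and how much that part should count relative to the plain words is something to experiment with on your own data.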

BTW: future questions about the java API will get more/better 
responses from the java-user list.


-Hoss