Posted to java-user@lucene.apache.org by Vladimir Olenin <VO...@cihi.ca> on 2006/09/26 23:23:44 UTC

term OR term OR term OR .... query question

Hi.
 
I have a question regarding the Lucene scoring algorithm. Suppose I have a
query "a OR b OR c OR d OR e OR f" and two documents: doc1 "a b c d"
and doc2 "d e". Will doc1 score higher than doc2? In other words, does
Lucene take into account the number of terms matched in the document in
the case of an 'or' query?
 
Given that I don't know the algorithms behind Lucene, how does
'or' query time depend on the number of searched terms? Does it grow
linearly or exponentially? How does 'and' query time depend on the number
of searched terms? (It should decrease, right?)
 
Thanks.
 
Vlad

Re: term OR term OR term OR .... query question

Posted by Grant Ingersoll <gs...@syr.edu>.
See below.

Also, there is new Scoring documentation available via the website  
(http://lucene.apache.org/java/docs/scoring.html) that covers scoring  
in some detail.

On Sep 26, 2006, at 5:23 PM, Vladimir Olenin wrote:

> Hi.
>
> I have a question regarding the Lucene scoring algorithm. Suppose I have a
> query "a OR b OR c OR d OR e OR f" and two documents: doc1 "a b c d"
> and doc2 "d e". Will doc1 score higher than doc2? In other words, does
> Lucene take into account the number of terms matched in the document in
> the case of an 'or' query?
>

Yes, all else being equal, it should score higher.  See the coord()
factor in Similarity, which rewards documents that match more of the
query's terms.
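
To make the coord() point concrete, here is a minimal sketch of that
one factor applied to the two documents from the question. It assumes
the default behaviour of coord(overlap, maxOverlap) is simply
overlap / maxOverlap; the class name CoordSketch is made up for
illustration and is not part of Lucene. This is only one factor of the
score, so tf, idf and length normalization can still change the final
ranking.

```java
public class CoordSketch {
    // Assumed default behaviour of the coord factor:
    //   overlap    = number of query terms the document matched
    //   maxOverlap = total number of terms in the query
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        int queryTerms = 6; // a, b, c, d, e, f

        // doc1 "a b c d" matches four of the six query terms.
        float doc1 = coord(4, queryTerms);
        // doc2 "d e" matches two of the six query terms.
        float doc2 = coord(2, queryTerms);

        System.out.println("doc1 coord = " + doc1);
        System.out.println("doc2 coord = " + doc2);
        System.out.println("doc1 favoured by coord: " + (doc1 > doc2));
    }
}
```

A custom Similarity can override coord() (for example, to return a
constant) if matching more terms should not be rewarded.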

> Given that I don't know the algorithms behind Lucene, how does
> 'or' query time depend on the number of searched terms? Does it grow
> linearly or exponentially? How does 'and' query time depend on the number
> of searched terms? (It should decrease, right?)
>

Not 100% sure on this, but that does make sense.  It should be pretty
simple to test out, I think.  We are working on some benchmarks, and
this may be a good one to add to them.
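
As a rough way to reason about the cost question before benchmarking,
here is a toy sketch of 'or' and 'and' over sorted posting lists (one
sorted array of document ids per term). This is NOT Lucene's scorer
code; the class PostingsSketch and its methods are hypothetical. Under
these assumptions, the 'or' case touches every posting of every term,
so its work grows roughly with the total number of postings, while the
'and' case only walks the shortest list and probes the others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PostingsSketch {
    // OR: visit every posting of every term once and collect the doc ids.
    static int[] union(int[][] postings) {
        TreeSet<Integer> docs = new TreeSet<>();
        for (int[] list : postings) {
            for (int doc : list) {
                docs.add(doc);
            }
        }
        return docs.stream().mapToInt(Integer::intValue).toArray();
    }

    // AND: walk the shortest list and binary-search each other list.
    // Assumes at least one posting list, each sorted in ascending order.
    static int[] intersection(int[][] postings) {
        int[] shortest = postings[0];
        for (int[] list : postings) {
            if (list.length < shortest.length) {
                shortest = list;
            }
        }
        List<Integer> out = new ArrayList<>();
        for (int doc : shortest) {
            boolean inAll = true;
            for (int[] list : postings) {
                if (Arrays.binarySearch(list, doc) < 0) {
                    inAll = false;
                    break;
                }
            }
            if (inAll) {
                out.add(doc);
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 3, 5, 7}, {2, 3, 7}, {3, 7, 9} };
        System.out.println("OR  -> " + Arrays.toString(union(postings)));
        System.out.println("AND -> " + Arrays.toString(intersection(postings)));
    }
}
```

In this toy model, adding a term to an 'and' query shrinks the result
set but still adds a lookup per candidate, so whether wall-clock time
actually drops is something a benchmark against a real index would
have to confirm.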



--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org

Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886



