Posted to java-user@lucene.apache.org by Vladimir Olenin <VO...@cihi.ca> on 2006/09/26 23:23:44 UTC
term OR term OR term OR .... query question
Hi.
I have a question regarding the Lucene scoring algorithm. Given the
query "a OR b OR c OR d OR e OR f" and two documents, doc1 "a b c d"
and doc2 "d e", will doc1 score higher than doc2? In other words, does
Lucene take into account the number of terms matched in the document in
the case of an 'or' query?
Given that I don't know the algorithms behind Lucene, how does
'or' query time depend on the number of searched terms? Does it grow
linearly or exponentially? How does 'and' query time depend on the number
of searched terms? (It should decrease, right?)
Thanks.
Vlad
Re: term OR term OR term OR .... query question
Posted by Grant Ingersoll <gs...@syr.edu>.
See below.
Also, there is new Scoring documentation available via the website
(http://lucene.apache.org/java/docs/scoring.html) that covers scoring
in some detail.
On Sep 26, 2006, at 5:23 PM, Vladimir Olenin wrote:
> Hi.
>
> I have a question regarding the Lucene scoring algorithm. Given the
> query "a OR b OR c OR d OR e OR f" and two documents, doc1 "a b c d"
> and doc2 "d e", will doc1 score higher than doc2? In other words, does
> Lucene take into account the number of terms matched in the document
> in the case of an 'or' query?
>
Yes, it should score higher. See the coord() factor, which is part of
the Similarity class.
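To make the coord() point concrete, here is a minimal sketch of the idea
rather than Lucene's actual code path. It assumes the default behaviour
of coord(overlap, maxOverlap), which is the number of matched query
terms divided by the total number of query terms. The class name
CoordSketch and the helper overlap() are made up for illustration, and
the documents are tokenized by a plain whitespace split, not by an
Analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
class CoordSketch {

    // Assumed default coord formula: fraction of query terms the document matches.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Counts how many query terms occur in the whitespace-tokenized document text.
    static int overlap(List<String> queryTerms, String docText) {
        Set<String> docTerms = new HashSet<>(Arrays.asList(docText.split("\\s+")));
        int matched = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("a", "b", "c", "d", "e", "f");

        // doc1 matches 4 of the 6 query terms, doc2 matches 2 of 6.
        float doc1 = coord(overlap(query, "a b c d"), query.size());
        float doc2 = coord(overlap(query, "d e"), query.size());

        System.out.println("doc1 coord = " + doc1);
        System.out.println("doc2 coord = " + doc2);
    }
}
```

Note that coord() is only one multiplier in the final score. Term
frequency, idf and length normalization also contribute, so this shows
why doc1 gets a boost for matching more terms, not what the full scores
would be.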
> Given that I don't know the algorithms behind Lucene, how does
> 'or' query time depend on the number of searched terms? Does it grow
> linearly or exponentially? How does 'and' query time depend on the
> number of searched terms? (It should decrease, right?)
>
I'm not 100% sure on this, but that does make sense. It should be
pretty simple to test, I think. We are working on some benchmarks, and
this may be a good one to add.
--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886