Posted to solr-user@lucene.apache.org by shai deljo <sh...@gmail.com> on 2007/03/19 00:21:08 UTC

Score / Sort question

How do I force SOLR to rank documents that contain ALL terms first,
before results that contain only some of the terms?
The problem is that I don't want to use an AND (since I am also
interested in the OR results), but I do want documents that
contain all terms to score higher.
Please advise,
Thanks

Re: Score / Sort question

Posted by shai deljo <sh...@gmail.com>.
Yep, I get it. But from the tests I did, it tips the scores enough for
those cases to be rare (and probably justified).
Thx

On 3/19/07, Chris Hostetter <ho...@fucit.org> wrote:
> : wouldn't doing something like this in the query :
> : (field1:tag1 tag2) OR (field1:tag1 AND tag2)
>
> : The documents that have all the tags (tag1 and tag2) will comply with
> : both conditions and get scores from both while the documents that
> : don't have both tags will only get a score from the 1st (the OR)
> : condition and therefore won't have a higher score.
>
> correct ... but it won't *guarantee* the order you asked about ... the
> tf/idf could still tweak your ordering a bit in some circumstances, even
> if you did this...
>
> 	(field1:tag1 tag2) OR (field1:tag1 AND tag2)^1000
>
>
>
>
>
> -Hoss
>
>

Re: Score / Sort question

Posted by Chris Hostetter <ho...@fucit.org>.
: wouldn't doing something like this in the query :
: (field1:tag1 tag2) OR (field1:tag1 AND tag2)

: The documents that have all the tags (tag1 and tag2) will comply with
: both conditions and get scores from both while the documents that
: don't have both tags will only get a score from the 1st (the OR)
: condition and therefore won't have a higher score.

correct ... but it won't *guarantee* the order you asked about ... the
tf/idf could still tweak your ordering a bit in some circumstances, even
if you did this...

	(field1:tag1 tag2) OR (field1:tag1 AND tag2)^1000





-Hoss
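
A minimal sketch of building the boosted query discussed above on the client side. The helper name all_terms_first_query is hypothetical, and the sketch assumes that both tags are meant to be searched in field1 and that the default operator is OR. It groups the terms as field1:(...) because in the Lucene query syntax a field prefix applies only to the term or group that directly follows it. Terms are not escaped here.

```python
def all_terms_first_query(field, terms, boost=1000):
    """Build an OR clause plus a boosted AND clause over the same terms.

    Hypothetical helper: terms are used verbatim, with no escaping.
    """
    if not terms:
        raise ValueError("at least one term is required")
    # Matches documents containing any of the terms (default operator OR assumed).
    any_clause = f"{field}:({' '.join(terms)})"
    # Matches only documents containing every term; boosted to favor them.
    all_clause = f"{field}:({' AND '.join(terms)})^{boost}"
    return f"{any_clause} OR {all_clause}"


query = all_terms_first_query("field1", ["tag1", "tag2"])
print(query)
```

As noted in the reply above, the boost makes the desired ordering very likely but does not strictly guarantee it.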


Re: Score / Sort question

Posted by shai deljo <sh...@gmail.com>.
Hey Chris,
wouldn't doing something like this in the query:
(field1:tag1 tag2) OR (field1:tag1 AND tag2)

achieve a similar effect?

The documents that have all the tags (tag1 and tag2) will comply with
both conditions and get scores from both, while the documents that
don't have both tags will only get a score from the 1st (the OR)
condition and therefore won't have a higher score.

Is this right?



On 3/18/07, Chris Hostetter <ho...@fucit.org> wrote:
>
> : How do i force SOLR to score documents that contain ALL terms 1st
> : before results that contain some of the terms?
>
> generally speaking this is the result you will usually get on random data ...
> under the covers Lucene uses TF/IDF based weighting of terms, with a coord
> factor that penalizes documents that don't match all clauses -- but i'm sure
> it's possible that sometimes the tf is so high and the idf so low, that
> the score from one term can dominate.
>
> the only solution to your problem that i can think of is to write a custom
> Similarity class where tf and idf are fixed so only the coordFactor
> matters.
>
>
>
>
> -Hoss
>
>

Re: Score / Sort question

Posted by Walter Underwood <wu...@netflix.com>.
An example would help. A query and the results that you see.

wunder

On 3/18/07 6:48 PM, "shai deljo" <sh...@gmail.com> wrote:

> I assumed the tf/idf would behave like this but it's behaving VERY
> differently/wrong so I wonder whether something is wrong with my
> indexing strategy?
> I think for a quicker solution (ok, hack :) ) I'll run two different
> queries (AND, OR) and merge them.
> Does SOLR support some kind of merging I can leverage or do I need to
> do it manually?
> Thanks,
> 
> 
> On 3/18/07, Chris Hostetter <ho...@fucit.org> wrote:
>> 
>> : How do i force SOLR to score documents that contain ALL terms 1st
>> : before results that contain some of the terms?
>> 
>> generally speaking this is the result you will usually get on random data ...
>> under the covers Lucene uses TF/IDF based weighting of terms, with a coord
>> factor that penalizes documents that don't match all clauses -- but i'm sure
>> it's possible that sometimes the tf is so high and the idf so low, that
>> the score from one term can dominate.
>> 
>> the only solution to your problem that i can think of is to write a custom
>> Similarity class where tf and idf are fixed so only the coordFactor
>> matters.
>> 
>> 
>> 
>> 
>> -Hoss
>> 
>> 


Re: Score / Sort question

Posted by shai deljo <sh...@gmail.com>.
I assumed the tf/idf would behave like this but it's behaving VERY
differently/wrong so I wonder whether something is wrong with my
indexing strategy?
I think for a quicker solution (ok, hack :) ) I'll run two different
queries (AND, OR) and merge them.
Does SOLR support some kind of merging I can leverage or do I need to
do it manually?
Thanks,
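
For the manual route, a minimal client-side sketch of the merge described above: results of the AND query come first, followed by any OR results not already listed. The function name merge_all_then_any and the document ids are hypothetical, and the sketch assumes each query returns an ordered list of unique document ids.

```python
def merge_all_then_any(and_ids, or_ids):
    """Return AND hits first, then OR hits that were not already included.

    Relative order within each input list is preserved.
    """
    seen = set()
    merged = []
    for doc_id in list(and_ids) + list(or_ids):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


# Made-up ids standing in for the two result lists.
and_hits = ["d3", "d7"]
or_hits = ["d1", "d3", "d9", "d7"]
merged = merge_all_then_any(and_hits, or_hits)
print(merged)
```

Paging would need extra care with this approach, since the merged list has to be built before it can be sliced.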


On 3/18/07, Chris Hostetter <ho...@fucit.org> wrote:
>
> : How do i force SOLR to score documents that contain ALL terms 1st
> : before results that contain some of the terms?
>
> generally speaking this is the result you will usually get on random data ...
> under the covers Lucene uses TF/IDF based weighting of terms, with a coord
> factor that penalizes documents that don't match all clauses -- but i'm sure
> it's possible that sometimes the tf is so high and the idf so low, that
> the score from one term can dominate.
>
> the only solution to your problem that i can think of is to write a custom
> Similarity class where tf and idf are fixed so only the coordFactor
> matters.
>
>
>
>
> -Hoss
>
>

Re: Score / Sort question

Posted by Chris Hostetter <ho...@fucit.org>.
: How do i force SOLR to score documents that contain ALL terms 1st
: before results that contain some of the terms?

generally speaking this is the result you will usually get on random data ...
under the covers Lucene uses TF/IDF based weighting of terms, with a coord
factor that penalizes documents that don't match all clauses -- but i'm sure
it's possible that sometimes the tf is so high and the idf so low, that
the score from one term can dominate.

the only solution to your problem that i can think of is to write a custom
Similarity class where tf and idf are fixed so only the coordFactor
matters.




-Hoss
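
A rough illustration of the point above, written as a toy model and not as Lucene code. It assumes the classic default formulas tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), and it ignores query normalization, length norms and boosts. All document frequencies and term counts are made up. Setting flat=True stands in for the custom Similarity idea: tf and idf are fixed at 1, so only the coord factor and the number of matched clauses matter.

```python
import math


def tf(freq):
    # Assumed classic default: square root of the term frequency.
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Assumed classic default: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(doc_terms, query_terms, doc_freq, num_docs, flat=False):
    """Toy score: coord * sum of per-term weights for matched terms."""
    matched = [t for t in query_terms if doc_terms.get(t, 0) > 0]
    coord = len(matched) / len(query_terms)
    total = 0.0
    for t in matched:
        if flat:
            total += 1.0  # tf and idf fixed, so every match counts the same
        else:
            total += tf(doc_terms[t]) * idf(doc_freq[t], num_docs) ** 2
    return coord * total


num_docs = 1000
doc_freq = {"tag1": 9, "tag2": 900}   # tag1 is rare, tag2 is common
query = ["tag1", "tag2"]
doc_both = {"tag1": 1, "tag2": 1}     # matches both terms once
doc_one = {"tag1": 25}                # matches only tag1, many times

tfidf_both = score(doc_both, query, doc_freq, num_docs)
tfidf_one = score(doc_one, query, doc_freq, num_docs)
flat_both = score(doc_both, query, doc_freq, num_docs, flat=True)
flat_one = score(doc_one, query, doc_freq, num_docs, flat=True)
print(tfidf_one > tfidf_both, flat_both > flat_one)
```

With these made-up numbers the single-term document scores higher under the tf/idf model, while with flat weights the document that matches both terms comes first.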