Posted to java-user@lucene.apache.org by Luis Rodrigo Aguado <lr...@isoco.com> on 2006/11/21 10:02:57 UTC

Combining scores

Hi all,

I am working on a project that, for each query from the user, builds 
four or five different queries and tries to combine the results. The 
first part is already working, but, as I have read that the scores from 
different queries are not comparable with each other, I am a bit stuck 
on the second part. What would be a good strategy for getting a unified 
score when merging the results of different queries over the same index? 
Is there anything already done on that?

I have searched through the list, but all I have found is threads like this:

http://article.gmane.org/gmane.comp.jakarta.lucene.user/10810

that only warn about the incompatibility of scores, but give no clue 
about how to solve it.

Thanks!

Luis.

-- 

*Luis Rodrigo Aguado*

Innovation and R&D

Research Manager

lrodrigo(at)isoco.com

#T  +34913349777

C/Pedro de Valdivia, 10

28006, Madrid, Spain

iSOCO

intelligent software for the networked economy

www.isoco.com <http://www.isoco.com/>

 


 

This message is intended exclusively for its addressee and may contain 
information that is CONFIDENTIAL and protected by professional 
privilege. If you are not the intended recipient you are hereby notified 
that any dissemination, copy or disclosure of this communication is 
strictly prohibited by law. If this message has been received in error, 
please immediately notify us via e-mail and delete it.


Re: Combining scores

Posted by Erick Erickson <er...@gmail.com>.
This is a *really* simplistic approach, but why not just submit all 4 or 5
queries at once in a BooleanQuery and let Lucene do all the work for you? Or
are the 4 or 5 queries such that they don't combine easily with MUST,
MUST_NOT or SHOULD in a BooleanQuery?
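
As a rough illustration of what combining with SHOULD does to the scores: in classic Lucene scoring, a BooleanQuery roughly sums the scores of the clauses that matched a document and scales the sum by a coordination factor (matching clauses divided by total clauses). The sketch below imitates only that arithmetic with plain collections. It uses no Lucene classes, ignores query normalization, and the class name, method name and sample scores are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: no Lucene classes are used here.
class ShouldCombinationSketch {

    /**
     * Combines per-clause scores (docId -> score) roughly the way a
     * disjunction of SHOULD clauses does: sum the scores of the clauses
     * that matched a document, then scale by matched / total clauses.
     */
    static Map<Integer, Double> combine(List<Map<Integer, Double>> clauseScores) {
        Map<Integer, Double> sum = new HashMap<>();
        Map<Integer, Integer> matched = new HashMap<>();
        for (Map<Integer, Double> clause : clauseScores) {
            for (Map.Entry<Integer, Double> e : clause.entrySet()) {
                sum.merge(e.getKey(), e.getValue(), Double::sum);
                matched.merge(e.getKey(), 1, Integer::sum);
            }
        }
        Map<Integer, Double> combined = new HashMap<>();
        for (Map.Entry<Integer, Double> e : sum.entrySet()) {
            // Coordination factor: rewards documents matching more clauses.
            double coord = matched.get(e.getKey()) / (double) clauseScores.size();
            combined.put(e.getKey(), e.getValue() * coord);
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<Integer, Double> phraseClause = Map.of(1, 0.8, 2, 0.4);
        Map<Integer, Double> termClause = Map.of(2, 0.6, 3, 0.5);
        System.out.println(combine(List.of(phraseClause, termClause)));
    }
}
```

Because all clauses are scored inside one query, the resulting numbers are comparable with each other, which is the point of the suggestion.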

Best
Erick


Re: Combining scores

Posted by José Ramón Pérez Agüera <jo...@fdi.ucm.es>.
I have some code to do that, but it is not really user-friendly yet :-(

Anyway, it is quite simple. You need to merge the postings that you obtain for the different queries using TermDocs. With TermDocs you obtain the internal ids of the docs that contain each term. If you merge the TermDocs for each word that appears in the queries you want to combine, you obtain the postings for the whole set of queries and can therefore compute a single score for that set.
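
A minimal sketch of that idea, under stated assumptions: in-memory maps (term -> docId -> frequency) stand in for the (doc, freq) pairs that TermDocs would enumerate, and a simple tf-idf sum stands in for whatever scoring function is actually wanted. The class name, method name and sample data are hypothetical, and this is not the code mentioned in the message.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: plain maps replace the postings read through TermDocs.
class PostingsMergeSketch {

    /** Scores every document that contains at least one term of any query. */
    static Map<Integer, Double> score(List<List<String>> queries,
                                      Map<String, Map<Integer, Integer>> postings,
                                      int numDocs) {
        // Union of the terms of all queries; a shared term is counted once.
        Set<String> terms = new LinkedHashSet<>();
        for (List<String> query : queries) {
            terms.addAll(query);
        }

        Map<Integer, Double> scores = new TreeMap<>();
        for (String term : terms) {
            Map<Integer, Integer> termPostings = postings.get(term);
            if (termPostings == null) {
                continue; // term does not occur in the index
            }
            // Assumed weighting: idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(freq).
            double idf = 1.0 + Math.log(numDocs / (double) (termPostings.size() + 1));
            for (Map.Entry<Integer, Integer> posting : termPostings.entrySet()) {
                double tf = Math.sqrt(posting.getValue());
                scores.merge(posting.getKey(), tf * idf, Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = Map.of(
                "lucene", Map.of(1, 4, 2, 1),
                "score", Map.of(2, 1));
        List<List<String>> queries = List.of(List.of("lucene"), List.of("lucene", "score"));
        System.out.println(score(queries, postings, 3));
    }
}
```

Since every document is scored once against the merged term set, all scores come from the same formula and can be ranked together.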

I hope this helps.

jose

José Ramón Pérez Agüera

Visiting Research Scholar at Yahoo! Research Spain.
Ocata 1, 1st floor 08003 Barcelona Catalunya, Spain

Dept. de Ingeniería del Software e Inteligencia Artificial
Despacho 411 tlf. 913947599
Facultad de Informática
Universidad Complutense de Madrid

Re: Combining scores

Posted by karl wettin <ka...@gmail.com>.
On 21 Nov 2006, at 10:02, Luis Rodrigo Aguado wrote:

> Hi all,
>
> I am working in a project that, for each query from the user,  
> builds four or five different queries and tries to combine the  
> results. The first part is already working, but, as I have read  
> that the scores from different queries are not comparable at all  
> among them, I am a bit stuck in the second part. Which could be a  
> good strategy to get a unified score merging the results from  
> different queries over the same index. Is there anything already  
> done on that?

Perhaps this can help you:

I've written something I call a QueryFork, used for simplified
queries when the user does not specify any operators (field:, +, -,
etc.). In essence it builds a number of queries from the same text
and places them in order of priority. The collector discards any
documents already matched. For each subsequent query whose results are
collected, I normalize the scores based on the bottom score of the
previous results.

So something like this:

1. Fork results from phrase query
2. Fork results from non-sequential near query
3. Fork results from boolean MUST term queries
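
To make the arithmetic of that cascade concrete, here is a small self-contained sketch with plain collections instead of Lucene classes. It assumes raw scores in the range (0, 1]; the class name, method name and sample scores are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the cascade: plain maps (docId -> raw score) stand
// in for the hits of each query, listed in priority order.
class ScoreCascadeSketch {

    static Map<Integer, Double> cascade(List<Map<Integer, Double>> resultsByPriority) {
        Map<Integer, Double> collected = new LinkedHashMap<>();
        double norm = 1.0;   // factor applied to the query currently processed
        double lowest = 1.0; // lowest scaled score collected so far
        for (Map<Integer, Double> results : resultsByPriority) {
            for (Map.Entry<Integer, Double> hit : results.entrySet()) {
                if (collected.containsKey(hit.getKey())) {
                    continue; // already matched by a higher-priority query
                }
                double scaled = hit.getValue() * norm;
                collected.put(hit.getKey(), scaled);
                lowest = Math.min(lowest, scaled);
            }
            // The next query is scaled by the bottom score collected so far.
            norm = lowest;
        }
        return collected;
    }

    public static void main(String[] args) {
        Map<Integer, Double> phrase = Map.of(1, 0.9, 2, 0.5);
        Map<Integer, Double> near = Map.of(2, 0.8, 3, 0.6);
        Map<Integer, Double> terms = Map.of(4, 0.7);
        System.out.println(cascade(List.of(phrase, near, terms)));
    }
}
```

With these sample numbers the phrase hits keep 0.9 and 0.5, document 3 is scaled to 0.6 * 0.5, and document 4 to 0.7 * 0.3, so under the stated assumption no hit from a lower-priority query outranks a hit from an earlier one.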

Each query can be placed in multiple fields with multiple boosts. See  
it as an alternative to MultiFieldQueryParser.

I could put the code in the Jira if you want. This is the core:

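import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.lucene.search.HitCollector;

/**
 * Collects hits from several queries run in priority order. A document is
 * kept only the first time it is seen, and the scores of later queries are
 * scaled by the lowest score collected so far.
 */
class ForkCollector extends HitCollector {

     private ConcurrentLinkedQueue<ForkHit> collected = new ConcurrentLinkedQueue<ForkHit>();

     // Lowest scaled score collected so far.
     private float iterationLowScore = 1;
     // Factor applied to the raw scores of the query currently running.
     private float norm = 1;

     private Set<Integer> collectedDocumentNumbers = new HashSet<Integer>(100);

     public synchronized void collect(int doc, float score) {
         // Skip documents already matched by an earlier query.
         if (!collectedDocumentNumbers.contains(doc)) {
             collectedDocumentNumbers.add(doc);
             ForkHit hit = new ForkHit(doc, score * norm);
             collected.add(hit);
             if (hit.getScore() < iterationLowScore) {
                 iterationLowScore = hit.getScore();
             }
         }
     }

     // Call after each query: the next query is scaled by the bottom score.
     void prepareNextCollectionIteration() {
         norm = iterationLowScore;
     }

     ConcurrentLinkedQueue<ForkHit> getCollected() {
         return collected;
     }
}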


import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class ForkSearcher {

     private List<Fork> forks = new ArrayList<Fork>();

     // Accessor used by search(); assumed, as it was not in the posted excerpt.
     public List<Fork> getForks() {
         return forks;
     }

     /**
      * @param searcher
      * @param analyzer
      * @param forkQuery
      * @return null if no Fork was applicable (thus no search was placed)
      * @throws IOException
      */
     public ForkHit[] search(Searcher searcher, Analyzer analyzer, ForkQuery forkQuery) throws IOException {

         boolean forkUsed = false;

         ForkCollector collector = new ForkCollector();
         for (Fork fork : getForks()) {
             Query query = fork.queryFactory(forkQuery, analyzer);
             if (query != null) {
                 Query toRun = query;
                 if (forkQuery.getStaticQuery() != null) {
                     // Restrict the fork's query with the static query.
                     BooleanQuery q = new BooleanQuery();
                     q.add(new BooleanClause(forkQuery.getStaticQuery(), BooleanClause.Occur.MUST));
                     q.add(new BooleanClause(query, BooleanClause.Occur.MUST));
                     toRun = q;
                 }
                 searcher.search(toRun, collector);
                 collector.prepareNextCollectionIteration();
                 forkUsed = true;
             }
         }

         if (!forkUsed) {
             return null;
         }

         ForkHit[] forkHits = collector.getCollected().toArray(new ForkHit[collector.getCollected().size()]);
         // Best hits first (assumed intent): sort by descending score.
         Arrays.sort(forkHits, new Comparator<ForkHit>() {
             public int compare(ForkHit forkHit, ForkHit forkHit1) {
                 return Float.compare(forkHit1.getScore(), forkHit.getScore());
             }
         });
         return forkHits;
     }
}


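import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.search.Query;

public interface Fork {

     /**
      * @param forkQuery
      * @param analyzer
      * @return null if fork is not applicable
      */
     Query queryFactory(ForkQuery forkQuery, Analyzer analyzer) throws IOException;
}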


