Posted to java-user@lucene.apache.org by "denis.zhdanov" <de...@gmail.com> on 2013/10/16 08:03:55 UTC

PhraseQuery boost doesn't affect ScoreDoc.score

Hello,

I have a question about how PhraseQuery boosts are processed by default. The
javadoc for Query.setBoost()
<http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/Query.html#setBoost(float)>
says:

  "Sets the boost for this query clause to b. Documents matching this clause
  will (in addition to the normal weightings) have their score multiplied by b"

However, that's not true for PhraseQuery. Example:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

import java.io.IOException;

public class Test {
    public static void main(String[] args) throws IOException {
        RAMDirectory dir = new RAMDirectory();
        Version version = Version.LUCENE_44;
        // Index a single document whose "data" field contains "1 2 3".
        try (IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(version, new StandardAnalyzer(version)))) {
            Document document = new Document();
            document.add(new TextField("data", "1 2 3", Field.Store.YES));
            writer.addDocument(document);
        }

        // Run the same phrase query without and with a boost.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            search(searcher, 1);
            search(searcher, 5);
        }
    }

    private static void search(IndexSearcher searcher, float boost) throws IOException {
        PhraseQuery query = new PhraseQuery();
        query.add(new Term("data", "2"));
        query.add(new Term("data", "3"));
        query.setBoost(boost);
        TopDocs hits = searcher.search(query, 1);
        // Explicit check: a plain 'assert' is skipped unless the JVM runs with -ea.
        if (hits.scoreDocs.length != 1) {
            throw new IllegalStateException("expected one hit, got " + hits.scoreDocs.length);
        }
        ScoreDoc doc = hits.scoreDocs[0];
        System.out.printf("Boost %g, score %g:%n%s%n", boost, doc.score,
                searcher.explain(query, doc.doc));
    }
}

Output:

Boost 1.00000, score 0.306853:
0.30685282 = (MATCH) weight(data:"2 3" in 0) [DefaultSimilarity], result of:
  0.30685282 = fieldWeight in 0, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    0.61370564 = idf(), sum of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
    0.5 = fieldNorm(doc=0)

Boost 5.00000, score 0.306853:
0.30685282 = (MATCH) weight(data:"2 3"^5.0 in 0) [DefaultSimilarity], result of:
  0.30685282 = fieldWeight in 0, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    0.61370564 = idf(), sum of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
    0.5 = fieldNorm(doc=0)

That is, the resulting ScoreDoc.score is the same for the boosted and the
non-boosted query, which seems to contradict the Query.setBoost() contract.
Did I miss something?
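The unchanged score can be reproduced by hand. The sketch below is a hypothetical re-creation of the Lucene 4.x DefaultSimilarity (TF-IDF) arithmetic, not Lucene code, and the class name is illustrative. It assumes idf = 1 + ln(numDocs / (docFreq + 1)), queryNorm = 1 / sqrt(sum of squared query weights), and takes tf = 1 and fieldNorm = 0.5 from the explain() output above. With a single top-level clause, the only squared weight feeding queryNorm is that clause's own (idf * boost), so the boost is divided straight back out.

```java
// Hypothetical helper class (not part of Lucene): re-creates the Lucene 4.x
// DefaultSimilarity arithmetic for a single, top-level PhraseQuery.
public class LoneQueryBoostSketch {

    // DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // score = queryWeight * fieldWeight, where
    //   queryWeight = idf * boost * queryNorm
    //   queryNorm   = 1 / sqrt((idf * boost)^2)   (only one clause contributes)
    //   fieldWeight = tf * idf * fieldNorm
    static double score(double tf, double phraseIdf, double boost, double fieldNorm) {
        double rawWeight = phraseIdf * boost;
        double queryNorm = 1.0 / Math.sqrt(rawWeight * rawWeight);
        double queryWeight = rawWeight * queryNorm; // 1.0 for any positive boost
        double fieldWeight = tf * phraseIdf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Inputs taken from the explain() output: two terms, each with
        // docFreq=1 in a one-document index; phraseFreq=1; fieldNorm=0.5.
        double phraseIdf = idf(1, 1) + idf(1, 1);
        for (double boost : new double[] {1.0, 5.0}) {
            System.out.printf("boost %.1f -> score %.8f%n",
                    boost, score(1.0, phraseIdf, boost, 0.5));
        }
    }
}
```

Both boosts print a score of 0.30685282, the same value explain() reports, because queryWeight comes out as 1.0 whatever the boost is.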



--
View this message in context: http://lucene.472066.n3.nabble.com/PhraseQuery-boost-doesn-t-affect-ScoreDoc-score-tp4095791.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: PhraseQuery boost doesn't affect ScoreDoc.score

Posted by Ian Lea <ia...@gmail.com>.
Boosting query clauses means more "this clause is more important than
that clause" rather than "make the score for this search higher". I
use it for bibliographic searching when I want to search across
multiple fields and want matches in titles to count for more than
matches in blurbs. An amended version of your program, pasted below,
produces this output:

$ java Test3 title | grep -e Query -e First
Query: title:"amber eyes" blurb:"amber eyes": 2 hits, boost=1.0 on title
First: The Hare with Amber Eyes, Boost 1.00000, score 0.353553:
Query: title:"amber eyes"^5.0 blurb:"amber eyes": 2 hits, boost=5.0 on title
First: The Hare with Amber Eyes, Boost 5.00000, score 0.490290:


$ java Test3 blurb | grep -e Query -e First
Query: title:"amber eyes" blurb:"amber eyes": 2 hits, boost=1.0 on blurb
First: The Hare with Amber Eyes, Boost 1.00000, score 0.353553:
Query: title:"amber eyes" blurb:"amber eyes"^5.0: 2 hits, boost=5.0 on blurb
First: Some Book, Boost 5.00000, score 0.429004:

The first run boosts matches on title and the second boosts matches on
blurb; a boost greater than 1 can change the result ordering, as the
blurb run shows. Looking at precise scores is generally not very helpful.
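The scores above can also be reproduced by hand, which shows why the boost now matters: queryNorm is computed from the squared weights of both clauses, so boosting one clause raises its share of the score instead of being divided back out. The sketch below is a hypothetical re-creation of the Lucene 4.x DefaultSimilarity arithmetic, not Lucene code, and the class name is illustrative. It assumes each phrase has idf 2.0 (each term occurs in 1 of the 2 documents, so 1 + ln(2/2) = 1 per term), tf = 1, and a coord factor of 1/2 because each book matches only one of the two SHOULD clauses. The field norms are also assumptions: 0.5 for the three-token title and 0.4375 for the five-token blurb (StandardAnalyzer drops stop words such as "the" and "with", and 1/sqrt(n) is then rounded down by the one-byte norm encoding).

```java
// Hypothetical helper class (not part of Lucene): re-creates the Lucene 4.x
// DefaultSimilarity arithmetic for the two-clause BooleanQuery in Test3.
public class BooleanBoostSketch {

    /**
     * Score of a document that matches only one of the two SHOULD phrase
     * clauses. matchedBoost is the boost on the clause that matched,
     * otherBoost the boost on the clause that did not.
     */
    static double score(double matchedBoost, double otherBoost, double fieldNorm) {
        double idf = 2.0; // assumed: sum of two term idfs of 1.0 each
        double tf = 1.0;  // phrase occurs once
        // queryNorm uses the squared weights of BOTH clauses, so a boost on one
        // clause shifts weight between clauses instead of cancelling out.
        double sumOfSquares = Math.pow(idf * matchedBoost, 2) + Math.pow(idf * otherBoost, 2);
        double queryNorm = 1.0 / Math.sqrt(sumOfSquares);
        double queryWeight = idf * matchedBoost * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        double coord = 1.0 / 2.0; // one of two clauses matched
        return coord * queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Assumed encoded norms: 3-token title -> 0.5, 5-token blurb -> 0.4375.
        double titleNorm = 0.5;
        double blurbNorm = 0.4375;
        System.out.printf("no boost, Hare (title):      %.6f%n", score(1, 1, titleNorm));
        System.out.printf("no boost, Some Book (blurb): %.6f%n", score(1, 1, blurbNorm));
        System.out.printf("title^5,  Hare (title):      %.6f%n", score(5, 1, titleNorm));
        System.out.printf("blurb^5,  Hare (title):      %.6f%n", score(1, 5, titleNorm));
        System.out.printf("blurb^5,  Some Book (blurb): %.6f%n", score(5, 1, blurbNorm));
    }
}
```

Under these assumptions the top document in the no-boost, title^5 and blurb^5 runs scores 0.353553, 0.490290 and 0.429004, the same values as the "First:" lines in the output above, and in the blurb^5 run Some Book overtakes The Hare with Amber Eyes.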

My test was with 4.5 rather than 4.4 but I'm sure that's irrelevant.


--
Ian.



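import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

import java.io.IOException;

public class Test3 {
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Test3 title|blurb");
            return;
        }
        RAMDirectory dir = new RAMDirectory();
        Version version = Version.LUCENE_45;
        try (IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(version, new StandardAnalyzer(version)))) {
            Document doc1 = new Document();
            doc1.add(new TextField("title", "Some Book", Field.Store.YES));
            doc1.add(new TextField("blurb",
                    "This book is not as good as The Hare with Amber Eyes", Field.Store.YES));
            writer.addDocument(doc1);

            Document doc2 = new Document();
            doc2.add(new TextField("title", "The Hare with Amber Eyes", Field.Store.YES));
            doc2.add(new TextField("blurb", "This book is brilliant", Field.Store.YES));
            writer.addDocument(doc2);
        }
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        search(searcher, 1, args[0]);
        search(searcher, 5, args[0]);
    }

    // Searches both fields for the phrase "amber eyes" and boosts the clause
    // on the field named by ftoboost ("title" or "blurb").
    private static void search(IndexSearcher searcher, float boost, String ftoboost)
            throws IOException {
        BooleanQuery bq = new BooleanQuery();
        PhraseQuery tpq = new PhraseQuery();
        tpq.add(new Term("title", "amber"));
        tpq.add(new Term("title", "eyes"));
        PhraseQuery bpq = new PhraseQuery();
        bpq.add(new Term("blurb", "amber"));
        bpq.add(new Term("blurb", "eyes"));
        if ("title".equals(ftoboost)) {
            tpq.setBoost(boost);
        }
        if ("blurb".equals(ftoboost)) {
            bpq.setBoost(boost);
        }
        bq.add(tpq, BooleanClause.Occur.SHOULD);
        bq.add(bpq, BooleanClause.Occur.SHOULD);
        TopDocs hits = searcher.search(bq, 10);
        System.out.printf("Query: %s: %s hits, boost=%s on %s%n",
                bq, hits.scoreDocs.length, boost, ftoboost);
        if (hits.scoreDocs.length > 0) {
            ScoreDoc doc = hits.scoreDocs[0];
            System.out.printf("First: %s, Boost %g, score %g:%n%s%n%n",
                    searcher.doc(doc.doc).get("title"), boost, doc.score,
                    searcher.explain(bq, doc.doc));
        }
    }
}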
