Posted to java-user@lucene.apache.org by Bruno Dery <br...@evidencematters.com> on 2007/10/30 21:26:04 UTC

Document boost, is it working?

Hi all, the following is using Lucene 2.2.0.

I've been trying to alter the scoring of my search results to boost by
date. My idea was to boost documents at indexing time using the date, but
it doesn't work. So I put together this little sample to investigate
further, and apparently setting the document boost does nothing. In my
example below, you'd expect the output to be 20, 2 and 10, but I get
1 1 1. Is this normal behavior, and if so, how am I supposed to use
document boosting? It seems like I'm missing something...

Here's the sample code:
 
-----------
 
 
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class IndexTest {

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {

        // Create bogus index
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("C:/lucene_test/"),
                new StandardAnalyzer(), true);
        writer.setUseCompoundFile(true);

        Document doc = new Document();
        doc.add(new Field("testfield", "high ranking", Field.Store.YES,
                Field.Index.TOKENIZED));
        doc.setBoost(20);
        writer.addDocument(doc);

        doc = new Document();
        doc.add(new Field("testfield", "low ranking", Field.Store.YES,
                Field.Index.TOKENIZED));
        doc.setBoost(2);
        writer.addDocument(doc);

        doc = new Document();
        doc.add(new Field("testfield", "mid ranking", Field.Store.YES,
                Field.Index.TOKENIZED));
        doc.setBoost(10);
        writer.addDocument(doc);

        writer.close();

        // Read bogus index
        IndexReader reader =
                IndexReader.open(FSDirectory.getDirectory("C:/lucene_test/"));
        System.out.println(reader.document(0).getBoost());
        System.out.println(reader.document(1).getBoost());
        System.out.println(reader.document(2).getBoost());
    }
}

Re: Document boost, is it working?

Posted by John Griffin <jg...@thebluezone.net>.

Bruno,

Use Luke to examine your simple index and see if the boost is
registering. It will be on Luke's Documents page, about in the middle of
the screen. That'll at least tell you if it's registering with the
documents.

John G.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: Document boost, is it working?

Posted by Bruno Dery <br...@evidencematters.com>.

Thanks! I also noticed there is a mention of this in the documentation
of Document.getBoost():

"Note: This value is not stored directly with the document in the index.
Documents returned from IndexReader.document(int) and Hits.doc(int) may
thus not have the same value present as when this document was indexed."

Then maybe it should simply be removed from Luke's display, as you
suggest.


Re: Document boost, is it working?

Posted by Andrzej Bialecki <ab...@getopt.org>.

Bruno Dery wrote:
> Thanks for the help, you're right your example works. However looking in
> Luke I also see only ones (1 1 1) as the document boost. 

Then perhaps this value should be removed from Luke's display ...
because it will always read 1, and that's a correct value (see below).

> 
> I imagine Luke uses Lucene's Document.getBoost() function. Shouldn't
> this be considered a bug? I'd expect to retrieve the same boost
> number (or at least some factor of it) when I look at my documents once
> indexed. Or perhaps there is something I'm not understanding about this
> Document.getBoost function; normally you expect a getter to return
> the exact value you entered with the setter.

Document.setBoost / getBoost is _only_ guaranteed to return the same
value if a document is newly created, i.e. it hasn't been stored &
retrieved. This is related to the way Lucene stores boost values in the
index. To save storage space, the boost value is never stored
explicitly. Instead, at indexing time it is multiplied into every
field's lengthNorm (along with any field boost), and the product is
stored as a single-byte float per field.

(Actually, it's a function of the current index format - perhaps in the 
future Lucene will be able to store these values separately using 
another type of storage. So far there was no pressing need to do this).
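
To make the explanation above concrete, here is a minimal, self-contained sketch of what happens to a document boost on its way into the index. It is not Lucene code: the class and method names (NormSketch, lengthNorm, encode, decode, storedNorm) are hypothetical, the length norm is assumed to be 1/sqrt(number of terms) as in the default similarity, and the byte encoding is only modelled on Lucene's lossy one-byte float format.

```java
// Sketch only: NormSketch and its methods are hypothetical names, not Lucene API.
class NormSketch {

    // Assumed length normalisation: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Lossy float -> byte encoding that keeps only a few high-order bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        int zero = (63 - 15) << 3;
        if (small <= zero) {
            // Zero and negative values map to 0, tiny positive values to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return (byte) -1; // too large: clamp to the biggest encodable value
        }
        return (byte) (small - zero);
    }

    // Inverse of encode(); the precision lost during encoding is not recovered.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // What survives for one field: boost * lengthNorm, squeezed into one byte.
    static float storedNorm(float docBoost, int numTerms) {
        return decode(encode(docBoost * lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        // The three boosts from the sample, each on a two-term field.
        float[] boosts = {20f, 2f, 10f};
        for (float boost : boosts) {
            System.out.println("boost " + boost + " -> stored norm "
                    + storedNorm(boost, 2));
        }
    }
}
```

Under these assumptions the boosts 20, 2 and 10 come back as 14.0, 1.25 and 7.0: the relative order is preserved, but the original boost cannot be read back, which is consistent with getBoost() returning 1 on a retrieved document.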

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


RE: Document boost, is it working?

Posted by Bruno Dery <br...@evidencematters.com>.

Thanks for the help. You're right, your example works. However, looking
in Luke I also see only ones (1 1 1) as the document boost.

I imagine Luke uses Lucene's Document.getBoost() function. Shouldn't
this be considered a bug? I'd expect to retrieve the same boost
number (or at least some factor of it) when I look at my documents once
indexed. Or perhaps there is something I'm not understanding about this
Document.getBoost function; normally you expect a getter to return
the exact value you entered with the setter.



Re: Document boost, is it working?

Posted by John Griffin <jg...@thebluezone.net>.

Bruno,

After your comment // Read bogus index, replace your code with this
(point the IndexSearcher at your own index directory). You will also
need imports for org.apache.lucene.queryParser.QueryParser and for
IndexSearcher, Query and Hits from org.apache.lucene.search.

      IndexSearcher searcher = new IndexSearcher("/home/griffij/lucene_test/");
      final QueryParser parser = new QueryParser("testfield", new StandardAnalyzer());
      final Query query = parser.parse("ranking");
      Hits hits = searcher.search(query);
      // Print every hit with its score, best match first
      for (int x = 0; x < hits.length(); x++)
      {
         System.out.println(hits.doc(x) + " score=> " + hits.score(x));
      }
      searcher.close();


You'll see that the boost does indeed take effect. The boost value isn't
stored with the document you set it against. It takes effect during the
scoring process and affects all fields in the document.
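
As a rough illustration of why the hits come back in boost order, here is a self-contained toy sketch, not Lucene code. All three sample documents match "ranking" once in a two-term field, so the only factor that differs between them is the index-time one: document boost times length norm. The names BoostRanking, Doc, rank and sampleDocs are hypothetical, and the 1/sqrt(term count) length norm is an assumption taken from the default similarity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: BoostRanking and its members are hypothetical names, not Lucene API.
class BoostRanking {

    static final class Doc {
        final String text;
        final float boost;

        Doc(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }

        // Index-time factor: document boost times an assumed 1/sqrt(termCount).
        float norm() {
            int termCount = text.split("\\s+").length;
            return boost * (float) (1.0 / Math.sqrt(termCount));
        }
    }

    // Orders documents by their index-time factor, highest first.
    static List<String> rank(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> d.norm()).reversed());
        List<String> texts = new ArrayList<>();
        for (Doc d : sorted) {
            texts.add(d.text);
        }
        return texts;
    }

    // The three documents from the sample, in the order they were indexed.
    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("high ranking", 20f));
        docs.add(new Doc("low ranking", 2f));
        docs.add(new Doc("mid ranking", 10f));
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(rank(sampleDocs()));
    }
}
```

The sketch leaves out everything that is identical across the three documents (term frequency, inverse document frequency, query normalisation), so it only shows the ordering, not the actual scores a searcher would report.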

John G.

