Posted to java-user@lucene.apache.org by AlexeyG <ag...@z-techcorp.com> on 2006/08/24 19:06:35 UTC

Boosting Documents and score calculation

Hello,

I ran into some very strange behavior in Lucene 1.9.  A boost factor under
1.3 does not affect the resulting score!  I wrote a simple test to isolate
the issue:

Writing test index
Creating 4 documents with the same KEY and boosts of default, 1.1, 1.2, and 1.3

	public static void writeTestIndex() throws IOException {
		
		// opening index writer
		IndexWriter writer = null;
		writer = new IndexWriter("C:\\a_temp", new StandardAnalyzer(), true);
		
		Document currentDocument = null;
		
		// creating and adding document with DEFAULT boost
		currentDocument = new Document();
		currentDocument.add(new Field("KEY", "AA", Field.Store.YES, Field.Index.UN_TOKENIZED));
		currentDocument.add(new Field("BOOST_FACTOR", "1", Field.Store.YES, Field.Index.UN_TOKENIZED));
		writer.addDocument(currentDocument);

		// creating and adding document with 1.1 boost
		currentDocument = new Document();
		currentDocument.add(new Field("KEY", "AA", Field.Store.YES, Field.Index.UN_TOKENIZED));
		currentDocument.add(new Field("BOOST_FACTOR", "1.1", Field.Store.YES, Field.Index.UN_TOKENIZED));
		currentDocument.setBoost((float)1.1);
		writer.addDocument(currentDocument);

		// creating and adding document with 1.2 boost
		currentDocument = new Document();
		currentDocument.add(new Field("KEY", "AA", Field.Store.YES, Field.Index.UN_TOKENIZED));
		currentDocument.add(new Field("BOOST_FACTOR", "1.2", Field.Store.YES, Field.Index.UN_TOKENIZED));
		currentDocument.setBoost((float)1.2);
		writer.addDocument(currentDocument);
		
		// creating and adding document with 1.3 boost
		currentDocument = new Document();
		currentDocument.add(new Field("KEY", "AA", Field.Store.YES, Field.Index.UN_TOKENIZED));
		currentDocument.add(new Field("BOOST_FACTOR", "1.3", Field.Store.YES, Field.Index.UN_TOKENIZED));
		currentDocument.setBoost((float)1.3);
		writer.addDocument(currentDocument);			

		// optimizing and closing IndexWriter
		writer.optimize();
		writer.close();
	}


Test Search
Searching for the KEY value, which is the same in all 4 documents

	public static void testIndex() throws IOException {

		// opening IndexSearcher
		IndexSearcher searcher = null;
		searcher = new IndexSearcher("C:\\a_temp");

		// searching for KEY
		Hits hits = searcher.search(new TermQuery(new Term("KEY", "AA")));

		// listing documents and their BOOST_FACTOR field
		Document doc = null;
		if (null != hits) {
			logger.debug("Listing results: ");
			for (int i = 0; i < hits.length(); i++) {
				doc = hits.doc(i);
				logger.debug("BOOST_FACTOR field: " + doc.get("BOOST_FACTOR")
						+ " Score: " + hits.score(i));
			}
		}
		
		// closing IndexSearcher
		searcher.close();
	}

Output

BOOST_FACTOR field: 1.3 Score: 0.9710705
BOOST_FACTOR field: 1 Score: 0.7768564
BOOST_FACTOR field: 1.1 Score: 0.7768564
BOOST_FACTOR field: 1.2 Score: 0.7768564

Boosts of 1.1 and 1.2 did not affect the score for the last 2 documents!
The document with a boost of 1.3 jumped to the top, but the rest were
returned in the order they were added to the index.

What am I missing here?  I thought document score would reflect all levels
of boost, not just 1.3 and above?  Please help.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Boosting Documents and score calculation

Posted by Chris Hostetter <ho...@fucit.org>.
First off, when trying to make sense of scores you should always use
either a HitCollector or one of the TopDocs methods of the Searcher
interface -- otherwise the "normalize if greater than 1" logic of the Hits
class might confuse you.

Second: Searcher.explain(Query,int) is your friend ... it will help you
understand exactly where your scores are coming from.

Third: index-time document boosts are folded into the "norm" value for
that field (along with any index-time field boosts and the length norm)
... these norms are "encoded" as a single byte, which can result in a loss
of precision, so it wouldn't be too surprising if boosts of 1.0, 1.1,
and 1.2 all encoded as the same value.  (You can use
Similarity.decodeNorm(Similarity.encodeNorm(some_float)) to see exactly
how much precision is lost for any given float value.)
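
To make the precision loss concrete without needing an index, here is a
minimal standalone sketch of the round trip. It does not call Lucene; it
re-implements a one-byte float encoding under the assumption that the norm
byte uses 3 mantissa bits and 5 exponent bits with an exponent zero point
of 15. The class name NormPrecisionDemo and its two static methods are
hypothetical stand-ins for Similarity.encodeNorm and Similarity.decodeNorm,
so check the results against the real calls on your Lucene version.

```java
public class NormPrecisionDemo {

    // Assumed layout: 5 exponent bits, 3 mantissa bits, zero point 15.
    private static final int ZERO_POINT = (63 - 15) << 3;

    // Hypothetical stand-in for Similarity.encodeNorm(float).
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);          // keep sign, exponent, top mantissa bits
        if (small <= ZERO_POINT) {
            // zero or negative -> 0, positive underflow -> smallest positive value
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1;                  // overflow -> largest representable value
        }
        return (byte) (small - ZERO_POINT);
    }

    // Hypothetical stand-in for Similarity.decodeNorm(byte).
    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] boosts = {1.0f, 1.1f, 1.2f, 1.3f};
        for (float boost : boosts) {
            // Print each boost next to the value that survives the one-byte round trip.
            System.out.println(boost + " -> " + decodeNorm(encodeNorm(boost)));
        }
    }
}
```

Under that assumed layout, 1.0, 1.1 and 1.2 all come back as 1.0, while 1.3
comes back as 1.25. That is consistent with the posted output, where the top
score is 1.25 times the other three (0.7768564 * 1.25 = 0.9710705). The KEY
field holds a single untokenized term, so its length norm should be 1 and the
stored norm is effectively just the document boost.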



-Hoss

