Posted to java-user@lucene.apache.org by Dominika Puzio <do...@gmail.com> on 2012/02/16 21:32:40 UTC

Question about CustomScoreQuery

Hello,
I'm trying to understand the behavior of CustomScoreQuery. It seemed to me
that the default
CustomScoreQuery(Query subQuery, ValueSourceQuery valSrcQuery)
should return a score that is the product of the subQuery score
and the valSrcQuery score. So I wrote the simple test case below:

      @Test
      public void createAndQueryIndex() throws CorruptIndexException,
                      LockObtainFailedException, IOException {

              RAMDirectory dir = new RAMDirectory();
              IndexWriter writer = new IndexWriter(dir,
                              new IndexWriterConfig(Version.LUCENE_35,
                                              new StandardAnalyzer(Version.LUCENE_35)));

              Document doc = new Document();
              doc.add(new Field("name", "name", Store.YES, Index.NOT_ANALYZED));
              doc.add(new Field("ratio", "0.5", Store.YES, Index.NOT_ANALYZED));

              writer.addDocument(doc);
              writer.close();

              IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));

              Query matchAllDocs = new MatchAllDocsQuery();
              float score1 = searcher.search(matchAllDocs, 1).scoreDocs[0].score;
              Assert.assertEquals(1.0f, score1, 0);

              ValueSourceQuery fieldScoreQuery = new FieldScoreQuery("ratio",
                              FieldScoreQuery.Type.FLOAT);
              float score2 = searcher.search(fieldScoreQuery, 1).scoreDocs[0].score;
              Assert.assertEquals(0.5f, score2, 0);

              CustomScoreQuery csq = new CustomScoreQuery(matchAllDocs, fieldScoreQuery);
              float score3 = searcher.search(csq, 1).scoreDocs[0].score;
              Assert.assertEquals(score1*score2, score3, 0);

              searcher.close();
      }

But this test fails on the third assertion: score3 is 0.249. Can someone
explain why? How is score3 computed in this case?

-- 
Regards
Dominika Puzio

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Question about CustomScoreQuery

Posted by Dominika Puzio <st...@wp-sa.pl>.
Thanks for the help! I think the performance advice about getting the
FieldCache in the ctor saved me a lot of time :) I've done what you said
and it works.

--
Dominika

On 21.02.2012 10:46, Uwe Schindler wrote:
> It looks like you already implemented a CustomScoreProvider.
>
> You are retrieving the FieldCache on every document, which slows things
> down immensely (it's a synchronized cache lookup). The correct way is:
>
> Override CSQ.getCustomScoreProvider and return your own CSP there. The CSP
> itself should get the FieldCache for the required field in the ctor and
> store it as a class instance field (Lucene will ask for a new
> CustomScoreProvider for each index segment).
>
> In CSQ itself, don't use ValueSources at all (I never use them because I
> never understood how they are intended to work). Providing your own
> CustomScoreProvider is in most cases the easiest thing you can do. Just
> follow my advice from above to get the FieldCache in the ctor of the
> provider (from the reader). Don't call super in the provider; just return
> your own score (some formula using subQueryScore and FieldCache, in most
> cases a simple multiplication).
>
> Uwe

--
Dominika Puzio
Search Services, SEO and Ergonomics Team
room 133 (GDA), tel. 52.15.325


RE: Question about CustomScoreQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.
It looks like you already implemented a CustomScoreProvider.

You are retrieving the FieldCache on every document, which slows things
down immensely (it's a synchronized cache lookup). The correct way is:

Override CSQ.getCustomScoreProvider and return your own CSP there. The CSP
itself should get the FieldCache for the required field in the ctor and
store it as a class instance field (Lucene will ask for a new
CustomScoreProvider for each index segment).

In CSQ itself, don't use ValueSources at all (I never use them because I
never understood how they are intended to work). Providing your own
CustomScoreProvider is in most cases the easiest thing you can do. Just
follow my advice from above to get the FieldCache in the ctor of the
provider (from the reader). Don't call super in the provider; just return
your own score (some formula using subQueryScore and FieldCache, in most
cases a simple multiplication).
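
The pattern described above (resolve the per-segment values once in the
provider's constructor, then only index into the array per document) can be
sketched without any Lucene dependency. Everything in this sketch is a
hypothetical stand-in: CountingFieldCache plays the role of
FieldCache.DEFAULT.getFloats(reader, "ratio"), and RatioScoreProvider plays
the role of a CustomScoreProvider returned from
CSQ.getCustomScoreProvider. It is only meant to show where the lookup
happens, not how Lucene implements it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-segment field cache; counts how often it is consulted.
final class CountingFieldCache {
    private final float[] ratios;
    final AtomicInteger lookups = new AtomicInteger();

    CountingFieldCache(float[] ratios) {
        this.ratios = ratios;
    }

    // Models a synchronized cache lookup that returns one value per document.
    synchronized float[] getFloats(String field) {
        lookups.incrementAndGet();
        return ratios;
    }
}

// Hypothetical stand-in for a per-segment score provider.
final class RatioScoreProvider {
    private final float[] ratios; // resolved once, in the constructor

    RatioScoreProvider(CountingFieldCache cache) {
        this.ratios = cache.getFloats("ratio");
    }

    // Returns the final score directly instead of delegating to a superclass.
    float customScore(int doc, float subQueryScore) {
        return subQueryScore * ratios[doc];
    }
}

final class ProviderDemo {
    public static void main(String[] args) {
        CountingFieldCache cache = new CountingFieldCache(new float[] {0.5f, 2.0f, 1.0f});
        RatioScoreProvider provider = new RatioScoreProvider(cache);
        for (int doc = 0; doc < 3; doc++) {
            System.out.println(provider.customScore(doc, 1.0f));
        }
        System.out.println("cache lookups: " + cache.lookups.get());
    }
}
```

However many documents are scored, the cache is consulted once per provider
instance, which is the point of moving the lookup into the ctor.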

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Dominika Puzio [mailto:dominika.puzio@gmail.com]
> Sent: Tuesday, February 21, 2012 10:27 AM
> To: java-user@lucene.apache.org
> Subject: Re: Question about CustomScoreQuery
> 
> Thanks for your answer. I checked what explain() says about my queries,
> and:
> 
> MatchAllDocsQuery:
> 1.0 = (MATCH) MatchAllDocsQuery, product of:
>    1.0 = queryNorm
> 
> FieldScoreQuery:
> 0.5 = (MATCH) float(ratio), product of:
>    0.5 = float(ratio)=0.5
>    1.0 = boost
>    1.0 = queryNorm
> 
> CustomScoreQuery:
> 0.24999999 = (MATCH) custom(*:*, float(ratio)), product of:
>    0.24999999 = custom score: product of:
>      0.70710677 = (MATCH) MatchAllDocsQuery, product of:
>        0.70710677 = queryNorm
>      0.35355338 = (MATCH) float(ratio), product of:
>        0.5 = float(ratio)=0.5
>        1.0 = boost
>        0.70710677 = queryNorm
>    1.0 = queryBoost
> 
> so the reason is queryNorm.
> I ran this test because I wanted to write my own CustomScoreQuery and
> was trying to understand what happens in the customScore(int doc, float
> subQueryScore, float valSrcScore) method of CustomScoreProvider. I
> assumed that if I used a FieldScoreQuery as the valueSourceQuery, the
> valSrcScore parameter would be equal to the field value, but that's not
> true. So now I'm using FieldCache to obtain the field value from the index
> and use it in my own score function (there is one value per document in
> this field in my index). It looks something like this:
> 
> public float customScore(int doc, float subQueryScore, float valSrcScore)
> 		throws IOException {
> 
> 	float fieldValue = FieldCache.DEFAULT.getFloats(reader, "ratio")[doc];
> 
> 	float newValSrcScore = someFunction(someQueryTimeVariable, fieldValue);
> 
> 	return super.customScore(doc, subQueryScore, newValSrcScore);
> }
> 
> Am I right, or is there a better way to get the field value?
> 
> --
> Dominika


Re: Question about CustomScoreQuery

Posted by Dominika Puzio <do...@gmail.com>.
Thanks for your answer. I checked what explain() says about my queries, and:

MatchAllDocsQuery:
1.0 = (MATCH) MatchAllDocsQuery, product of:
   1.0 = queryNorm

FieldScoreQuery:
0.5 = (MATCH) float(ratio), product of:
   0.5 = float(ratio)=0.5
   1.0 = boost
   1.0 = queryNorm

CustomScoreQuery:
0.24999999 = (MATCH) custom(*:*, float(ratio)), product of:
   0.24999999 = custom score: product of:
     0.70710677 = (MATCH) MatchAllDocsQuery, product of:
       0.70710677 = queryNorm
     0.35355338 = (MATCH) float(ratio), product of:
       0.5 = float(ratio)=0.5
       1.0 = boost
       0.70710677 = queryNorm
   1.0 = queryBoost

so the reason is queryNorm.
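
The numbers in the explain() output can be reproduced by hand. The sketch
below uses no Lucene classes. It assumes that queryNorm is 1 / sqrt(sum of
squared weights) and that each of the two sub-weights contributes 1.0 to
that sum inside the CustomScoreQuery; both assumptions are consistent with
the 1.0 and 0.70710677 values shown above, but they are not taken from the
Lucene source. QueryNormDemo and its methods are hypothetical names.

```java
// Reproduces the numbers in the explain() output by hand; no Lucene classes are used.
final class QueryNormDemo {

    // Assumed normalization: 1 / sqrt(sum of squared weights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Product of the two normalized sub-scores, as in the "custom score" explain node.
    static float combinedScore(float fieldValue, float norm) {
        float subQueryScore = 1.0f * norm;            // MatchAllDocsQuery: queryNorm only
        float valSrcScore = fieldValue * 1.0f * norm; // float(ratio) * boost * queryNorm
        return subQueryScore * valSrcScore;
    }

    public static void main(String[] args) {
        float standalone = queryNorm(1.0f); // a single weight contributing 1.0
        float combined = queryNorm(2.0f);   // two weights contributing 1.0 each (assumption)
        System.out.println(standalone);
        System.out.println(combined);
        System.out.println(combinedScore(0.5f, standalone));
        System.out.println(combinedScore(0.5f, combined));
    }
}
```

With a norm of 1.0 the product is exactly 0.5, which is what the test
expected. With the norm applied to both factors, the product is
0.5 * (1/sqrt(2))^2, which in float arithmetic lands just below 0.25 and so
fails an assertEquals with a delta of 0.
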
I ran this test because I wanted to write my own CustomScoreQuery 
and was trying to understand what happens in the customScore(int doc, 
float subQueryScore, float valSrcScore) method of CustomScoreProvider. I 
assumed that if I used a FieldScoreQuery as the valueSourceQuery, the 
valSrcScore parameter would be equal to the field value, but that's not 
true. So now I'm using FieldCache to obtain the field value from the index 
and use it in my own score function (there is one value per document in 
this field in my index). It looks something like this:

public float customScore(int doc, float subQueryScore, float valSrcScore)
		throws IOException {
				
	float fieldValue = FieldCache.DEFAULT.getFloats(reader, "ratio")[doc];
	
	float newValSrcScore = someFunction(someQueryTimeVariable, fieldValue);
			
	return super.customScore(doc, subQueryScore, newValSrcScore);
}

Am I right, or is there a better way to get the field value?

-- 
Dominika


On 20.02.2012 18:38, Ian Lea wrote:
> I can't explain this.  Can you get at an oal.search.Explanation?  You
> could write your own CustomScoreProvider - that might help you to
> double check what is being passed to it, and/or allow you to provide
> your own calculation.
>
>
> --
> Ian.


Re: Question about CustomScoreQuery

Posted by Ian Lea <ia...@gmail.com>.
I can't explain this.  Can you get at an
org.apache.lucene.search.Explanation?  You could write your own
CustomScoreProvider - that might help you to double check what is being
passed to it, and/or allow you to provide your own calculation.


--
Ian.

