Posted to java-user@lucene.apache.org by Yann-Erwan Perio <ye...@gmail.com> on 2013/01/10 10:47:32 UTC

CustomScoreQuery + Collector + Scoring

Hello,

I am using Lucene 4.0.0, trying to combine a CustomScoreQuery with a
Collector, and have a problem with the calculation of scores.

My context is as follows. I have a big BooleanQuery which works fine,
but I also want to calculate some statistics during the search (i.e.
perform aggregation operations). To do so, I have tried to pass a
custom Collector to the searcher, in charge of retrieving data from
the matched documents (using FieldCache), and performing the
appropriate calculations. The whole thing works fine, and the calculations
seem to be done properly (though I have yet to test them more thoroughly):

---
IndexSearcher searcher = new IndexSearcher(reader);
searcher.search(query, collector);
---

Now, my Collector also collects a sample of the matched documents, in
order to display them. These documents are collected one after the
other, at each call of the collect() method, until the sample is
complete. What I would like is to have the documents sorted by their
score, and I also would like to calculate the score myself. In other
words, I want to make a sample of the "best" documents, while still
calculating statistics for all matched documents.

I am confused about how to do that. I have created a CustomScoreQuery,
wrapping the source BooleanQuery, using a CustomScoreProvider.
However, when I execute the search upon this query, with the
collector, my log shows that the CustomScoreProvider is properly
instantiated, but that the customScore() method is never called.
Interestingly, if I execute the custom query without the collector,
retrieving some TopDocs object, then the customScore() is called.

What can I do? So far, my collector does not make use of the Scorer
passed to the setScorer() method - should I use it? I have tried the
simple scorer.score() call in the collector setScorer() method,
resulting in the customScore() method being called, but with no docID.

Thank you in advance for your insight.

Kind regards,
Yep.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: CustomScoreQuery + Collector + Scoring

Posted by Yann-Erwan Perio <ye...@gmail.com>.
On Thu, Jan 10, 2013 at 11:22 AM, Uwe Schindler <uw...@thetaphi.de> wrote:

Hi Uwe,

> The best way to do this is to wrap the standard Lucene
> TopScoreDocCollector by your own collector (passing all
> calls to the collector also down to the top-docs collector).
> Then you don't have to take care of sorting the results, your
> collector just does statistics.

This worked like a charm, and the resulting code is even neater - thanks a lot.

Kind regards,
Yep.



RE: CustomScoreQuery + Collector + Scoring

Posted by Uwe Schindler <uw...@thetaphi.de>.
> I am using Lucene 4.0.0, trying to combine a CustomScoreQuery with a
> Collector, and have a problem with the calculation of scores.
> 
> My context is as follows. I have a big BooleanQuery which works fine, but I
> also want to calculate some statistics during the search (i.e.
> perform aggregation operations). To do so, I have tried to pass a custom
> Collector to the searcher, in charge of retrieving data from the matched
> documents (using FieldCache), and performing the appropriate calculations.
> The whole thing works fine, and the calculations seem to be done properly
> (though I have yet to test them more thoroughly):
> 
> ---
> IndexSearcher searcher = new IndexSearcher(reader);
> searcher.search(query, collector);
> ---
> 
> Now, my Collector also collects a sample of the matched documents, in order
> to display them. These documents are collected one after the other, at each
> call of the collect() method, until the sample is complete. What I would like is
> to have the documents sorted by their score, and I also would like to
> calculate the score myself. In other words, I want to make a sample of the
> "best" documents, while still calculating statistics for all matched documents.

The best way to do this is to wrap the standard Lucene TopScoreDocCollector in your own collector (passing all calls to your collector down to the top-docs collector as well). Then you don't have to take care of sorting the results; your collector just does the statistics.
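
A minimal sketch of that wrapping pattern, written without Lucene on the classpath so it runs on its own. ScoreSource, HitCollector, TopNCollector and StatsCollector are hypothetical stand-ins (for Scorer, Collector, TopScoreDocCollector and the custom statistics collector respectively), and the long[] array merely imitates values that would come from FieldCache. In real Lucene 4.0 code the wrapper would extend Collector and would also need to forward setNextReader() and acceptsDocsOutOfOrder() to the wrapped collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-ins for Lucene's Scorer and Collector (NOT the real API).
interface ScoreSource {
    float score(); // score of the document currently being collected
}

interface HitCollector {
    void setScorer(ScoreSource scorer);
    void collect(int doc);
}

// Plays the role of TopScoreDocCollector: retains the n best-scoring hits.
class TopNCollector implements HitCollector {
    private static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int n;
    private ScoreSource scorer;
    // Min-heap on score: the weakest retained hit is evicted first.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<Hit>((a, b) -> Float.compare(a.score, b.score));

    TopNCollector(int n) { this.n = n; }

    @Override public void setScorer(ScoreSource scorer) { this.scorer = scorer; }

    @Override public void collect(int doc) {
        heap.add(new Hit(doc, scorer.score()));
        if (heap.size() > n) heap.poll();
    }

    // Doc ids of the retained hits, best score first.
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) docs.add(h.doc);
        return docs;
    }
}

// The wrapping collector: aggregates over every hit, then delegates.
class StatsCollector implements HitCollector {
    private final HitCollector delegate;
    private final long[] fieldValues; // imitates per-document FieldCache values
    long count;
    long sum;

    StatsCollector(HitCollector delegate, long[] fieldValues) {
        this.delegate = delegate;
        this.fieldValues = fieldValues;
    }

    @Override public void setScorer(ScoreSource scorer) { delegate.setScorer(scorer); }

    @Override public void collect(int doc) {
        count++;
        sum += fieldValues[doc];
        delegate.collect(doc); // the delegate takes care of ranking
    }
}

class DelegatingCollectorDemo {
    // Imitates the search loop: one setScorer() call, then one collect() per hit.
    static void search(HitCollector collector, float[] scores) {
        int[] current = new int[1];
        collector.setScorer(() -> scores[current[0]]);
        for (int doc = 0; doc < scores.length; doc++) {
            current[0] = doc;
            collector.collect(doc);
        }
    }

    public static void main(String[] args) {
        TopNCollector top = new TopNCollector(2);
        StatsCollector stats = new StatsCollector(top, new long[] {10, 20, 30, 40});
        search(stats, new float[] {0.2f, 0.9f, 0.5f, 0.7f});
        // prints: top=[1, 3] count=4 sum=100
        System.out.println("top=" + top.topDocs() + " count=" + stats.count + " sum=" + stats.sum);
    }
}
```

The statistics cover all four hits while only the two best are kept for display, which is the split asked for in the question.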

> I am confused about how to do that. I have created a CustomScoreQuery,
> wrapping the source BooleanQuery, using a CustomScoreProvider.
> However, when I execute the search upon this query, with the collector, my
> log shows that the CustomScoreProvider is properly instantiated, but that
> the customScore() method is never called.
> Interestingly, if I execute the custom query without the collector, retrieving
> some TopDocs object, then the customScore() is called.

If the collector never asks the Scorer for a score, the score is not calculated, so the provider is never called.

> What can I do? So far, my collector does not make use of the Scorer passed
> to the setScorer() method - should I use it? I have tried the simple
> scorer.score() call in the collector setScorer() method, resulting in the
> customScore() method being called, but with no docID.

It gets the doc id, for sure.
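
The two answers above can be illustrated with a small Lucene-free model. It rests on an assumption about the search loop, namely that the scorer is handed to setScorer() once, before it is positioned on any document, and that collect() is then called once per hit. LazyScorer, ScoringCollector and the three collector classes are hypothetical names, and the -1 used for "not positioned yet" is part of the model. Under that assumption, a collector that ignores the score triggers no scoring at all, one that calls score() inside setScorer() sees no valid document, and one that calls score() inside collect() sees every matching doc id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's Scorer (NOT the real API).
class LazyScorer {
    private final float[] scores;
    private int doc = -1; // not positioned until the search loop advances
    // Doc id observed by each score() call, like the doc passed to customScore().
    final List<Integer> scoredDocs = new ArrayList<>();

    LazyScorer(float[] scores) { this.scores = scores; }

    int docID() { return doc; }

    boolean next() { return ++doc < scores.length; }

    float score() {
        scoredDocs.add(doc);
        return doc < 0 ? Float.NaN : scores[doc];
    }
}

interface ScoringCollector {
    void setScorer(LazyScorer scorer);
    void collect(int doc);
}

// Never asks for a score, so nothing is ever computed.
class IgnoresScore implements ScoringCollector {
    @Override public void setScorer(LazyScorer scorer) { }
    @Override public void collect(int doc) { }
}

// Asks too early: the scorer is not on a document yet.
class ScoresInSetScorer implements ScoringCollector {
    @Override public void setScorer(LazyScorer scorer) { scorer.score(); }
    @Override public void collect(int doc) { }
}

// Keeps the scorer and asks once per hit.
class ScoresInCollect implements ScoringCollector {
    private LazyScorer scorer;
    @Override public void setScorer(LazyScorer scorer) { this.scorer = scorer; }
    @Override public void collect(int doc) { scorer.score(); }
}

class LazyScoringDemo {
    // Runs the modelled search loop and reports which docs were scored.
    static List<Integer> run(ScoringCollector collector) {
        LazyScorer scorer = new LazyScorer(new float[] {0.2f, 0.9f, 0.5f});
        collector.setScorer(scorer);
        while (scorer.next()) {
            collector.collect(scorer.docID());
        }
        return scorer.scoredDocs;
    }

    public static void main(String[] args) {
        System.out.println(run(new IgnoresScore()));      // prints []
        System.out.println(run(new ScoresInSetScorer())); // prints [-1]
        System.out.println(run(new ScoresInCollect()));   // prints [0, 1, 2]
    }
}
```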

> Thank you in advance for your insight.
> 
> Kind regards,
> Yep.

