Posted to dev@lucene.apache.org by "Diego Ceccarelli (JIRA)" <ji...@apache.org> on 2016/03/10 18:08:41 UTC
[jira] [Comment Edited] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189552#comment-15189552 ]
Diego Ceccarelli edited comment on SOLR-8776 at 3/10/16 5:08 PM:
-----------------------------------------------------------------
[~joel.bernstein] thanks for pointing out the {MergeStrategy}. I uploaded a new patch as a first step. I agree that the merge strategy must stay where it is, which is why I wrote "partially moved" :) Just as there are IndexSearcher and SolrIndexSearcher, I moved {RankQuery} into Lucene and created a corresponding {SolrRankQuery}. The reason is that {RankQuery} works by manipulating the collector, through this method:
{code:java}
public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, IndexSearcher searcher) throws IOException;
{code}
At the moment, this is what happens in SolrIndexSearcher when the query is a RankQuery:
{code:java}
private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) throws IOException {
  Query q = cmd.getQuery();
  if (q instanceof RankQuery) {
    RankQuery rq = (RankQuery) q;
    return rq.getTopDocsCollector(len, cmd, this);
  }
  ..
{code}
Instead of creating a top-docs collector with {TopScoreDocCollector.create}, a top-score collector is wrapped in a 'RankQuery' collector.
As a reminder, grouping works in two separate stages:
* in the first stage, we iterate over the documents, scoring them, and keep a map {<group -> score>}, where score is the highest score of any document in the group (the map contains only the top-k groups with the highest scores);
* in the second stage, for each of the top groups, the documents in the group are ranked and the top documents of each group are returned.
This logic is mainly implemented in {Abstract(First|Second)PassGroupingCollector} (within Lucene).
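The two stages can be pictured with a toy model in plain Java. This is only a sketch: {Hit}, {firstPass}, {secondPass} and {demo} are hypothetical names made up for illustration, and the code works on an in-memory list rather than on Lucene's grouping collectors.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of two-pass grouping; every type here is hypothetical, not Lucene's. */
final class TwoPassGroupingSketch {

  /** A scored hit: document id, group value, query score. */
  record Hit(int doc, String group, float score) {}

  /** First pass: map each group to its best score and keep the top-k groups. */
  static List<String> firstPass(List<Hit> hits, int topNGroups) {
    Map<String, Float> bestScore = new HashMap<>();
    for (Hit h : hits) {
      bestScore.merge(h.group(), h.score(), Float::max);
    }
    return bestScore.entrySet().stream()
        .sorted((a, b) -> Float.compare(b.getValue(), a.getValue()))
        .limit(topNGroups)
        .map(Map.Entry::getKey)
        .toList();
  }

  /** Second pass: for each top group, rank its documents and keep the best ones. */
  static Map<String, List<Hit>> secondPass(
      List<Hit> hits, List<String> topGroups, int maxDocsPerGroup) {
    Map<String, List<Hit>> result = new LinkedHashMap<>();
    for (String group : topGroups) {
      result.put(group, hits.stream()
          .filter(h -> h.group().equals(group))
          .sorted((a, b) -> Float.compare(b.score(), a.score()))
          .limit(maxDocsPerGroup)
          .toList());
    }
    return result;
  }

  /** Small fixed example: three groups, keep the two best, one document each. */
  static Map<String, List<Hit>> demo() {
    List<Hit> hits = List.of(
        new Hit(1, "a", 0.9f), new Hit(2, "a", 0.4f), new Hit(3, "b", 0.7f),
        new Hit(4, "c", 0.2f), new Hit(5, "b", 0.8f));
    return secondPass(hits, firstPass(hits, 2), 1);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this example group "c" never reaches the second pass, because its best score is lower than the best scores of "a" and "b".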
We should probably discuss what reranking means for groups. In my opinion, we should keep in mind that the idea behind RankQuery is that you don't want to apply the query to all the documents in the collection, so "group-reranking" should work as follows:
1. in the first stage, we iterate over the documents, scoring them as usual, and keep a map {<group -> score>};
2. for each group, RankQuery is applied to the top documents in the group;
3. the groups are reranked according to the new scores.
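One possible reading of steps 2 and 3, sketched in plain Java. All names ({Hit}, {rerankGroup}, {rerankGroups}, {demo}) are hypothetical, the rescoring function stands in for whatever a RankQuery would compute, and ordering groups by their best reranked score is an assumption, not something settled above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Toy model of group reranking; hypothetical types, not the patch itself. */
final class GroupRerankSketch {

  record Hit(int doc, float score) {}

  /** Step 2: rescore only the top documents of one group, then sort by the new score. */
  static List<Hit> rerankGroup(List<Hit> topDocs, ToDoubleFunction<Hit> rescorer) {
    return topDocs.stream()
        .map(h -> new Hit(h.doc(), (float) rescorer.applyAsDouble(h)))
        .sorted((a, b) -> Float.compare(b.score(), a.score()))
        .toList();
  }

  /** Best score of an already sorted group (assumption: empty groups sort last). */
  static float best(List<Hit> sortedHits) {
    return sortedHits.isEmpty() ? Float.NEGATIVE_INFINITY : sortedHits.get(0).score();
  }

  /** Step 3: reorder the groups by their best reranked score. */
  static Map<String, List<Hit>> rerankGroups(
      Map<String, List<Hit>> groups, ToDoubleFunction<Hit> rescorer) {
    List<Map.Entry<String, List<Hit>>> reranked = new ArrayList<>();
    for (Map.Entry<String, List<Hit>> e : groups.entrySet()) {
      reranked.add(Map.entry(e.getKey(), rerankGroup(e.getValue(), rescorer)));
    }
    reranked.sort((a, b) -> Float.compare(best(b.getValue()), best(a.getValue())));
    Map<String, List<Hit>> out = new LinkedHashMap<>();
    for (Map.Entry<String, List<Hit>> e : reranked) {
      out.put(e.getKey(), e.getValue());
    }
    return out;
  }

  /** Two groups from a first pass; the rescorer boosts document 3 only. */
  static Map<String, List<Hit>> demo() {
    Map<String, List<Hit>> groups = new LinkedHashMap<>();
    groups.put("a", List.of(new Hit(1, 0.9f), new Hit(2, 0.4f)));
    groups.put("b", List.of(new Hit(5, 0.8f), new Hit(3, 0.7f)));
    return rerankGroups(groups, h -> h.doc() == 3 ? 2.0 : h.score());
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Here group "b" moves ahead of group "a" because document 3 receives the highest reranked score, and only the documents already kept for each group are rescored.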
With this patch, I'm able to perform step 2. I had to move RankQuery into Lucene because {AbstractSecondPassGroupingCollector} creates a collector for each group:
{code:java}
for (SearchGroup<GROUP_VALUE_TYPE> group : groups) {
  //System.out.println(" prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString()));
  TopDocsCollector<?> collector;
  if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector
    // Sort by score
    collector = TopScoreDocCollector.create(maxDocsPerGroup);
    ...
{code}
... so there is no way to 'inject' the reranking collector from Solr. After moving RankQuery into Lucene, I changed that code to:
{code:java}
collector = TopScoreDocCollector.create(maxDocsPerGroup);
if (query != null && query instanceof RankQuery) {
  collector = ((RankQuery) query).getTopDocsCollector(collector, null, searcher);
}
{code}
With this change, the documents within each group are reranked. I'll now work on step 3, i.e., reordering the groups based on the new rerank scores (I added a new test for it, which fails at the moment).
Happy to discuss this first change if you have comments.
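The wrapping of an existing collector can be pictured as a decorator. The sketch below is plain Java with hypothetical types: {TopCollector}, {topN} and {reranking} are stand-ins made up for illustration and are not Lucene's {TopDocsCollector} or the patched {RankQuery} API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Toy model of wrapping a top-docs collector; hypothetical types, not Lucene's. */
final class WrappingCollectorSketch {

  record Hit(int doc, float score) {}

  /** Much-reduced collector contract, assumed here for illustration only. */
  interface TopCollector {
    void collect(int doc, float score);
    List<Hit> topDocs();
  }

  /** Keeps the n best hits by score; plays the role of the plain top-score collector. */
  static TopCollector topN(int n) {
    List<Hit> hits = new ArrayList<>();
    return new TopCollector() {
      @Override public void collect(int doc, float score) {
        hits.add(new Hit(doc, score));
      }
      @Override public List<Hit> topDocs() {
        return hits.stream()
            .sorted((a, b) -> Float.compare(b.score(), a.score()))
            .limit(n)
            .toList();
      }
    };
  }

  /** Wraps a previously created collector and rescores only its top hits. */
  static TopCollector reranking(TopCollector delegate, ToDoubleFunction<Hit> rescorer) {
    return new TopCollector() {
      @Override public void collect(int doc, float score) {
        delegate.collect(doc, score); // collection itself is unchanged
      }
      @Override public List<Hit> topDocs() {
        return delegate.topDocs().stream()
            .map(h -> new Hit(h.doc(), (float) rescorer.applyAsDouble(h)))
            .sorted((a, b) -> Float.compare(b.score(), a.score()))
            .toList();
      }
    };
  }

  /** Mirrors the shape of the snippet above: create a collector, then wrap it. */
  static List<Hit> demo() {
    TopCollector collector = topN(2);
    collector = reranking(collector, h -> h.doc() == 7 ? 5.0 : h.score());
    collector.collect(3, 0.9f);
    collector.collect(7, 0.5f);
    collector.collect(9, 0.1f);
    return collector.topDocs();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because the wrapper receives the collector instead of building one, the caller decides how many documents are kept per group, and the rescoring function only ever sees those documents (document 9 in the example is dropped before rescoring).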
Minor notes:
- At the moment {SolrRankQuery} doesn't extend {ExtendedQueryBase}; I have to check whether that is a problem. RankQuery could perhaps become an interface.
- I made some changes to the signature of {RankQuery.getTopDocsCollector}: {QueryCommand} lives in Solr but was used only to get the {Sort}, and len was never used. The method now takes the previous collector as input instead of creating a new TopScoreDocCollector inside {RankQuery}.
> Support RankQuery in grouping
> -----------------------------
>
> Key: SOLR-8776
> URL: https://issues.apache.org/jira/browse/SOLR-8776
> Project: Solr
> Issue Type: Improvement
> Components: search
> Affects Versions: master
> Reporter: Diego Ceccarelli
> Priority: Minor
> Fix For: master
>
> Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, 0001-SOLR-8776-Support-RankQuery-in-grouping.patch
>
>
> Currently it is not possible to use RankQuery [1] and Grouping [2] together (see also [3]). In some situations Grouping can be replaced by Collapse and Expand Results [4] (that supports reranking), but i) collapse cannot guarantee that at least a minimum number of groups will be returned for a query, and ii) in the Solr Cloud setting you will have constraints on how to partition the documents among the shards.
> I'm going to start working on supporting RankQuery in grouping. I'll start by attaching a patch with a test that fails because grouping does not support the rank query, and then I'll try to fix the problem, starting from the non-distributed setting (GroupingSearch).
> My feeling is that since grouping is mostly performed by Lucene, RankQuery should be refactored and moved (or partially moved) there.
> Any feedback is welcome.
> [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API
> [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> [3] http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3CCAHM-LpuvsPEsT-Sw63_8a6gt-wOr6dS_T_Nb2rOpe93e4+sTNQ@mail.gmail.com%3E
> [4] https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results