Posted to solr-user@lucene.apache.org by Spyros Kapnissis <sk...@gmail.com> on 2020/05/11 07:23:50 UTC

Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

Hi all,

On our current master/slave setup (no cloud), we use a custom sorting
function to get the first-pass results (using the sort param), and then we
use LTR for re-ranking. This works fine, i.e. re-ranking is applied on the
topN after sorting has completed, and the order is correct.

However, as we are migrating to SolrCloud (version 7.3.1) with multiple
shards, this does not seem to work as expected. To my understanding, Solr
collects the re-ranked results from the shards back on a single node to
merge them, and then tries to re-apply sorting.

We would expect the results to at least follow the sorting formula, even if
this is not what we want. But even this is not the case, as the
combination of the two (sorting + re-ranking) results in erratic ordering.

Example result, where $sort_score is the sorting formula output, and score
is the LTR re-ranked output:

{"id": "152", "$sort_score": 17.38543, "score": 0.22140852},
{"id": "2016", "$sort_score": 14.612957, "score": 0.19214153},
{"id": "1523", "$sort_score": 14.4093275, "score": 0.26738763},
{"id": "6704", "$sort_score": 13.956842, "score": 0.17357588},
{"id": "6512", "$sort_score": 14.43907, "score": 0.11575622},
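One way to confirm that the list above follows neither key is to test each key for descending order. The following is only a minimal Python sketch over the five documents from the example; the helper name is_descending is made up for illustration and is not part of Solr.

```python
# The five documents from the example, in the order they were returned.
docs = [
    {"id": "152", "$sort_score": 17.38543, "score": 0.22140852},
    {"id": "2016", "$sort_score": 14.612957, "score": 0.19214153},
    {"id": "1523", "$sort_score": 14.4093275, "score": 0.26738763},
    {"id": "6704", "$sort_score": 13.956842, "score": 0.17357588},
    {"id": "6512", "$sort_score": 14.43907, "score": 0.11575622},
]


def is_descending(documents, key):
    """Return True if the documents are in non-increasing order of `key`."""
    values = [doc[key] for doc in documents]
    return all(a >= b for a, b in zip(values, values[1:]))


# Neither the sorting formula nor the re-rank score explains the order.
sorted_by_formula = is_descending(docs, "$sort_score")
sorted_by_rerank = is_descending(docs, "score")
print(sorted_by_formula, sorted_by_rerank)
```

Both checks fail for this result set: "6512" has a higher $sort_score than "6704" before it, and "1523" has a higher score than "2016" before it.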

We also tried with other simple re-rank queries apart from LTR, and the
issue persisted.

Could someone please help troubleshoot? Ideally, we would want to have the
re-rank results merged on the single node, and not re-apply sorting.

Thank you!

Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

Posted by Dmitry Kan <so...@gmail.com>.
Hi Spyros,

Did you manage to solve this issue? If so, could you please share your
solution?



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: https://semanticanalyzer.info

Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

Posted by Dmitry Kan <so...@gmail.com>.
Hi Jörn,

Thanks for this link -- one of our search engineers started looking into
this, because the issue with sorting in a federated setting concerns
non-LTR-based ranking as well.
In particular, it becomes visible in cursor-based pagination in collections
that have shards with replicas. At any given time a replica can be behind
in stats, and that causes issues in sorting and pagination.

I really hope the LTR documentation can be updated with notes on handling
federated searches, because this should affect many Solr LTR users.

On Fri, Aug 28, 2020 at 4:28 PM Jörn Franke <jo...@gmail.com> wrote:

> Maybe this can help you?
>
> https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf



Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

Posted by Dmitry Kan <so...@gmail.com>.
Hi Spyros,

Thanks for sharing! This would certainly need testing, but I think that
the LTR plugin could be modified to re-rank the documents on the merging node.
For instance, if instead of the SolrCloud endpoint you use a separate Solr
instance to route and aggregate the federated results, the re-ranking could
happen only once, inside that instance.
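
To make the idea concrete, here is a toy Python sketch of "merge first, re-rank once". It is not Solr or LTR plugin code: the function merge_then_rerank, the sort_score field, and the MODEL_SCORES lookup standing in for a real LTR model are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def merge_then_rerank(shard_results: List[List[Doc]],
                      rerank_score: Callable[[Doc], float],
                      top_n: int) -> List[Doc]:
    """Merge shard results by first-pass score, then re-rank the top_n once."""
    # Step 1: merge all shard candidates by the first-pass sort score.
    merged = sorted(
        (doc for shard in shard_results for doc in shard),
        key=lambda doc: doc["sort_score"],
        reverse=True,
    )
    # Step 2: re-rank only the global top_n; the tail keeps first-pass order.
    head = sorted(merged[:top_n], key=rerank_score, reverse=True)
    return head + merged[top_n:]


# Hypothetical model output per document id (stand-in for an LTR model).
MODEL_SCORES = {"a1": 0.2, "a2": 0.99, "b1": 0.9, "b2": 0.5}

shard_a = [{"id": "a1", "sort_score": 9.0}, {"id": "a2", "sort_score": 5.0}]
shard_b = [{"id": "b1", "sort_score": 8.0}, {"id": "b2", "sort_score": 7.0}]

result = merge_then_rerank(
    [shard_a, shard_b],
    rerank_score=lambda doc: MODEL_SCORES[doc["id"]],
    top_n=3,
)
result_ids = [doc["id"] for doc in result]
print(result_ids)
```

In this sketch "a2" stays last even though its model score is the highest, because it never enters the global top 3 of the first pass. That matches the single-node behaviour described at the start of the thread, where re-ranking applies only to the topN.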

Another approach with score normalization is mentioned here:
https://sease.io/2016/10/apache-solr-learning-to-rank-better-part-4.html

On Fri, Aug 28, 2020 at 7:39 PM Spyros Kapnissis <sk...@gmail.com> wrote:

> Hi Dmitry,
>
> No, we were not able to solve the sorting/re-ranking issue. In the end we
> migrated the custom sorting formula to using the 'q' param instead of
> 'sort' to get back the results sorted by score as expected.
>
> That mostly solved our issues with inconsistent Solr scores. Maybe sorting
> and re-ranking are conflicting concepts.
>
> Hope this helps.
>



Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

Posted by Spyros Kapnissis <sk...@gmail.com>.
Hi Dmitry,

No, we were not able to solve the sorting/re-ranking issue. In the end we
moved the custom sorting formula from the 'sort' param into the 'q' param,
so that the results come back sorted by score as expected.
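
For readers who want to try the same workaround, the request might look roughly like the Python sketch below. It assumes the formula is passed through the {!func} query parser and that re-ranking uses the rq={!ltr ...} syntax from the Solr LTR module. The formula sum(popularity,recency_boost), its field names, and the model name myModel are placeholders, not the actual setup described here.

```python
from urllib.parse import parse_qs, urlencode


def build_params(first_pass_function, model, rerank_docs, rows=10):
    """Build Solr request params: formula as the main query, LTR as re-rank."""
    return {
        # The first-pass formula becomes the score of the main query.
        "q": "{!func}" + first_pass_function,
        # LTR re-ranks the top `rerank_docs` documents of that first pass.
        "rq": "{!ltr model=%s reRankDocs=%d}" % (model, rerank_docs),
        "fl": "id,score",
        "rows": str(rows),
    }


# Placeholder formula and model name, for illustration only.
params = build_params("sum(popularity,recency_boost)", "myModel", 100)
query_string = urlencode(params)
print(query_string)
```

With no sort param in the request, the results are ordered by score, so the merging node and the shards order documents by the same value.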

That mostly solved our issues with inconsistent Solr scores. Maybe sorting
and re-ranking are conflicting concepts.

Hope this helps.


On Fri, Aug 28, 2020 at 4:28 PM Jörn Franke <jo...@gmail.com> wrote:

> Maybe this can help you?
>
> https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf

Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

Posted by Jörn Franke <jo...@gmail.com>.
Maybe this can help you?
https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf
