Posted to users@solr.apache.org by gnandre <ar...@gmail.com> on 2022/10/07 16:31:48 UTC

Understanding LTR debug query output

Hi,

I have implemented LTR (LambdaRank) functionality, but there are some
search cases where the relevance is actually getting worse. I am trying
to understand why some results are ranked above others. Naturally, I am
using a debug query to understand what is going on.

e.g., here is the explain response for one of the documents:

"doc:en:/help/coder/index.html": 0.93952394 =
  (name=model,
   featureValues=[linkScore=1.7102735, hierScore=3.9314165,
                  originalScore=0.029598212, tfidf_title=-0.3270329,
                  tfidf_body=-0.6185444, tfidf_url=-0.8011434,
                  tfidf_file_name=-0.37964302,
                  tfidf_primary_header_en=-0.32059863,
                  tfidf_secondary_header_en=0.36570454,
                  tfidf_meta_description_en=-0.09497543,
                  tfidf_inlink_text_en=-0.08638504,
                  tfidf_indexed_not_highlighted_en=-0.2544066],
   layers=[(matrix=75x12,activation=relu),
           (matrix=1x75,activation=sigmoid)])

Can somebody tell me how the final score of 0.93952394 is calculated
for this document? Also, how are the featureValues calculated? For
example, the hierScore field value for this document is actually 0.5,
but it shows up here as 3.9314165.

Re: Understanding LTR debug query output

Posted by Alessandro Benedetti <a....@sease.io>.
Hi,
from what I can see you are using a neural network implementation as the
model (org.apache.solr.ltr.model.NeuralNetworkModel?), and I agree it is
definitely not the best in terms of explainability
(org.apache.solr.ltr.model.NeuralNetworkModel#explain).

Effectively its explain just summarizes the layers; the score itself is
calculated by feeding the feature values through the layers, applying each
layer's weights and its activation function. To be fair, I suspect that
even with a detailed formula you, as a human, wouldn't get much more out
of it anyway.
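
Just to illustrate the mechanics, here is a minimal Python sketch of that
forward pass. It uses the twelve feature values from your explain output,
but the weights are made up; the real ones are the 75x12 and 1x75 matrices
(plus biases) stored in your model JSON:

    import math
    import random

    # Hypothetical weights: the real values live in the model JSON.
    # Shapes match the explain output: a 75x12 hidden layer (relu)
    # and a 1x75 output layer (sigmoid).
    random.seed(0)
    W1 = [[random.uniform(-1, 1) for _ in range(12)] for _ in range(75)]
    b1 = [0.0] * 75
    W2 = [random.uniform(-1, 1) for _ in range(75)]
    b2 = 0.0

    # The twelve feature values from the explain output above.
    x = [1.7102735, 3.9314165, 0.029598212, -0.3270329, -0.6185444,
         -0.8011434, -0.37964302, -0.32059863, 0.36570454, -0.09497543,
         -0.08638504, -0.2544066]

    # Hidden layer: h = relu(W1 . x + b1)
    h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(W1, b1)]

    # Output layer: score = sigmoid(W2 . h + b2)
    z = sum(w * v for w, v in zip(W2, h)) + b2
    score = 1.0 / (1.0 + math.exp(-z))
    print(score)  # with the real weights this would print 0.93952394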

For the features it should be easier to explain why they have those
values: take a look at the way you defined them in your features.json.
If hierScore is just a field value and the logged value doesn't match it,
that is possibly a bug, maybe related to the numerical representation.
What is the field type?
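
For reference, a field-backed feature is normally declared with a
FieldValueFeature; a minimal sketch of such an entry in features.json
(the store name here is made up) could look like this:

    [
      {
        "store": "myFeatureStore",
        "name": "hierScore",
        "class": "org.apache.solr.ltr.feature.FieldValueFeature",
        "params": { "field": "hierScore" }
      }
    ]

One more thing worth checking: if your model JSON attaches a normalizer
to that feature (e.g. org.apache.solr.ltr.norm.StandardNormalizer or
org.apache.solr.ltr.norm.MinMaxNormalizer), the featureValues shown in
the explain output may be the normalized values rather than the raw field
values, which could account for 0.5 showing up as 3.9314165.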

Cheers


--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benedetti@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>

