Posted to solr-user@lucene.apache.org by Lavanya Thirumalaisami <la...@yahoo.co.in.INVALID> on 2019/01/01 23:03:41 UTC
Debugging Solr Search results & Issues with Distributed IDF
Hi,
I am trying to debug a query to find out why one document gets a higher score than the other. The two documents below are similar products.
Below are the debug results I get from the Solr admin console.

"Doc1":
15.20965 = sum of:
  4.7573533 = max of:
    4.7573533 = weight(All:2x in 962) [], result of:
      4.7573533 = score(doc=962,freq=2.0 = termFreq=2.0), product of:
        3.4598935 = idf(docFreq=1346, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
  10.452296 = max of:
    5.9166136 = weight(All:powerpoint in 962) [], result of:
      5.9166136 = score(doc=962,freq=2.0 = termFreq=2.0), product of:
        4.302992 = idf(docFreq=579, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    10.452296 = weight(All:"socket outlet" in 962) [], result of:
      10.452296 = score(doc=962,freq=2.0 = phraseFreq=2.0), product of:
        7.60167 = idf(), sum of:
          3.5370626 = idf(docFreq=1246, docCount=42836)
          4.064607 = idf(docFreq=735, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = phraseFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)

"Doc15":
13.258003 = sum of:
  5.7317085 = max of:
    5.7317085 = weight(All:doubl in 2122) [], result of:
      5.7317085 = score(doc=2122,freq=2.0 = termFreq=2.0), product of:
        4.168515 = idf(docFreq=663, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    4.7657394 = weight(All:2x in 2122) [], result of:
      4.7657394 = score(doc=2122,freq=2.0 = termFreq=2.0), product of:
        3.4659925 = idf(docFreq=1339, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    5.390302 = weight(All:2g in 2122) [], result of:
      5.390302 = score(doc=2122,freq=2.0 = termFreq=2.0), product of:
        3.9202197 = idf(docFreq=850, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
  7.526294 = max of:
    5.8597584 = weight(All:powerpoint in 2122) [], result of:
      5.8597584 = score(doc=2122,freq=2.0 = termFreq=2.0), product of:
        4.2616425 = idf(docFreq=604, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    7.526294 = weight(All:"socket outlet" in 2122) [], result of:
      7.526294 = score(doc=2122,freq=1.0 = phraseFreq=1.0), product of:
        7.526294 = idf(), sum of:
          3.4955401 = idf(docFreq=1300, docCount=42874)
          4.030754 = idf(docFreq=761, docCount=42874)
        1.0 = tfNorm, computed from:
          1.0 = phraseFreq=1.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
My Questions
1. IDF: I understand from the Solr documentation that IDF is calculated separately for each shard. I added the following statsCache configuration to solrconfig.xml and reloaded the collection:
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
But even after that, there is no change in the calculated IDF.
2. What are parameter b and parameter k1?
3. Why are so many more parameters listed for my Doc15 than for Doc1?
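The explain trees can be re-derived by hand, which also bears on question 3. Below is a minimal Python sketch, assuming Lucene 6.x's BM25Similarity formulas (idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)); tfNorm with b = 0.0 because norms are omitted) and a dismax-style combination with a tie breaker of 0, inferred from the plain "max of:" lines. The docFreq, docCount and freq inputs are copied from the output; the helper names are hypothetical. Lucene's explain generally lists only the clauses that matched, so Doc15 shows more entries because more alternatives (doubl, 2x, 2g) matched in its first group.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # Lucene 6.x BM25Similarity idf (natural log)
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def clause_weight(idfs, freq, k1=1.2):
    # Phrase idf is the sum of its terms' idfs; b is 0.0 because norms are
    # omitted, so tfNorm reduces to freq * (k1 + 1) / (freq + k1).
    tf_norm = freq * (k1 + 1) / (freq + k1)
    return sum(idfs) * tf_norm


def dismax_sum(groups, tie=0.0):
    # Sum over "max of" groups of: best clause + tie * (other matching clauses)
    total = 0.0
    for weights in groups:
        best = max(weights)
        total += best + tie * (sum(weights) - best)
    return total


n1, n15 = 42836, 42874  # docCount values reported in each explain
doc1 = [
    [clause_weight([bm25_idf(1346, n1)], 2.0)],                       # 2x
    [clause_weight([bm25_idf(579, n1)], 2.0),                         # powerpoint
     clause_weight([bm25_idf(1246, n1), bm25_idf(735, n1)], 2.0)],    # "socket outlet"
]
doc15 = [
    [clause_weight([bm25_idf(663, n15)], 2.0),                        # doubl
     clause_weight([bm25_idf(1339, n15)], 2.0),                       # 2x
     clause_weight([bm25_idf(850, n15)], 2.0)],                       # 2g
    [clause_weight([bm25_idf(604, n15)], 2.0),                        # powerpoint
     clause_weight([bm25_idf(1300, n15), bm25_idf(761, n15)], 1.0)],  # "socket outlet"
]

print(f"Doc1:  {dismax_sum(doc1):.5f}")   # explain reports 15.20965
print(f"Doc15: {dismax_sum(doc15):.5f}")  # explain reports 13.258003
```

With tie=0.0 only the best clause in each group counts, so Doc15's extra matches add nothing. Doc1 comes out ahead mainly because "socket outlet" occurs twice in it (tfNorm 1.375) versus once in Doc15 (tfNorm 1.0).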
Is there any documentation I can refer to for understanding Solr's score calculations in depth?
We are using Solr 6.1 in SolrCloud mode with 3 ZooKeeper nodes, 3 masters and 3 replicas.
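The docCount values differ between the two explains (42836 for Doc1, 42874 for Doc15), which suggests each document was scored with its own shard's statistics. The sketch below, assuming the Lucene 6.x BM25 idf formula, shows how that alone shifts the idf of All:2x, and what a single pooled value would look like. Summing docFreq and docCount across shards is roughly what a distributed stats cache such as ExactStatsCache is meant to achieve; the two-shard pooling here is a hypothetical illustration, since other shards' counts are not visible in the output.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Lucene 6.x BM25 idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Shard-local statistics for All:2x, copied from the two explain outputs.
shard_of_doc1 = (1346, 42836)   # (docFreq, docCount)
shard_of_doc15 = (1339, 42874)

idf_doc1 = bm25_idf(*shard_of_doc1)    # about 3.4599
idf_doc15 = bm25_idf(*shard_of_doc15)  # about 3.4660

# Hypothetical global statistics: sum docFreq and docCount over shards.
# ASSUMPTION: only these two shards are pooled; others would add their counts.
global_df = shard_of_doc1[0] + shard_of_doc15[0]
global_n = shard_of_doc1[1] + shard_of_doc15[1]
idf_global = bm25_idf(global_df, global_n)

print(f"shard idf for Doc1:  {idf_doc1:.7f}")
print(f"shard idf for Doc15: {idf_doc15:.7f}")
print(f"pooled idf (both):   {idf_global:.7f}")
```

With shared statistics both documents would use the same idf for the same term, so any remaining score difference would come only from tfNorm and from which clauses matched.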
Regards,
Lavanya
Re: Debugging Solr Search results & Issues with Distributed IDF
Posted by Lavanya Thirumalaisami <la...@yahoo.co.in.INVALID>.
Thank you for the inputs Doug and Charlie.
On Wednesday, 2 January 2019, 11:39:13 pm AEDT, Doug Turnbull <dt...@opensourceconnections.com> wrote:
On (2) these are BM25 parameters. There are several articles that discuss
BM25 in depth
https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/
https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
--
Doug Turnbull | CTO | OpenSource Connections, LLC <http://opensourceconnections.com> | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.
Re: Debugging Solr Search results & Issues with Distributed IDF
Posted by Doug Turnbull <dt...@opensourceconnections.com>.
On (2) these are BM25 parameters. There are several articles that discuss
BM25 in depth
https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/
https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
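To make k1 and b concrete, here is a minimal Python sketch of the classic BM25 term-frequency normalization, which is what appears as "tfNorm" in the explain output. The defaults k1=1.2 and b=0.75 are Lucene's BM25Similarity defaults; doc_len and avg_doc_len are hypothetical inputs, since Lucene actually derives length from compressed field norms. With b=0.0, as in the explain output where norms are omitted, length drops out and freq=2 gives 1.375.

```python
def bm25_tf_norm(freq, k1=1.2, b=0.75, doc_len=1.0, avg_doc_len=1.0):
    """Classic BM25 tf normalization: freq * (k1 + 1) / (freq + k1 * L).

    L = 1 - b + b * (doc_len / avg_doc_len). k1 sets how fast repeated
    occurrences saturate (the result never exceeds k1 + 1); b sets how
    strongly document length relative to the average is taken into account.
    """
    length_factor = 1 - b + b * (doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + k1 * length_factor)


# b = 0.0 (norms omitted): document length has no effect.
print(f"{bm25_tf_norm(2.0, b=0.0):.3f}")  # 1.375, as in the explain output
print(f"{bm25_tf_norm(1.0, b=0.0):.3f}")  # 1.000, as for phraseFreq=1.0

# Hypothetical lengths with the default b = 0.75: a shorter-than-average
# document gets a higher tfNorm than a longer one for the same freq.
print(f"{bm25_tf_norm(2.0, doc_len=5, avg_doc_len=10):.3f}")
print(f"{bm25_tf_norm(2.0, doc_len=20, avg_doc_len=10):.3f}")
```

Raising k1 lets repeated occurrences keep adding score for longer before saturating; raising b penalizes long fields more.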
Re: Debugging Solr Search results & Issues with Distributed IDF
Posted by Charlie Hull <ch...@flax.co.uk>.
On 01/01/2019 23:03, Lavanya Thirumalaisami wrote:
>
> Hi,
>
> I am trying to debug a query to find out why one document gets more score than the other. The below are two similar products.
You might take a look at OSC's Splainer http://splainer.io/ or some of
the other tools I've written about recently at
http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/
- note that this also covers some commercial offerings (and also that
I'm very happy to take any comments or additions!).
Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk