Posted to solr-user@lucene.apache.org by John Bickerstaff <jo...@johnbickerstaff.com> on 2018/10/18 18:43:11 UTC

More Like This Query problems

All,


I am having trouble with a “more like this” query in Solr.


Here’s what I think should be happening:


1. Query contains Document ID (q=id:"942316176:9009:66")

2. I add the following (on the solr admin page, raw query parameters field)

          &mlt=true&mlt.fl=field1,field2,field3

3. More Like This will take the document ID, look at the listed fields
(field1, field2, field3), and return a list of the documents that best
match the contents of those fields in that document.


What is actually happening is that I get only one result, and it is the same
document ID as the one I sent in the query.  What I expected was a list of
IDs for documents that match the submitted document in some way.


Any thoughts or advice would be appreciated.


===================


Here is an example of the query URL:


http://XX.XXX.XXX.XX:10001/solr/BPS/select?=&debug=true&indent=on&mlt.fl=field1,field2,field3&mlt=true&q=id:%22942316176:9009:66%22&wt=json
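For anyone who wants to reproduce this outside the admin page, here is a minimal sketch of building the same kind of request in Python. The host and port (localhost:8983) are placeholders, not the real server, and `build_mlt_url` is a hypothetical helper name. The sketch also leaves out the stray empty `=` parameter that appears right after `?` in the URL above.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_mlt_url(base_url, doc_id, fields):
    """Build a /select URL that asks the MoreLikeThis component
    for documents similar to the one with the given id."""
    params = {
        "q": f'id:"{doc_id}"',       # quoted because the id contains colons
        "mlt": "true",               # switch the MLT component on
        "mlt.fl": ",".join(fields),  # fields to compare on
        "wt": "json",
        "indent": "on",
    }
    # urlencode percent-escapes the quotes, colons and commas
    return f"{base_url}/select?{urlencode(params)}"


# Placeholder base URL; substitute the real host, port and collection.
url = build_mlt_url(
    "http://localhost:8983/solr/BPS",
    "942316176:9009:66",
    ["field1", "field2", "field3"],
)
print(url)
```

Letting `urlencode` do the escaping avoids hand-writing `%22` around the id.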


However, when I submit the query, I get only one document ID returned - the
same one I submitted in the first place.


Here is the important section of the response:


{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":26,
    "params":{
      "q":"id:\"942316176:9009:66\"",
      "debug":"true",
      "mlt":"true",
      "indent":"on",
      "mlt.fl":"field1,field2,field3",
      "wt":"json",
      "_":"1539881180264"}},
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"942316176:9009:66",
        "_version_":1611920924010872837}]
  },
  "moreLikeThis":[
    "942316176:9009:66",{"numFound":0,"start":0,"docs":[]
    }],
  "debug":{
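Note that the `moreLikeThis` section of the response holds the similar documents, and here it reports `numFound` of 0 for the submitted ID. A rough Python sketch of reading that section follows. It assumes the flat layout shown in the response (source ID followed by its result object); handling a map-shaped section as well is a guess at what other `json.nl` settings or Solr versions might return. `similar_docs` is a hypothetical helper name, and the sample is a cut-down copy of the response from this post.

```python
import json


def similar_docs(response):
    """Return {source_id: [ids of similar docs]} from a parsed Solr response."""
    mlt = response.get("moreLikeThis", [])
    if isinstance(mlt, dict):
        # Assumed alternative layout: a map keyed by source id
        pairs = mlt.items()
    else:
        # Flat layout as in the response here: id, result, id, result, ...
        pairs = zip(mlt[0::2], mlt[1::2])
    return {src: [doc["id"] for doc in result.get("docs", [])]
            for src, result in pairs}


sample = json.loads("""
{
  "response": {"numFound": 1, "start": 0, "docs": [{"id": "942316176:9009:66"}]},
  "moreLikeThis": ["942316176:9009:66", {"numFound": 0, "start": 0, "docs": []}]
}
""")

print(similar_docs(sample))  # {'942316176:9009:66': []}
```

An empty list for the source ID means the MLT component ran but found nothing, which is different from the component not running at all.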

Re: More Like This Query problems

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
Found it.

Our Solr schema does NOT store the fields, and after some careful checking
it turns out we do NOT index term vectors for them either.  According to the
docs, MLT needs one or the other on the mlt.fl fields, so it will not work
here.

Thanks for the response David!
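A quick way to check this ahead of time is to look at the field definitions. The sketch below is only an illustration: `mlt_usable` and `fetch_field_def` are hypothetical helper names, and it assumes the Schema API endpoint `/schema/fields/<name>` with `showDefaults=true` is available and returns the definition under a `field` key. The rule it encodes is the one from this thread: a field is usable for MLT only if it is stored or has term vectors.

```python
import json
from urllib.request import urlopen


def mlt_usable(field_def):
    """True if the field is stored or has term vectors, i.e. MLT can
    get at the source document's terms for this field."""
    return bool(field_def.get("stored")) or bool(field_def.get("termVectors"))


def fetch_field_def(collection_url, field_name):
    """Fetch one field definition via the Schema API (assumed endpoint).
    Not called below, since it needs a running Solr."""
    url = f"{collection_url}/schema/fields/{field_name}?showDefaults=true&wt=json"
    with urlopen(url) as resp:
        return json.load(resp)["field"]


# Illustrative definitions, not taken from the real schema.
fields = [
    {"name": "field1", "indexed": True, "stored": False, "termVectors": False},
    {"name": "field2", "indexed": True, "stored": True},
    {"name": "field3", "indexed": True, "stored": False, "termVectors": True},
]

for f in fields:
    print(f["name"], "usable for MLT" if mlt_usable(f) else "NOT usable for MLT")
```

With definitions like the first one, an MLT request can come back with an empty section like the one in the original post.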


Re: More Like This Query problems

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
Thanks. There are many docs with matching words.  I've tried an extremely
simplified case where a basic query (q=Field1:"foo") returns millions of
results.  However, an MLT query similar to the one I mention below, using a
doc ID I know has "foo" in Field1, returns only the same doc ID as submitted
in the query.

http://XX.XXX.XX.XXX:10001/solr/BPS/select?indent=on&q=Field1:%22foo%22&wt=json
(Returns several million as "numFound")

http://XX.XXX.XX.XXX:10001/solr/BPS/select?indent=on&mlt.fl=Field1&mlt=true&q=id:%2227000:9009:66%22&wt=json
(returns only the same ID in the More Like This section)

Wouldn't the AND NOT just eliminate my initial doc Id from the list?
Assuming matches, we would still expect other ids to be returned in any
case, wouldn't we?  Should that be a Filter Query?

On Thu, Oct 18, 2018, 12:57 PM David Hastings <DH...@wshein.com> wrote:

> Make sure your query has an “AND NOT id:your doc id”
> Also be certain there are other documents that will meet your criteria for
> a test case. Remember it’s unique words in your core/collection

Re: More Like This Query problems

Posted by David Hastings <DH...@wshein.com>.
Make sure your query has an “AND NOT id:your doc id”
Also be certain there are other documents that will meet your criteria for a test case. Remember it’s unique words in your core/collection
