
Posted to solr-user@lucene.apache.org by Aman Deep Singh <am...@gmail.com> on 2017/09/22 10:52:21 UTC

mm is not working if you have same term multiple times in query

Hi,
I'm using Solr 6.6.0. I have set mm to 100%, but when the search term is
repeated, the mm parameter is not honoured.

I have 2 docs in the index:
Doc 1 -
name=lock
Doc 2 -
name=lock lock

Now when I query Solr with
http://localhost:8983/solr/test2/select?defType=dismax&qf=name&indent=on&mm=100%25&q=lock%20lock&wt=json
it returns both results, but it should return only Doc 2, since the term
"lock" occurs twice in the query while Doc 1 contains it only once (lock
term frequency).
Any idea what to do to avoid getting Doc 1 in the result set? I don't want
users to get Doc 1.
Schema:
<field name="name" type="text_word_delimiter" indexed="true" stored="true"/>
<fieldType name="text_word_delimiter" class="solr.TextField"
    autoGeneratePhraseQueries="false" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ManagedSynonymFilterFactory" managed="synonyms_gdn"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

No synonyms have been added, either.

Thanks,
Aman Deep Singh

Re: mm is not working if you have same term multiple times in query

Posted by Chris Hostetter <ho...@fucit.org>.

: I'm using Solr 6.6.0. I have set mm to 100%, but when the search term is
: repeated, the mm parameter is not honoured.

: I have 2 docs in the index:
: Doc 1 -
: name=lock
: Doc 2 -
: name=lock lock
: 
: Now when I query Solr with
: http://localhost:8983/solr/test2/select?defType=dismax&qf=name&indent=on&mm=100%25&q=lock%20lock&wt=json

: it returns both results, but it should return only Doc 2, since the term
: "lock" occurs twice in the query while Doc 1 contains it only once (lock
: term frequency).

There are a couple of misconceptions here...

First off: "mm" is a property of the "BooleanQuery" object that contains 
multiple SHOULD clauses -- it has nothing to do with the "frequency" of 
any clause/term. If your BooleanQuery contains 2 SHOULD clauses, then 
mm=2 will require that both clauses match. If the 2 clauses are 
*identical*, then BooleanQuery will actually optimize away one instance and 
reduce mm to 1.

Second: even if BooleanQuery didn't have that optimization -- which was 
the case until ~6.x -- your original query would *still* match Doc#1, 
because each clause (aka sub-query) would be evaluated independently. The 
BooleanQuery would ask clause #1 "do you match doc#1?" and it would say 
"yes"; then the BooleanQuery would ask clause #2 "do you match doc#1?" 
and it would also say "yes", and so the BooleanQuery would say "I've 
reached the minimum number of SHOULD clauses I was configured to require 
for a match, so doc#1 is a match."
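A toy Python model of the evaluation described above may make the "each clause is asked independently" point concrete. It is not Lucene code: the clause functions, the `boolean_should_match` helper, and the lowercase/whitespace tokenization are simplifications assumed for illustration. The two `name:lock` clauses produced by `q=lock lock` each say "yes" for Doc 1, so mm=2 is satisfied.

```python
# A toy model (not Lucene's implementation) of a BooleanQuery made only of
# SHOULD clauses with a minimum-should-match threshold.

def term_clause(field, term):
    """Return a clause that matches if `term` occurs at least once in `field`."""
    def matches(doc):
        # Lowercase + whitespace split approximates the field's analyzer.
        return term in doc[field].lower().split()
    return matches

def boolean_should_match(clauses, min_should_match, doc):
    """Ask every clause independently, count the yeses, compare with mm."""
    matched = sum(1 for clause in clauses if clause(doc))
    return matched >= min_should_match

docs = {
    "doc1": {"name": "lock"},
    "doc2": {"name": "lock lock"},
}

# q=lock lock with qf=name yields two identical clauses: name:lock, name:lock.
clauses = [term_clause("name", "lock"), term_clause("name", "lock")]

# mm=100% of 2 clauses means 2 must match; both clauses say "yes" for doc1 too.
hits = [doc_id for doc_id, doc in docs.items()
        if boolean_should_match(clauses, 2, doc)]
print(hits)  # ['doc1', 'doc2']
```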


If you have a special-case situation of wanting to require that a term 
occurs at least X times, the only way I can think of off the top of my 
head to do that would be using the termfreq() function.  

Something like...

	q={!frange l=}termfreq(text,'lock')

https://lucene.apache.org/solr/guide/function-queries.html#termfreq-function
https://lucene.apache.org/solr/guide/other-parsers.html#function-range-query-parser
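A minimal Python sketch of the termfreq()/frange idea, under stated assumptions: the message leaves the value after `l=` blank, so the sketch assumes it is set to the required minimum count (2 here); it uses the `name` field from the original schema rather than `text`; and `frange_termfreq_query`, `select_url` and the local `termfreq` stand-in are hypothetical helpers, not Solr APIs. termfreq() is likely matched against the indexed form of the term, so the sketch passes it already lowercased to line up with the LowerCaseFilterFactory in the schema.

```python
from urllib.parse import urlencode

def frange_termfreq_query(field, term, min_count):
    """Hypothetical helper: {!frange} query keeping docs with termfreq >= min_count."""
    return "{!frange l=%d}termfreq(%s,'%s')" % (min_count, field, term)

def select_url(base, query):
    """Hypothetical helper: /select URL; urlencode percent-encodes {, !, =, quotes."""
    return base + "/select?" + urlencode({"q": query, "wt": "json"})

def termfreq(text, term):
    """Local stand-in for termfreq(): count after lowercasing and whitespace split."""
    return text.lower().split().count(term)

q = frange_termfreq_query("name", "lock", 2)
print(q)  # {!frange l=2}termfreq(name,'lock')
print(select_url("http://localhost:8983/solr/test2", q))

# Local check of the same rule against the two example documents.
docs = {"doc1": "lock", "doc2": "lock lock"}
print([doc_id for doc_id, text in docs.items() if termfreq(text, "lock") >= 2])  # ['doc2']
```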


But I caution that while this might work in the specific example you gave, 
it's not really a drop-in replacement for how you _thought_ mm should 
work, so a lot of things you might be trying to do with dismax+mm aren't 
going to have any sort of equivalent here.

In general, I'm curious about your bigger-picture goal, and whether there 
isn't some better solution...


https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss
http://www.lucidworks.com/

Re: mm is not working if you have same term multiple times in query

Posted by Aman Deep Singh <am...@gmail.com>.

We can't use shingles, as the user can query "lock and lock" or any other
combination. Although "and" and some other words could be removed by stop
word processing, we can't rely on that completely.
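To illustrate the shingle objection, here is a simplified Python model of word bigrams. It only captures the idea behind Lucene's ShingleFilter (no unigram output, no position handling), and the `shingles` helper is hypothetical. The indexed value "lock lock" yields a "lock lock" shingle, but the query "lock and lock" does not.

```python
def shingles(text, size=2):
    """Word n-grams over lowercased whitespace tokens.

    A simplification of the idea behind Lucene's ShingleFilter: no unigram
    output and no handling of position gaps.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

print(shingles("lock lock"))      # ['lock lock']  (Doc 2 at index time)
print(shingles("lock and lock"))  # ['lock and', 'and lock']  (no 'lock lock')
print(shingles("lock"))           # []  (a single word yields no bigram)
```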

On 22-Sep-2017 7:00 PM, "Emir Arnautović" <em...@sematext.com>
wrote:

> It seems to me that all OOTB solutions would involve some query parsing on
> the client side.
> If those are adjacent values, you could try playing with shingles to get it
> to work.
> Brainstorming: a custom token filter that would assign a token occurrence
> number to each token, e.g.
> “foo lock bar lock” would be indexed as foo1 lock1 bar1 lock2, but that
> would mess up scoring…
>
> Maybe there is something specific about your use case that could be used
> to make it work.
>
> Emir

Re: mm is not working if you have same term multiple times in query

Posted by Emir Arnautović <em...@sematext.com>.

It seems to me that all OOTB solutions would involve some query parsing on the client side. 
If those are adjacent values, you could try playing with shingles to get it to work. 
Brainstorming: a custom token filter that would assign a token occurrence number to each token, e.g.
“foo lock bar lock” would be indexed as foo1 lock1 bar1 lock2, but that would mess up scoring…
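A Python sketch of the brainstormed occurrence-numbering filter. In Solr this would have to be a custom Java TokenFilter applied at both index and query time; the functions below are hypothetical and only model the token transformation plus a simplified "every query term must match" check (what mm=100% amounts to here). Because lock1 and lock2 become separate terms, their term statistics are split, which is one way the scoring gets messed up.

```python
from collections import Counter

def number_occurrences(tokens):
    """Append a per-term occurrence number: foo lock bar lock -> foo1 lock1 bar1 lock2."""
    seen = Counter()
    out = []
    for token in tokens:
        seen[token] += 1
        out.append(f"{token}{seen[token]}")
    return out

def analyze(text):
    # Hypothetical analyzer: lowercase, whitespace split, then occurrence numbering.
    return number_occurrences(text.lower().split())

print(analyze("foo lock bar lock"))  # ['foo1', 'lock1', 'bar1', 'lock2']

# With the same numbering at query time, q=lock lock becomes lock1 lock2, so
# requiring all query terms now needs a document with a second "lock".
index = {"doc1": set(analyze("lock")), "doc2": set(analyze("lock lock"))}
query_terms = analyze("lock lock")
matches = [doc_id for doc_id, terms in index.items()
           if all(t in terms for t in query_terms)]
print(matches)  # ['doc2']
```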

Maybe there is something specific about your use case that could be used to make it work.

Emir

> On 22 Sep 2017, at 15:17, Aman Deep Singh <am...@gmail.com> wrote:
> 
> Hi Emir,
> Thanks for the reply.
> I understand how dismax/edismax works; my problem is that I don't want to
> show results that match only one token.
> I cannot use a phrase query here because a phrase query doesn't work with
> a single-word query, so to do that we would need to change the search
> request (qf or pf) dynamically. I will definitely try the function query.
> 
> Thanks,
> Aman Deep Singh

Re: mm is not working if you have same term multiple times in query

Posted by Aman Deep Singh <am...@gmail.com>.

Hi Emir,
Thanks for the reply.
I understand how dismax/edismax works; my problem is that I don't want to
show results that match only one token.
I cannot use a phrase query here because a phrase query doesn't work with
a single-word query, so to do that we would need to change the search
request (qf or pf) dynamically. I will definitely try the function query.
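One way to change the search request dynamically while trying the function query is to count repeated terms on the client and add a `{!frange}` filter only for terms typed more than once, so single-word queries go through unchanged. This is a hypothetical sketch: `build_params` is not part of any Solr client, the whitespace split only approximates StandardTokenizer, and it assumes the `name` field from the original post.

```python
from collections import Counter
from urllib.parse import urlencode

def build_params(user_query, field="name"):
    """Hypothetical client-side rewrite of a dismax request.

    Keeps the dismax parameters and adds one {!frange} filter per term that
    appears more than once in the user's query.
    """
    params = {"defType": "dismax", "qf": field, "mm": "100%", "q": user_query}
    # Lowercase + whitespace split only approximates the query analyzer.
    counts = Counter(user_query.lower().split())
    filters = [
        "{!frange l=%d}termfreq(%s,'%s')" % (count, field, term)
        for term, count in sorted(counts.items())
        if count > 1
    ]
    if filters:
        params["fq"] = filters
    return params

print(build_params("lock"))               # no fq: single-word query unchanged
print(build_params("lock lock")["fq"])    # ["{!frange l=2}termfreq(name,'lock')"]
print(urlencode(build_params("lock lock"), doseq=True))
```

With `doseq=True`, `urlencode` emits one `fq` parameter per list entry. Note that "lock and lock" also gets the `lock` filter without relying on stop words, but under mm=100% the dismax `q` part would still require "and" to match.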

Thanks,
Aman Deep Singh

On 22-Sep-2017 6:25 PM, "Emir Arnautović" <em...@sematext.com>
wrote:

> Hi Aman,
> You have wrong expectations: edismax does respect mm; it's just that it is
> met. If you take a look at the parsed query, it'll be something like:
> +(((name:lock) (name:lock))~2)
> And from dismax's perspective, it found both terms. It will not start
> searching for the next term after the first is found, or look at term
> frequency. You can use a phrase query to make sure that lock is close to
> lock, or use a function query to make sure the tf requirement is met.
> Not sure what your use case is.
>
> HTH,
> Emir

Re: mm is not working if you have same term multiple times in query

Posted by Emir Arnautović <em...@sematext.com>.

Hi Aman,
You have wrong expectations: edismax does respect mm; it's just that it is met. If you take a look at the parsed query, it'll be something like:
+(((name:lock) (name:lock))~2)
And from dismax's perspective, it found both terms. It will not start searching for the next term after the first is found, or look at term frequency. You can use a phrase query to make sure that lock is close to lock, or use a function query to make sure the tf requirement is met.
Not sure what your use case is.
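The phrase-query suggestion relies on term positions rather than counts. A simplified Python model of an exact phrase match (slop 0), with hypothetical helper names and the same lowercase/whitespace tokenization assumption, shows why "lock lock" as a phrase matches Doc 2 but not Doc 1. Sloppy phrases (`"lock lock"~N`) are not modeled here.

```python
def positions(text):
    """Map each lowercased whitespace token to the positions where it occurs."""
    pos = {}
    for i, token in enumerate(text.lower().split()):
        pos.setdefault(token, []).append(i)
    return pos

def phrase_matches(text, phrase):
    """Exact phrase match (slop 0): phrase term k must sit at start + k."""
    pos = positions(text)
    terms = phrase.lower().split()
    if not terms or any(t not in pos for t in terms):
        return False
    return any(
        all(start + k in pos[t] for k, t in enumerate(terms))
        for start in pos[terms[0]]
    )

print(phrase_matches("lock", "lock lock"))           # False (Doc 1)
print(phrase_matches("lock lock", "lock lock"))      # True  (Doc 2)
print(phrase_matches("lock bar lock", "lock lock"))  # False: not adjacent
```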

HTH,
Emir
