Posted to solr-user@lucene.apache.org by dboychuck <db...@build.com> on 2017/02/23 21:14:34 UTC

Phrase field matches not counting towards minimum match

Let me first explain what I am trying to do, since there may be a better
approach. Recently I have been trying to increase Solr's matching precision
by requiring that all of the words in a field match before allowing a match
on that field. I am using edismax as my query parser, and since it tokenizes
on whitespace, I see no way to enforce the following: if my query is q=foo bar
and I have a field named somefield, indexed as a text field, containing
foo bar, then foo alone should not match and bar alone should not match, but
the phrase "foo bar" should match.

I feel like I'm not explaining this very well, but basically what I want to
do has already been done by Lucidworks:
https://lucidworks.com/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

However, their solution requires a pluggable query parser that is not an
extension of edismax. I haven't done a deep comparison, but I'm assuming I
would lose access to all of edismax's parameters if I used their pluggable
query parser.

So instead I tried to replicate this functionality using edismax's pf2 and
pf3 parameters. It all works beautifully the way I have it set up, except that
phrase field matches don't count towards my mm count.

Now I will go into detail about how my index is set up for this specific
example.

I am using Solr's default text field to index a field named manufacturer2.

Here are the relevant parameters of my search:

q=livex lighting 8193
qf=productid, manufacturer_stop
pf2=manufacturer2
mm=3<-1 5<-2 6<90%
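
For reference, the mm value above is a conditional spec. The following is a minimal Python sketch of how such a spec resolves to a required clause count. It follows the documented mm semantics as I understand them, not Solr's actual source, and the function name min_should_match is hypothetical.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Resolve an mm spec to the number of optional clauses that must match.

    Illustrative only: modeled on the documented mm semantics, not Solr's code.
    """
    def simple(expr: str) -> int:
        if expr.endswith("%"):
            # Percentage of the clause count, truncated toward zero.
            calc = int(clause_count * int(expr[:-1]) / 100)
        else:
            calc = int(expr)
        # A negative value means "all clauses except this many".
        return clause_count + calc if calc < 0 else calc

    required = clause_count  # no applicable condition: every clause is required
    for part in spec.split():
        if "<" in part:
            bound, expr = part.split("<", 1)
            if clause_count <= int(bound):
                break  # a condition only applies above its bound
            required = simple(expr)
        else:
            required = simple(part)
    return max(0, min(required, clause_count))


if __name__ == "__main__":
    for n in (3, 4, 6, 7):
        print(n, "clauses ->", min_should_match(n, "3<-1 5<-2 6<90%"), "required")
```

Under this reading, the three-term query livex lighting 8193 resolves to 3 required clauses, so every term has to match.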

I am removing the word lighting from my manufacturer_stop field using
stopwords, so only livex matches in the manufacturer_stop field.

However, "livex lighting" matches in the manufacturer2 field through phrase
field matching via the pf2 parameter.

So my matches are the following:
MATCH livex in manufacturer_stop field
MATCH 8193 in productid field
MATCH "livex lighting" in manufacturer2 field as a phrase field match

So I have three matches. However, the phrase field match doesn't seem to be
counting towards my mm requirement: with 3 tokens passed, all 3 must match. If
I change my mm to require only 2 matching tokens, I get the expected
result. But I want my phrase field match to count towards my mm requirement,
since lighting is matching in my phrase field.
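
The situation can be illustrated with a toy model in Python. It assumes that mm counts only clauses built from the qf fields and that pf2 contributes to scoring only. The match sets are hand-written to mirror the description above; they are not real Solr output, and every name in the sketch is hypothetical.

```python
# Terms that match in each qf field of one hypothetical document.
QF_MATCHES = {
    "productid": {"8193"},
    "manufacturer_stop": {"livex"},  # "lighting" is removed as a stopword
}
# Two-word phrases that match in the pf2 field of the same document.
PF2_MATCHES = {"manufacturer2": {("livex", "lighting")}}


def count_mm_clauses(terms, qf_matches):
    """Count the query terms that match in at least one qf field."""
    return sum(
        any(term in matched for matched in qf_matches.values())
        for term in terms
    )


terms = ["livex", "lighting", "8193"]
satisfied = count_mm_clauses(terms, QF_MATCHES)
required = 3  # what the mm spec demands for a three-term query
phrase_hit = ("livex", "lighting") in PF2_MATCHES["manufacturer2"]

print(f"qf clauses satisfied: {satisfied} of {required}")
print(f"pf2 phrase matched: {phrase_hit} (score only in this model)")
print("document accepted:", satisfied >= required)
```

In this model the term lighting matches no qf field, so only two of the three required clauses are satisfied even though the phrase matches in manufacturer2.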

Any assistance would be appreciated, as would a suggestion for a better
approach.





--
View this message in context: http://lucene.472066.n3.nabble.com/Phrase-field-matches-not-counting-towards-minimum-match-tp4322066.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Phrase field matches not counting towards minimum match

Posted by Emir Arnautovic <em...@sematext.com>.
Hi,

mm applies only to the qf fields; pf2/pf3 only boost documents that have
already matched. What you can do is experiment with additional fields in qf
and/or try to get close to your requirement with the mm.autoRelax parameter.
Note that mm.autoRelax can produce unexpected results if one field uses
stopwords and the others do not, or if the fields use different stopword
lists. In your case, manufacturer_stop would get a single token, so any
document matching that token would become acceptable.
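
One possible reading of this suggestion, sketched in Python as a set of request parameters. Adding manufacturer2 to qf is an assumption made for illustration (it loosens the per-field precision the original setup was after), and build_params is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def build_params(auto_relax: bool = True) -> dict:
    """Build edismax request parameters for the query from this thread."""
    return {
        "defType": "edismax",
        "q": "livex lighting 8193",
        # Assumption: manufacturer2 is added to qf so that "lighting" has a
        # field without stopword removal to match in.
        "qf": "productid manufacturer_stop manufacturer2",
        "pf2": "manufacturer2",
        "mm": "3<-1 5<-2 6<90%",
        # Relaxes mm when a clause is removed from some but not all qf fields.
        "mm.autoRelax": "true" if auto_relax else "false",
    }


query_string = urlencode(build_params())
print(query_string)
```

The resulting string can be appended to a select handler URL; whether either change gives acceptable precision has to be checked against the actual index.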

You can also run a stricter query first and then a follow-up query with
more relaxed fields/parameters. You can also run both as a single OR-ed
query; just make sure, via boost factors, that scores for the first query
are much higher than those for the second.
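
A rough sketch of the single OR-ed variant, built in Python. It assumes the standard Lucene parser with nested _query_ clauses and local params; the field lists, the mm values and the boost of 100 are illustrative choices, not tested settings, and combined_query_params is a hypothetical helper. Please verify the syntax against your Solr version.

```python
from urllib.parse import urlencode


def combined_query_params(user_query: str, strict_boost: int = 100) -> dict:
    """Combine a strict and a relaxed edismax query into one OR-ed request."""
    # Strict: every term must match in some qf field (illustrative settings).
    strict = (
        "{!edismax qf='productid manufacturer_stop manufacturer2' "
        "mm='100%' v=$qq}"
    )
    # Relaxed: only two terms must match; pf2 still boosts phrase matches.
    relaxed = (
        "{!edismax qf='productid manufacturer_stop' "
        "pf2='manufacturer2' mm='2' v=$qq}"
    )
    q = f'_query_:"{strict}"^{strict_boost} OR _query_:"{relaxed}"'
    # The user's text is passed once as qq and dereferenced by both subqueries.
    return {"defType": "lucene", "q": q, "qq": user_query}


params = combined_query_params("livex lighting 8193")
print(urlencode(params))
```

A large boost makes it likely, but does not guarantee, that strict matches outrank relaxed ones, so the ordering is worth checking with debugQuery on real data.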

HTH,
Emir



-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/