Posted to solr-user@lucene.apache.org by CA <ca...@it-agenten.com> on 2016/07/16 12:48:31 UTC

Find part of long query in shorter fields

Hello all,

Our index contains product offers from online shops. The fields we index all have rather short values: the product name, the brand, the price, the category, and some fields containing identifiers such as ASIN or GTIN, if available. We do not index the description texts.

The regular user search uses edismax and queries the fields mentioned above, which works fine for short inputs like "iphone 6s".

We now have to support a different kind of query, one that is not user input but a complete product name. These names look like the ones we store ourselves but are not necessarily part of our data set, so the input query can be relatively long. The output is planned to be a More Like This list. In effect, the query should return at least one hit that is hopefully close enough, and the actual result will be a More Like This list sourced from that one hit.

I have tried to get this to work based on the edismax setup for the regular user search, but this does not work well when the input is longer than the name we have stored for a similar product. Here is an example:


## Step 1: Input (not stored in our index):
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger" (input to edismax without quotes)

(a) This input does not produce any results with our current edismax config (details at the end of this e-mail).
(b) When I relax the "mm" parameter to "2<-1 5<-30% 8<10%", I get one hit with the following name:
=> "Braun Series Clean&Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc"
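
For reference, the way a conditional mm expression such as the one in (b) is resolved can be sketched in a few lines of Python. This is a minimal sketch based on the behaviour described in the Solr documentation for mm, not Solr's actual code; the function name min_should_match is made up for this illustration, and rounding in edge cases may differ from Solr.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Resolve an mm spec such as "2<-1 5<-30% 8<10%" for a given clause count."""

    def simple(expr: str) -> int:
        # A plain number or a percentage; negative values mean "all but this many".
        if expr.endswith("%"):
            calc = int(optional_clauses * int(expr[:-1]) / 100)  # truncates toward zero
        else:
            calc = int(expr)
        return optional_clauses + calc if calc < 0 else calc

    result = optional_clauses  # up to the first threshold, every clause is required
    if "<" in spec:
        # Conditional parts are checked left to right; the last threshold
        # that the clause count exceeds decides the result.
        for part in spec.split():
            bound, expr = part.split("<", 1)
            if optional_clauses <= int(bound):
                break
            result = simple(expr)
    else:
        result = simple(spec)
    return max(0, min(result, optional_clauses))


for n in (2, 5, 7, 9, 11):
    print(n, min_should_match(n, "2<-1 5<-30% 8<10%"))
```

Under these assumptions, a query with 5 optional clauses needs 4 of them to match, one with 7 needs 5, and one with 9 needs none at all, because 10% of 9 is rounded down to 0.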


## Step 2: When I reduce the input manually to the following:
"Braun Series 9 9095CC Men's Electric Shaver"

The above shortened input returns a very good hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"


My Question:

Is it possible, and if so how, to have the query input:
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger" (input to edismax without quotes)
return (also or only) the hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"
and maybe even give it a high score.

I have tried to use "explainOther" (see the output at the end of this e-mail) but I have a really hard time reading it. In some cases, I cannot even tell where one clause ends and the next one starts (is it possible to have it returned on several lines?). Maybe someone can give me a hint on how to use that output, or knows of some documentation on the Internet that explains how to make good use of it?
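
One way to make the debug output easier to read is to request it indented and with a structured explain section. The Python sketch below only builds such a request URL. The host, port and core name are placeholders, the handler path is the /mlt handler shown at the end of this e-mail, and the function name is hypothetical; debugQuery, explainOther, debug.explain.structured and indent are standard Solr request parameters, though how much they help depends on the Solr version and response writer in use.

```python
from urllib.parse import urlencode

# Placeholder base URL: host, port and core name are assumptions for this sketch.
SOLR_BASE = "http://localhost:8983/solr/products"


def build_debug_url(query: str, other_id: str) -> str:
    """Build a debug request that explains why `other_id` did or did not match."""
    params = {
        "q": query,
        "debugQuery": "true",
        "explainOther": f"id:{other_id}",
        "debug.explain.structured": "true",  # nested explain instead of one long string
        "wt": "json",
        "indent": "true",
    }
    return f"{SOLR_BASE}/mlt?{urlencode(params)}"


url = build_debug_url(
    "Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger",
    "2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b",
)
print(url)
```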


Looking at the input string, I was wondering:

(A) Is relaxing the "mm" parameter really the way to go?
(B) Should I create another name field in schema.xml with a different query chain, one that discards the last words of a query input if it is too long? Or is it possible to make tokens in the first part of the input more "important" (though I’m not sure this holds in general)? Should I remove some of the filters from the query chain (like the ShingleFilter)?
(C) Can I configure something else, or should I not use edismax for this?


Thank you for reading this,
any insight is highly appreciated!

Chantal


***

Below are the field configuration for the name field, the configuration of the edismax handler, and the output of "explainOther" for the above example.



SCHEMA.XML — "name" field:

<field name="name" type="name_split" indexed="true" stored="true" required="true" multiValued="false"/>

<fieldType name="name_split" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"
                splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="255"/>
        <filter class="org.apache.lucene.analysis.icu.ICUFoldingFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
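
To see why the parsed query further down contains tokens such as "braun series" and "braunseries", it can help to sketch what the ShingleFilter at the start of this chain emits before the WordDelimiterFilter sees it. The Python below is a simplified illustration that assumes the default token separator (a single space). The function name is hypothetical, and the sketch does not model any of the later filters.

```python
def shingles(tokens, max_size=2, output_unigrams=True):
    """Emit unigrams plus word n-grams up to max_size, position by position."""
    out = []
    for i, tok in enumerate(tokens):
        if output_unigrams:
            out.append(tok)
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


print(shingles("Braun Series 9 9095CC".split()))
# ['Braun', 'Braun Series', 'Series', 'Series 9', '9', '9 9095CC', '9095CC']
```

Each two-word shingle is then handed to the WordDelimiterFilter as a single token, which would be consistent with the concatenated forms visible in the parsed phrase query.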



SOLRCONFIG.XML — MLT/EDISMAX

 <requestHandler name="/mlt" class="solr.SearchHandler">
     <lst name="defaults">
         <str name="echoParams">all</str>
         <str name="defType">edismax</str>

         <str name="q.alt">*:*</str>
         <str name="fl">id,brand,name,price,score,popularity</str>
         <str name="tie">0.1</str>
         <str name="qf">brand_split^6 name</str>
         <str name="pf">brand_split^10 name^10</str>
         <str name="mm">2&lt;-1 5&lt;-30% 8&lt;10%</str>
         <int name="qs">10</int>
         <int name="ps">20</int>

         <str name="wt">xml</str>

         <str name="mlt">false</str>
         <str name="mlt.qf">brand_split^6 name price</str>
         <str name="mlt.fl">brand_split name price</str>
         <str name="mlt.interestingTerms">details</str>
     </lst>
 </requestHandler>



DEBUG — EXPLAIN OTHER

The "other" document with id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b has the title "Braun 9095cc Series 9 Electric Shaver"

<response>
    <lst name="responseHeader">
        <lst name="params"><!-- shortened for better overview -->
            <str name="defType">edismax</str>
            <str name="qf">brand_split^6 name</str>
            <str name="pf">brand_split^10 name^10</str>
            <str name="mm">2&lt;-1 5&lt;-30% 8&lt;10%</str>
            <str name="qs">10</str>
            <str name="ps">20</str>
            <str name="tie">0.1</str>
            <str name="q">
                Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
            </str>
            <str name="explainOther">id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b</str>
        </lst>
    </lst>
    <result name="response" numFound="1" start="0" maxScore="97.122955">
        <doc>
            <str name="name">
                Braun Series Clean&amp;Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with
                Series 7,5,3) 2 pc
            </str>
            <str name="id">773d4bdb341c4dc438c481ac80de5abde08d85bf</str>
            <str name="brand">Braun</str>
            <float name="score">97.122955</float>
        </doc>
    </result>
    <lst name="debug">
        <str name="rawquerystring">
            Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
        </str>
        <str name="querystring">
            Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
        </str>
        <str name="parsedquery">
            (+(DisjunctionMaxQuery((name:braun | (brand_split:braun)^6.0)~0.1) DisjunctionMaxQuery((name:series |
            (brand_split:series)^6.0)~0.1) DisjunctionMaxQuery((name:"(9095cc 9095) cc"~10 | (brand_split:"(9095cc 9095)
            cc"~10)^6.0)~0.1) DisjunctionMaxQuery((Synonym(name:men name:men's) | (Synonym(brand_split:men
            brand_split:men's))^6.0)~0.1) DisjunctionMaxQuery((name:electric | (brand_split:electric)^6.0)~0.1)
            DisjunctionMaxQuery((name:shaver | (brand_split:shaver)^6.0)~0.1) DisjunctionMaxQuery((name:"(wet/dry wet
            wetdry) dry"~10 | (brand_split:"(wet/dry wet wetdry) dry"~10)^6.0)~0.1) DisjunctionMaxQuery((name:with |
            (brand_split:with)^6.0)~0.1) +DisjunctionMaxQuery((name:clean | (brand_split:clean)^6.0)~0.1)
            +DisjunctionMaxQuery((name:renew | (brand_split:renew)^6.0)~0.1) DisjunctionMaxQuery((name:charger |
            (brand_split:charger)^6.0)~0.1)) DisjunctionMaxQuery(((brand_split:"(braun braun series braunseries) series
            (series series 9 series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen)
            (cc ccmen) men (men's men men's electric menelectric) electric (electric electric shaver electricshaver)
            shaver (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith)
            dry with (with with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew
            (renew renew charger renewcharger) charger charger"~20)^10.0 | (name:"(braun braun series braunseries)
            series (series series 9 series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095
            9095ccmen) (cc ccmen) men (men's men men's electric menelectric) electric (electric electric shaver
            electricshaver) shaver (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with
            wet wetdrywith) dry with (with with clean withclean) clean (clean clean and cleanand) and (and and renew
            andrenew) renew (renew renew charger renewcharger) charger charger"~20)^10.0)~0.1))/no_coord
        </str>
        <str name="parsedquery_toString">
            +((name:braun | (brand_split:braun)^6.0)~0.1 (name:series | (brand_split:series)^6.0)~0.1 (name:"(9095cc
            9095) cc"~10 | (brand_split:"(9095cc 9095) cc"~10)^6.0)~0.1 (Synonym(name:men name:men's) |
            (Synonym(brand_split:men brand_split:men's))^6.0)~0.1 (name:electric | (brand_split:electric)^6.0)~0.1
            (name:shaver | (brand_split:shaver)^6.0)~0.1 (name:"(wet/dry wet wetdry) dry"~10 | (brand_split:"(wet/dry
            wet wetdry) dry"~10)^6.0)~0.1 (name:with | (brand_split:with)^6.0)~0.1 +(name:clean |
            (brand_split:clean)^6.0)~0.1 +(name:renew | (brand_split:renew)^6.0)~0.1 (name:charger |
            (brand_split:charger)^6.0)~0.1) ((brand_split:"(braun braun series braunseries) series (series series 9
            series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen) (cc ccmen) men
            (men's men men's electric menelectric) electric (electric electric shaver electricshaver) shaver (shaver
            shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith) dry with (with
            with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew (renew renew
            charger renewcharger) charger charger"~20)^10.0 | (name:"(braun braun series braunseries) series (series
            series 9 series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen) (cc
            ccmen) men (men's men men's electric menelectric) electric (electric electric shaver electricshaver) shaver
            (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith) dry with
            (with with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew (renew renew
            charger renewcharger) charger charger"~20)^10.0)~0.1
        </str>
        <lst name="explain">
            <str name="773d4bdb341c4dc438c481ac80de5abde08d85bf">
                97.122955 = sum of: 97.122955 = sum of: 61.102264 = max plus 0.1 times others of: 6.80276 =
                weight(name:braun in 477314) [], result of: 6.80276 = score(doc=477314,freq=1.0 = termFreq=1.0 ),
                product of: 8.171213 = idf(docFreq=324, docCount=1147961) 0.8325276 = tfNorm, computed from: 1.0 =
                termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 27.458092 = avgFieldLength 40.96 = fieldLength
                60.42199 = weight(brand_split:braun in 477314) [], result of: 60.42199 = score(doc=477314,freq=1.0 =
                termFreq=1.0 ), product of: 6.0 = boost 8.11682 = idf(docFreq=305, docCount=1023531) 1.2406745 = tfNorm,
                computed from: 1.0 = termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 1.9018271 = avgFieldLength 1.0 =
                fieldLength 8.663414 = max plus 0.1 times others of: 8.663414 = weight(name:series in 477314) [], result
                of: 8.663414 = score(doc=477314,freq=4.0 = termFreq=4.0 ), product of: 5.5549765 = idf(docFreq=4440,
                docCount=1147961) 1.5595771 = tfNorm, computed from: 4.0 = termFreq=4.0 1.2 = parameter k1 0.75 =
                parameter b 27.458092 = avgFieldLength 40.96 = fieldLength 4.0527744 = max plus 0.1 times others of:
                4.0527744 = weight(name:with in 477314) [], result of: 4.0527744 = score(doc=477314,freq=2.0 =
                termFreq=2.0 ), product of: 3.355103 = idf(docFreq=40070, docCount=1147961) 1.2079433 = tfNorm, computed
                from: 2.0 = termFreq=2.0 1.2 = parameter k1 0.75 = parameter b 27.458092 = avgFieldLength 40.96 =
                fieldLength 8.542337 = max plus 0.1 times others of: 8.542337 = weight(name:clean in 477314) [], result
                of: 8.542337 = score(doc=477314,freq=3.0 = termFreq=3.0 ), product of: 6.008829 = idf(docFreq=2820,
                docCount=1147961) 1.421631 = tfNorm, computed from: 3.0 = termFreq=3.0 1.2 = parameter k1 0.75 =
                parameter b 27.458092 = avgFieldLength 40.96 = fieldLength 14.762168 = max plus 0.1 times others of:
                14.762168 = weight(name:renew in 477314) [], result of: 14.762168 = score(doc=477314,freq=3.0 =
                termFreq=3.0 ), product of: 10.383966 = idf(docFreq=35, docCount=1147961) 1.421631 = tfNorm, computed
                from: 3.0 = termFreq=3.0 1.2 = parameter k1 0.75 = parameter b 27.458092 = avgFieldLength 40.96 =
                fieldLength
            </str>
        </lst>
        <str name="otherQuery">id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b</str>
        <lst name="explainOther">
            <str name="2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b">
                0.0 = Failure to meet condition(s) of required/prohibited clause(s) 0.0 = no match on required clause
                ((name:braun | (brand_split:braun)^6.0)~0.1 (name:series | (brand_split:series)^6.0)~0.1 (name:"(9095cc
                9095) cc"~10 | (brand_split:"(9095cc 9095) cc"~10)^6.0)~0.1 (Synonym(name:men name:men's) |
                (Synonym(brand_split:men brand_split:men's))^6.0)~0.1 (name:electric | (brand_split:electric)^6.0)~0.1
                (name:shaver | (brand_split:shaver)^6.0)~0.1 (name:"(wet/dry wet wetdry) dry"~10 |
                (brand_split:"(wet/dry wet wetdry) dry"~10)^6.0)~0.1 (name:with | (brand_split:with)^6.0)~0.1
                +(name:clean | (brand_split:clean)^6.0)~0.1 +(name:renew | (brand_split:renew)^6.0)~0.1 (name:charger |
                (brand_split:charger)^6.0)~0.1) 0.0 = Failure to meet condition(s) of required/prohibited clause(s)
                61.40732 = max plus 0.1 times others of: 9.853278 = weight(name:braun in 113560) [], result of: 9.853278
                = score(doc=113560,freq=1.0 = termFreq=1.0 ), product of: 8.171213 = idf(docFreq=324, docCount=1147961)
                1.2058525 = tfNorm, computed from: 1.0 = termFreq=1.0 1.2 = parameter k1 0.75 = parameter b 27.458092 =
                avgFieldLength 16.0 = fieldLength 60.42199 = weight(brand_split:braun in 113560) [], result of: 60.42199
                = score(doc=113560,freq=1.0 = termFreq=1.0 ), product of: 6.0 = boost 8.11682 = idf(docFreq=305,
                docCount=1023531) 1.2406745 = tfNorm, computed from: 1.0 = termFreq=1.0 1.2 = parameter k1 0.75 =
                parameter b 1.9018271 = avgFieldLength 1.0 = fieldLength 8.6537285 = max plus 0.1 times others of:
                8.6537285 = weight(name:series in 113560) [], result of: 8.6537285 = score(doc=113560,freq=2.0 =
                termFreq=2.0 ), product of: 5.5549765 = idf(docFreq=4440, docCount=1147961) 1.5578334 = tfNorm, computed
                from: 2.0 = termFreq=2.0 1.2 = parameter k1 0.75 = parameter b 27.458092 = avgFieldLength 16.0 =
                fieldLength 52.67099 = max plus 0.1 times others of: 52.67099 = weight(name:"(9095cc 9095) cc"~10 in
                113560) [], result of: 52.67099 = score(doc=113560,freq=3.0 = phraseFreq=3.0 ), product of: 30.520727 =
                idf(), sum of: 13.037208 = idf(docFreq=2, docCount=1147961) 10.796498 = idf(docFreq=23,
                docCount=1147961) 6.687021 = idf(docFreq=1431, docCount=1147961) 1.725745 = tfNorm, computed from: 3.0 =
                phraseFreq=3.0 1.2 = parameter k1 0.75 = parameter b 27.458092 = avgFieldLength 16.0 = fieldLength
                8.592838 = max plus 0.1 times others of: 8.592838 = weight(name:electric in 113560) [], result of:
                8.592838 = score(doc=113560,freq=2.0 = termFreq=2.0 ), product of: 5.51589 = idf(docFreq=4617,
                docCount=1147961) 1.5578334 = tfNorm, computed from: 2.0 = termFreq=2.0 1.2 = parameter k1 0.75 =
                parameter b 27.458092 = avgFieldLength 16.0 = fieldLength 13.669254 = max plus 0.1 times others of:
                13.669254 = weight(name:shaver in 113560) [], result of: 13.669254 = score(doc=113560,freq=2.0 =
                termFreq=2.0 ), product of: 8.7745285 = idf(docFreq=177, docCount=1147961) 1.5578334 = tfNorm, computed
                from: 2.0 = termFreq=2.0 1.2 = parameter k1 0.75 = parameter b 27.458092 = avgFieldLength 16.0 =
                fieldLength 0.0 = no match on required clause ((name:clean | (brand_split:clean)^6.0)~0.1) 0.0 = No
                matching clause 0.0 = no match on required clause ((name:renew | (brand_split:renew)^6.0)~0.1) 0.0 = No
                matching clause
            </str>
        </lst>
    </lst>
</response>

Re: Find part of long query in shorter fields

Posted by CA <ca...@it-agenten.com>.
Hi Ahmet!

Thank you for that information. I was wondering whether dismax is more or less "deprecated" or, if not, when I would use dismax in preference to edismax.
To me, the documentation sounds like "edismax is dismax+: it does everything dismax does, and more".

Chantal

On 21.07.2016 at 14:43, Ahmet Arslan <io...@yahoo.com> wrote:

> Hi,
> 
> If you want to disable operators altogether, please use dismax instead of edismax.
> In dismax, only the unary operators + and - are supported, if I am not mistaken.
> I don't remember how quotation marks for phrase queries are handled.
> 
> Ahmet
> 


Re: Find part of long query in shorter fields

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi,

If you want to disable operators altogether, please use dismax instead of edismax.
In dismax, only the unary operators + and - are supported, if I am not mistaken.
I don't remember how quotation marks for phrase queries are handled.

Ahmet



On Tuesday, July 19, 2016 8:29 PM, CA <ca...@it-agenten.com> wrote:
Just for the record:

After realizing that I really do get the expected output with "defType=dismax", I found out what I need to change in my edismax configuration:

<str name="lowercaseOperators">false</str>

Then this will work:
> q=Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
> // edismax with qf/pf: "name" and "brand" fields
> 
Not returned anymore:
> name: "Braun Series Clean&Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc"
> brand: Braun
> 
Best hit:
> name: "Braun 9095cc Series 9 Electric Shaver"
> brand: Braun


Actually, as I’d like to disable operators in the query altogether (if possible), I’m wondering whether I should be using the old dismax in the first place.

Cheers,

Chantal

Re: Find part of long query in shorter fields

Posted by CA <ca...@it-agenten.com>.
Just for the record:

After realizing that I really do get the expected output with "defType=dismax", I found out what I need to change in my edismax configuration:

<str name="lowercaseOperators">false</str>

Then this will work:
> q=Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
> // edismax with qf/pf: "name" and "brand" fields
> 
Not returned anymore:
> name: "Braun Series Clean&Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc"
> brand: Braun
> 
Best hit:
> name: "Braun 9095cc Series 9 Electric Shaver"
> brand: Braun
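
The same setting can also be passed per request instead of being set in solrconfig.xml. The Python sketch below builds such a request URL. The host, port and core name are placeholders, the handler path is the /mlt handler from the first e-mail, and the function name is hypothetical. With lowercaseOperators=false, a lowercase "and" in the input should no longer be read as the AND operator, which would explain why "Clean" and "Renew" stop being required clauses.

```python
from urllib.parse import urlencode

# Placeholder base URL: host, port and core name are assumptions for this sketch.
SOLR_BASE = "http://localhost:8983/solr/products"


def build_query_url(product_name: str) -> str:
    """Build an edismax request that treats lowercase and/or as plain terms."""
    params = {
        "q": product_name,
        "defType": "edismax",
        "qf": "brand_split^6 name",
        "pf": "brand_split^10 name^10",
        "lowercaseOperators": "false",
    }
    return f"{SOLR_BASE}/mlt?{urlencode(params)}"


query_url = build_query_url(
    "Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger"
)
print(query_url)
```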


Actually, as I’d like to disable operators in the query altogether (if possible), I’m wondering whether I should be using the old dismax in the first place.

Cheers,
Chantal

Re: Find part of long query in shorter fields

Posted by CA <ca...@it-agenten.com>.
Hi Ahmet,


Thank you for the link. It helped me find more resources.

What I still don’t understand, though, is why edismax returns one of the documents as a partial hit but not the other:


q=Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
// edismax with qf/pf: "name" and "brand" fields

HIT:
name: "Braun Series Clean&Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc"
brand: Braun

NOT A HIT:
name: "Braun 9095cc Series 9 Electric Shaver"
brand: Braun

(for explainOther, schema, and solrconfig, see my previous e-mail)


I still think that if I could understand what is happening, it would help me figure out the solution for my use case. Maybe edismax would be perfectly fine with the right combination of field types and config values?


Thanks for your input!
Chantal





Re: Find part of long query in shorter fields

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Chantal,

Please see https://issues.apache.org/jira/browse/LUCENE-7148


ahmet



On Saturday, July 16, 2016 3:48 PM, CA <ca...@it-agenten.com> wrote:
Hello all,

Our index contains product offers from online shops. The fields we index all have rather short values: the product name, the brand, the price, the category, and some fields containing identifiers such as ASIN or GTIN, if available. We do not index the description texts.

The regular user search uses edismax and queries the fields mentioned above, which works fine for short inputs like "iphone 6s".

We now have to support a different kind of query, one that is not user input but a complete product name. These names look like the ones we store ourselves but are not necessarily part of our data set, so the input query can be relatively long. The output is planned to be a More Like This list. In effect, the query should return at least one hit that is hopefully close enough, and the actual result will be a More Like This list sourced from that one hit.

I have tried to get this to work based on the edismax setup for the regular user search, but this does not work well when the input is longer than the name we have stored for a similar product. Here is an example:


## Step 1: Input (not stored in our index):
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger" (input to edismax without quotes)

(a) This input does not produce any results with our current edismax config (details at the end of this e-mail).
(b) When I relax the "mm" parameter to "2<-1 5<-30% 8<10%", I get one hit with the following name:
=> "Braun Series Clean&Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc"


## Step 2: When I reduce the input manually to the following:
"Braun Series 9 9095CC Men's Electric Shaver"

The above shortened input returns a very good hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"


My Question:

Is it possible, and if so how, to have the query input:
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger" (input to edismax without quotes)
return (also or only) the hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"
and maybe even give it a high score.

I have tried to use "explainOther" (see the output at the end of this e-mail) but I have a really hard time reading it. In some cases, I cannot even tell where one clause ends and the next one starts (is it possible to have it returned on several lines?). Maybe someone can give me a hint on how to use that output, or knows of some documentation on the Internet that explains how to make good use of it?


Looking at the input string, I was wondering:

(A) Is relaxing the "mm" parameter really the way to go?
(B) Should I create another name field in schema.xml with a different query chain, one that discards the last words of a query input if it is too long? Or is it possible to make tokens in the first part of the input more "important" (though I’m not sure this holds in general)? Should I remove some of the filters from the query chain (like the ShingleFilter)?
(C) Can I configure something else, or should I not use edismax for this?


Thank you for reading this,
any insight is highly appreciated!

Chantal


***

Below are the field configuration for the name field, the configuration of the edismax handler, and the output of "explainOther" for the above example.



SCHEMA.XML — "name" field:

<field name="name" type="name_split" indexed="true" stored="true" required="true" multiValued="false"/>

<fieldType name="name_split" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"
                splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="255"/>
        <filter class="org.apache.lucene.analysis.icu.ICUFoldingFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>



SOLRCONFIG.XML — MLT/EDISMAX

<requestHandler name="/mlt" class="solr.SearchHandler">
     <lst name="defaults">
         <str name="echoParams">all</str>
         <str name="defType">edismax</str>

         <str name="q.alt">*:*</str>
         <str name="fl">id,brand,name,price,score,popularity</str>
         <str name="tie">0.1</str>
         <str name="qf">brand_split^6 name</str>
         <str name="pf">brand_split^10 name^10</str>
         <str name="mm">2&lt;-1 5&lt;-30% 8&lt;10%</str>
         <int name="qs">10</int>
         <int name="ps">20</int>

         <str name="wt">xml</str>

         <str name="mlt">false</str>
         <str name="mlt.qf">brand_split^6 name price</str>
         <str name="mlt.fl">brand_split name price</str>
         <str name="mlt.interestingTerms">details</str>
     </lst>
</requestHandler>



DEBUG — EXPLAIN OTHER

The "other" document with id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b has the title "Braun 9095cc Series 9 Electric Shaver"

<response>
    <lst name="responseHeader">
        <lst name="params"><!-- shortened for better overview -->
            <str name="defType">edismax</str>
            <str name="qf">brand_split^6 name</str>
            <str name="pf">brand_split^10 name^10</str>
            <str name="mm">2&lt;-1 5&lt;-30% 8&lt;10%</str>
            <str name="qs">10</str>
            <str name="ps">20</str>
            <str name="tie">0.1</str>
            <str name="q">
                Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
            </str>
            <str name="explainOther">id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b</str>
        </lst>
    </lst>
    <result name="response" numFound="1" start="0" maxScore="97.122955">
        <doc>
            <str name="name">
                Braun Series Clean&amp;Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with
                Series 7,5,3) 2 pc
            </str>
            <str name="id">773d4bdb341c4dc438c481ac80de5abde08d85bf</str>
            <str name="brand">Braun</str>
            <float name="score">97.122955</float>
        </doc>
    </result>
    <lst name="debug">
        <str name="rawquerystring">
            Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
        </str>
        <str name="querystring">
            Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger
        </str>
        <str name="parsedquery">
            (+(DisjunctionMaxQuery((name:braun | (brand_split:braun)^6.0)~0.1) DisjunctionMaxQuery((name:series |
            (brand_split:series)^6.0)~0.1) DisjunctionMaxQuery((name:"(9095cc 9095) cc"~10 | (brand_split:"(9095cc 9095)
            cc"~10)^6.0)~0.1) DisjunctionMaxQuery((Synonym(name:men name:men's) | (Synonym(brand_split:men
            brand_split:men's))^6.0)~0.1) DisjunctionMaxQuery((name:electric | (brand_split:electric)^6.0)~0.1)
            DisjunctionMaxQuery((name:shaver | (brand_split:shaver)^6.0)~0.1) DisjunctionMaxQuery((name:"(wet/dry wet
            wetdry) dry"~10 | (brand_split:"(wet/dry wet wetdry) dry"~10)^6.0)~0.1) DisjunctionMaxQuery((name:with |
            (brand_split:with)^6.0)~0.1) +DisjunctionMaxQuery((name:clean | (brand_split:clean)^6.0)~0.1)
            +DisjunctionMaxQuery((name:renew | (brand_split:renew)^6.0)~0.1) DisjunctionMaxQuery((name:charger |
            (brand_split:charger)^6.0)~0.1)) DisjunctionMaxQuery(((brand_split:"(braun braun series braunseries) series
            (series series 9 series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen)
            (cc ccmen) men (men's men men's electric menelectric) electric (electric electric shaver electricshaver)
            shaver (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith)
            dry with (with with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew
            (renew renew charger renewcharger) charger charger"~20)^10.0 | (name:"(braun braun series braunseries)
            series (series series 9 series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095
            9095ccmen) (cc ccmen) men (men's men men's electric menelectric) electric (electric electric shaver
            electricshaver) shaver (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with
            wet wetdrywith) dry with (with with clean withclean) clean (clean clean and cleanand) and (and and renew
            andrenew) renew (renew renew charger renewcharger) charger charger"~20)^10.0)~0.1))/no_coord
        </str>
        <str name="parsedquery_toString">
            +((name:braun | (brand_split:braun)^6.0)~0.1 (name:series | (brand_split:series)^6.0)~0.1 (name:"(9095cc
            9095) cc"~10 | (brand_split:"(9095cc 9095) cc"~10)^6.0)~0.1 (Synonym(name:men name:men's) |
            (Synonym(brand_split:men brand_split:men's))^6.0)~0.1 (name:electric | (brand_split:electric)^6.0)~0.1
            (name:shaver | (brand_split:shaver)^6.0)~0.1 (name:"(wet/dry wet wetdry) dry"~10 | (brand_split:"(wet/dry
            wet wetdry) dry"~10)^6.0)~0.1 (name:with | (brand_split:with)^6.0)~0.1 +(name:clean |
            (brand_split:clean)^6.0)~0.1 +(name:renew | (brand_split:renew)^6.0)~0.1 (name:charger |
            (brand_split:charger)^6.0)~0.1) ((brand_split:"(braun braun series braunseries) series (series series 9
            series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen) (cc ccmen) men
            (men's men men's electric menelectric) electric (electric electric shaver electricshaver) shaver (shaver
            shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith) dry with (with
            with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew (renew renew
            charger renewcharger) charger charger"~20)^10.0 | (name:"(braun braun series braunseries) series (series
            series 9 series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen) (cc
            ccmen) men (men's men men's electric menelectric) electric (electric electric shaver electricshaver) shaver
            (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith) dry with
            (with with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew (renew renew
            charger renewcharger) charger charger"~20)^10.0)~0.1
        </str>
        <lst name="explain">
            <str name="773d4bdb341c4dc438c481ac80de5abde08d85bf">
                97.122955 = sum of:
                  97.122955 = sum of:
                    61.102264 = max plus 0.1 times others of:
                      6.80276 = weight(name:braun in 477314) [], result of:
                        6.80276 = score(doc=477314,freq=1.0 = termFreq=1.0 ), product of:
                          8.171213 = idf(docFreq=324, docCount=1147961)
                          0.8325276 = tfNorm, computed from:
                            1.0 = termFreq=1.0
                            1.2 = parameter k1
                            0.75 = parameter b
                            27.458092 = avgFieldLength
                            40.96 = fieldLength
                      60.42199 = weight(brand_split:braun in 477314) [], result of:
                        60.42199 = score(doc=477314,freq=1.0 = termFreq=1.0 ), product of:
                          6.0 = boost
                          8.11682 = idf(docFreq=305, docCount=1023531)
                          1.2406745 = tfNorm, computed from:
                            1.0 = termFreq=1.0
                            1.2 = parameter k1
                            0.75 = parameter b
                            1.9018271 = avgFieldLength
                            1.0 = fieldLength
                    8.663414 = max plus 0.1 times others of:
                      8.663414 = weight(name:series in 477314) [], result of:
                        8.663414 = score(doc=477314,freq=4.0 = termFreq=4.0 ), product of:
                          5.5549765 = idf(docFreq=4440, docCount=1147961)
                          1.5595771 = tfNorm, computed from:
                            4.0 = termFreq=4.0
                            1.2 = parameter k1
                            0.75 = parameter b
                            27.458092 = avgFieldLength
                            40.96 = fieldLength
                    4.0527744 = max plus 0.1 times others of:
                      4.0527744 = weight(name:with in 477314) [], result of:
                        4.0527744 = score(doc=477314,freq=2.0 = termFreq=2.0 ), product of:
                          3.355103 = idf(docFreq=40070, docCount=1147961)
                          1.2079433 = tfNorm, computed from:
                            2.0 = termFreq=2.0
                            1.2 = parameter k1
                            0.75 = parameter b
                            27.458092 = avgFieldLength
                            40.96 = fieldLength
                    8.542337 = max plus 0.1 times others of:
                      8.542337 = weight(name:clean in 477314) [], result of:
                        8.542337 = score(doc=477314,freq=3.0 = termFreq=3.0 ), product of:
                          6.008829 = idf(docFreq=2820, docCount=1147961)
                          1.421631 = tfNorm, computed from:
                            3.0 = termFreq=3.0
                            1.2 = parameter k1
                            0.75 = parameter b
                            27.458092 = avgFieldLength
                            40.96 = fieldLength
                    14.762168 = max plus 0.1 times others of:
                      14.762168 = weight(name:renew in 477314) [], result of:
                        14.762168 = score(doc=477314,freq=3.0 = termFreq=3.0 ), product of:
                          10.383966 = idf(docFreq=35, docCount=1147961)
                          1.421631 = tfNorm, computed from:
                            3.0 = termFreq=3.0
                            1.2 = parameter k1
                            0.75 = parameter b
                            27.458092 = avgFieldLength
                            40.96 = fieldLength
            </str>
        </lst>
        <str name="otherQuery">id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b</str>
        <lst name="explainOther">
            <str name="2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b">
                0.0 = Failure to meet condition(s) of required/prohibited clause(s)
                  0.0 = no match on required clause
                  ((name:braun | (brand_split:braun)^6.0)~0.1 (name:series | (brand_split:series)^6.0)~0.1 (name:"(9095cc
                  9095) cc"~10 | (brand_split:"(9095cc 9095) cc"~10)^6.0)~0.1 (Synonym(name:men name:men's) |
                  (Synonym(brand_split:men brand_split:men's))^6.0)~0.1 (name:electric | (brand_split:electric)^6.0)~0.1
                  (name:shaver | (brand_split:shaver)^6.0)~0.1 (name:"(wet/dry wet wetdry) dry"~10 |
                  (brand_split:"(wet/dry wet wetdry) dry"~10)^6.0)~0.1 (name:with | (brand_split:with)^6.0)~0.1
                  +(name:clean | (brand_split:clean)^6.0)~0.1 +(name:renew | (brand_split:renew)^6.0)~0.1 (name:charger |
                  (brand_split:charger)^6.0)~0.1)
                    0.0 = Failure to meet condition(s) of required/prohibited clause(s)
                      61.40732 = max plus 0.1 times others of:
                        9.853278 = weight(name:braun in 113560) [], result of:
                          9.853278 = score(doc=113560,freq=1.0 = termFreq=1.0 ), product of:
                            8.171213 = idf(docFreq=324, docCount=1147961)
                            1.2058525 = tfNorm, computed from:
                              1.0 = termFreq=1.0
                              1.2 = parameter k1
                              0.75 = parameter b
                              27.458092 = avgFieldLength
                              16.0 = fieldLength
                        60.42199 = weight(brand_split:braun in 113560) [], result of:
                          60.42199 = score(doc=113560,freq=1.0 = termFreq=1.0 ), product of:
                            6.0 = boost
                            8.11682 = idf(docFreq=305, docCount=1023531)
                            1.2406745 = tfNorm, computed from:
                              1.0 = termFreq=1.0
                              1.2 = parameter k1
                              0.75 = parameter b
                              1.9018271 = avgFieldLength
                              1.0 = fieldLength
                      8.6537285 = max plus 0.1 times others of:
                        8.6537285 = weight(name:series in 113560) [], result of:
                          8.6537285 = score(doc=113560,freq=2.0 = termFreq=2.0 ), product of:
                            5.5549765 = idf(docFreq=4440, docCount=1147961)
                            1.5578334 = tfNorm, computed from:
                              2.0 = termFreq=2.0
                              1.2 = parameter k1
                              0.75 = parameter b
                              27.458092 = avgFieldLength
                              16.0 = fieldLength
                      52.67099 = max plus 0.1 times others of:
                        52.67099 = weight(name:"(9095cc 9095) cc"~10 in 113560) [], result of:
                          52.67099 = score(doc=113560,freq=3.0 = phraseFreq=3.0 ), product of:
                            30.520727 = idf(), sum of:
                              13.037208 = idf(docFreq=2, docCount=1147961)
                              10.796498 = idf(docFreq=23, docCount=1147961)
                              6.687021 = idf(docFreq=1431, docCount=1147961)
                            1.725745 = tfNorm, computed from:
                              3.0 = phraseFreq=3.0
                              1.2 = parameter k1
                              0.75 = parameter b
                              27.458092 = avgFieldLength
                              16.0 = fieldLength
                      8.592838 = max plus 0.1 times others of:
                        8.592838 = weight(name:electric in 113560) [], result of:
                          8.592838 = score(doc=113560,freq=2.0 = termFreq=2.0 ), product of:
                            5.51589 = idf(docFreq=4617, docCount=1147961)
                            1.5578334 = tfNorm, computed from:
                              2.0 = termFreq=2.0
                              1.2 = parameter k1
                              0.75 = parameter b
                              27.458092 = avgFieldLength
                              16.0 = fieldLength
                      13.669254 = max plus 0.1 times others of:
                        13.669254 = weight(name:shaver in 113560) [], result of:
                          13.669254 = score(doc=113560,freq=2.0 = termFreq=2.0 ), product of:
                            8.7745285 = idf(docFreq=177, docCount=1147961)
                            1.5578334 = tfNorm, computed from:
                              2.0 = termFreq=2.0
                              1.2 = parameter k1
                              0.75 = parameter b
                              27.458092 = avgFieldLength
                              16.0 = fieldLength
                      0.0 = no match on required clause ((name:clean | (brand_split:clean)^6.0)~0.1)
                        0.0 = No matching clause
                      0.0 = no match on required clause ((name:renew | (brand_split:renew)^6.0)~0.1)
                        0.0 = No matching clause
            </str>
        </lst>
    </lst>
</response>
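
As a reading aid for the explain output above, here is a minimal Python sketch that recomputes the first few numbers by hand. It assumes the standard BM25 formulas (the explain entries "idf", "tfNorm", "parameter k1" and "parameter b" suggest this is what the similarity in use computes) and reads "max plus 0.1 times others" as the dismax tie-breaker. The function names are hypothetical and only for illustration; all input values are copied from the debug output.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # Inverse document frequency as assumed for BM25:
    # ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # Term-frequency normalisation; longer fields are penalised via b.
    length_ratio = field_length / avg_field_length
    return freq * (k1 + 1.0) / (freq + k1 * (1.0 - b + b * length_ratio))

def dismax(scores, tie):
    # "max plus <tie> times others": best field score plus a fraction of the rest.
    best = max(scores)
    return best + tie * (sum(scores) - best)

# name:braun in doc 477314 (values taken from the explain section)
name_idf = bm25_idf(324, 1147961)
name_tf = bm25_tf_norm(1.0, 40.96, 27.458092)
name_score = name_idf * name_tf

# brand_split:braun in the same doc, including the field boost of 6.0
brand_score = 6.0 * 8.11682 * bm25_tf_norm(1.0, 1.0, 1.9018271)

# Combined per-term score for "braun" across both fields
braun_score = dismax([name_score, brand_score], tie=0.1)
```

Under these assumptions the recomputed values agree with the explain output to within rounding (about 6.80 for name:braun, 60.42 for brand_split:braun, 61.10 combined). The same arithmetic applied to the second document shows that its individual term scores are fine; it is dropped only because the two required clauses on "clean" and "renew" do not match.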