Posted to dev@lucene.apache.org by "Mikhail Khludnev (JIRA)" <ji...@apache.org> on 2019/02/15 09:42:00 UTC

[jira] [Assigned] (SOLR-13126) Multiplicative boost of isn't applied when one of the summed or multiplied queries doesn't match

     [ https://issues.apache.org/jira/browse/SOLR-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev reassigned SOLR-13126:
---------------------------------------

    Assignee: Alan Woodward  (was: Mikhail Khludnev)

> Multiplicative boost of isn't applied when one of the summed or multiplied queries doesn't match 
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13126
>                 URL: https://issues.apache.org/jira/browse/SOLR-13126
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 7.5.0
>         Environment: Reproduced with macOS 10.14.1, a quick test with Windows 10 showed the same result.
>            Reporter: Thomas Aglassinger
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: 0001-use-deprecated-classes-to-fix-regression-introduced-.patch, 0002-SOLR-13126-Added-test-case.patch, 2019-02-14_1715.png, SOLR-13126.patch, SOLR-13126.patch, debugQuery.json, image-2019-02-13-16-17-56-272.png, screenshot-1.png, solr_match_neither_nextteil_nor_sony.json, solr_match_neither_nextteil_nor_sony.txt, solr_match_netzteil_and_sony.json, solr_match_netzteil_and_sony.txt, solr_match_netzteil_only.json, solr_match_netzteil_only.txt
>
>
> Under certain circumstances, queries with multiple multiplicative boosts using the Solr functions {{product()}} and {{query()}} return search results whose score is inconsistent with the one in the debugQuery information. Only the debug score is correct; the actual search results show a wrong score.
> This seems somewhat similar to the behaviour described in https://issues.apache.org/jira/browse/LUCENE-7132, though that issue was resolved a while ago.
> A little background: we are using Solr as a search platform for the e-commerce framework SAP Hybris. There the shop administrator can create multiplicative boost rules (see below for an example) where a value like 2.0 means that an item gets boosted to 200%. This works fine in the demo shop distributed by SAP but breaks in our shop. We encountered the issue when upgrading from Solr 7.2.1 / Hybris 6.7 to Solr 7.5 / Hybris 18.8.3 (which would have been named Hybris 6.8 but the version naming scheme changed).
> We reduced the Solr query generated by Hybris to the relevant parts and could reproduce the issue in the Solr admin without any Hybris connection.
> I attached the JSON result of a test query but here's a description of the parts that seemed most relevant to me.
> The {{responseHeader.params}} reads (slightly rearranged):
> {code:java}
> "q":"{!boost b=$ymb}(+{!lucene v=$yq})",
> "ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
> "yq":"*:*",
> "sort":"score desc",
> "debugQuery":"true",
> // Added to keep the output small but probably unrelated to the actual issue
> "fl":"score,id,code_string,name_text_de",
> "fq":"catalogId:\"someProducts\"",
> "rows":"10",
> {code}
> This example boosts the German product name (field {{name_text_de}}) in case it contains certain terms:
>  * "Netzteil" (power supply) is boosted to 200%
>  * "Sony" is boosted to 300%
> Consequently a product containing both terms should be boosted to 600%.
> Also, the query function has the value 1 specified as default in case the name does not contain the respective term, resulting in a pseudo boost that preserves the score.
> According to the debug information the parser used is the LuceneQParser, which translates this to the following parsed query:
> {quote}FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))
> {quote}
> And the translated boost is:
> {quote}org.apache.lucene.queries.function.valuesource.ProductFloatFunction:product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0))
> {quote}
> When taking a look at the search result, the following products are included, among others (see the JSON comments for an analysis of each result):
> {code:javascript}
>      {
>         "id":"someProducts/Online/test7111111",
>         "name_text_de":"Original Sony Vaio Netzteil",
>         "code_string":"test7111111",
>         // CORRECT, both "Netzteil" and "Sony" are included in the name
>         "score":6.0},
>       {
>         "id":"someProducts/Online/taxTestingProductThree",
>         "name_text_de":"Steuertestprodukt Zwei",
>         "code_string":"taxTestingProductThree",
>         // CORRECT, neither "Netzteil" nor "Sony" is included in the name
>         "score":1.0},
>       {
>         "id":"someProducts/Online/797856300000",
>         "name_text_de":"GS-Netzteil 20W schwarz",
>         "code_string":"797856300000",
>         // WRONG, "Netzteil" is part of the name; 
>         // note that we do split words on hyphen because 
>         // WordDelimiterGraphFilterFactory.generateWordParts="1"
>         "score":1.0},
> {code}
> So apparently the multiplicative boost works for product names where all the boosted terms are included but fails if only one of the terms matches.
> There are also other products in the result that contain either "Netzteil" or "Sony" but still get a score of 1.0 instead of 2.0 or 3.0, respectively.
> Surprisingly, in the {{explain}} segment the score for the product with "Netzteil" but without "Sony" is correctly 2.0:
> {code:java}
> 2.0 = product of:
>   1.0 = boost
>   2.0 = product of:
>     1.0 = *:*
>     2.0 = product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)
> {code}
> The type definition of {{text_de}} in the {{schema.xml}} (which is used for "name_text_de") includes the following filters:
> {code:xml}
> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory" />
>         <filter class="solr.WordDelimiterGraphFilterFactory"  preserveOriginal="1"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>         <filter class="solr.LowerCaseFilterFactory" />
>     </analyzer>
> </fieldType>
> {code}
> The {{solrconfig.xml}} is mostly taken from the Hybris defaults and AFAIK does not do anything unusual. The following lines might be of interest:
> {code:xml}
> <luceneMatchVersion>7.5.0</luceneMatchVersion>
> <queryParser name="multiMaxScore" class="de.hybris.platform.solr.search.MultiMaxScoreQParserPlugin"/>
> {code}
> To sum it up, my expectation would have been:
> * The scores in the result and explain sections are identical.
> * Names matching only one of the two multiplied boost terms receive the respective single boost instead of the default score 1.0.
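
The two expectations above can be sketched in Python. This is an illustration under stated assumptions, not Solr's scoring code: all helper names are hypothetical, `tokenize` is only a rough stand-in for the `text_de` analyzer, and the base score of 1.0 assumes the `*:*` main query used in the report.

```python
import re

# Boost factors taken from the query in the report: term -> multiplier.
BOOSTS = {"netzteil": 2.0, "sony": 3.0}


def tokenize(name):
    """Rough stand-in for the text_de analyzer (assumption):
    lowercase the name and split it on non-alphanumeric characters."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def expected_score(name, base_score=1.0, boosts=BOOSTS):
    """Multiply base_score by each boost whose term occurs in name.

    A term that does not occur contributes the factor 1.0,
    mirroring the default in query(..., 1) of the boost function.
    """
    tokens = tokenize(name)
    score = base_score
    for term, factor in boosts.items():
        score *= factor if term in tokens else 1.0
    return score


def explain_score(explain_text):
    """Return the leading number of an explain string
    such as '2.0 = product of:'."""
    match = re.match(r"\s*([0-9.]+(?:[eE][-+]?[0-9]+)?)\s*=", explain_text)
    if match is None:
        raise ValueError("no leading score in explain text")
    return float(match.group(1))


def is_consistent(result_score, explain_text, tolerance=1e-6):
    """True if the result score equals the score stated in the explain text."""
    return abs(result_score - explain_score(explain_text)) <= tolerance


if __name__ == "__main__":
    names = (
        "Original Sony Vaio Netzteil",
        "Steuertestprodukt Zwei",
        "GS-Netzteil 20W schwarz",
    )
    for name in names:
        print(name, expected_score(name))
```

For the three products listed in the report, the sketch yields 6.0, 1.0 and 2.0, whereas the reported result score for "GS-Netzteil 20W schwarz" was 1.0. Likewise, `is_consistent(1.0, "2.0 = product of:")` returns `False` for that product, which is the mismatch between result and explain described above.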


