Posted to dev@lucene.apache.org by "Karl Wolf (JIRA)" <ji...@apache.org> on 2019/04/03 17:46:00 UTC

[jira] [Created] (SOLR-13367) Highlighting fails for Range queries on Multi-valued String fields

Karl Wolf created SOLR-13367:
--------------------------------

             Summary: Highlighting fails for Range queries on Multi-valued String fields
                 Key: SOLR-13367
                 URL: https://issues.apache.org/jira/browse/SOLR-13367
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: highlighter
    Affects Versions: 7.7.1, 7.5
         Environment: RedHat Linux v7, Java 1.8.0_201
            Reporter: Karl Wolf
             Fix For: 5.1


Range queries against multi-valued string fields produce useless highlighting, even with "hl.highlightMultiTerm":"true".

I have uncovered what I believe is a bug. At the very least, it is a difference in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a multi-valued string field defined in my schema as:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/> 
    <field name="MyStringField" type="string" indexed="true" stored="true" multiValued="true" />

I am using a query containing a Range clause and I am using highlighting to get the list of values that actually matched the range query.

All examples below were run from the Query page of the Solr Admin UI for the relevant core.
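
For anyone who prefers to script the request instead of using the Admin UI, here is a minimal sketch that assembles the same /select URL. The host and core name are hypothetical placeholders, and build_select_url is an illustrative helper, not part of Solr. The parameter names are the ones echoed in the v5.1.0 response in this report. The optional hl_method argument only adds "hl.method", which is used later in this report to select the unified highlighter.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; substitute your own.
SOLR_BASE = "http://localhost:8983/solr/mycore"


def build_select_url(query, hl_method=None):
    """Return a /select URL carrying the highlighting parameters from this report."""
    params = {
        "q": query,
        "fl": "MyStringField,MyUniqueID",
        "wt": "json",
        "hl": "true",
        "hl.fl": "MyStringField",
        "hl.preserveMulti": "true",
        "hl.requireFieldMatch": "true",
        "hl.usePhraseHighlighter": "true",
        "hl.highlightMultiTerm": "true",
    }
    if hl_method is not None:
        # Only sent when a specific highlighter implementation is requested.
        params["hl.method"] = hl_method
    # urlencode percent-encodes the range syntax ([, }, :) and turns spaces into '+'.
    return SOLR_BASE + "/select?" + urlencode(params)


range_url = build_select_url("MyStringField:[A TO B}")
unified_url = build_select_url("MyStringField:[A TO B}", hl_method="unified")
print(range_url)
```

The sketch only builds the URL; fetching it with any HTTP client should return a response shaped like the ones shown below.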

***************************************************************************
First, a correctly working example of a range query using Solr v5.1.0 which produces useful results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 366,
    "params": {
      "q": "MyStringField:[A TO B}",
      "hl": "true",
      "indent": "true",
      "hl.preserveMulti": "true",
      "fl": "MyStringField,MyUniqueID",
      "hl.requireFieldMatch": "true",
      "hl.usePhraseHighlighter": "true",
      "hl.fl": "MyStringField",
      "wt": "json",
      "hl.highlightMultiTerm": "true",
      "_": "1553275722025"
    }
  },
  "response": {
    "numFound": 999,
    "start": 0,
    "docs": [
      {
        "MyStringField": [
          "Stanley, Wendell M.",
          "Avery, Roy"
        ],
        "MyUniqueID": "UniqueID1"
      },
      {
        "MyStringField": [
          "Avery, Roy"
        ],
        "MyUniqueID": "UniqueID2"
      },
*** lots more docs correctly found
    ]
  },
*** Next comes the highlighting portion of the response.
*** It indicates which values of each MyStringField
*** actually matched the query.

  "highlighting": {
    "UniqueID1": {
      "MyStringField": [
        "<em>Avery, Roy</em>"
      ]
    },
    "UniqueID2": {
      "MyStringField": [
        "<em>Avery, Roy</em>"
      ]
    },
    "UniqueID3": {
      "MyStringField": [
        "<em>American Institute of Biological Sciences</em>",
        "<em>Albritton, Errett C.</em>"
      ]
    },
... etc.

 *** lots more useful highlight values. Note the two matching values
 *** for document UniqueID3. 
}


***************************************************************************
* THE PROBLEM
* Now using newer versions of Solr
***************************************************************************
Using the exact same parameters with Solr v7.5.0 or v7.7.1, the top portion of the 
response is basically the same, including the number of documents found:

{
  "responseHeader":{
    "status":0,
    "QTime":245,
    "params":{
      "q":"MyStringField:[A TO B}",
      "hl":"on",
      "hl.preserveMulti":"true",
      "fl":"MyUniqueID, MyStringField",
      "hl.requireFieldMatch":"true",
      "hl.fl":"MyStringField",
      "hightlightMultiTerm":"true",
      "wt":"json",
      "_":"1553105129887",
      "usePhraseHighLighter":"true"}},
  "response":{"numFound":999,"start":0,"docs":[

*** The problem is with the highlighting portion of the results, which is effectively empty. 
*** There is no way to know which values in each document actually matched the query:

  "highlighting":{
    "UniqueID1":{},
    "UniqueID2":{},
    "UniqueID3":{},
... etc.

*** NOTE: The source data is the same for all of the tested Solr versions and the Solr indexes
*** were properly rebuilt for each Solr version. 

***************************************************************************
After changing the request to use the "unified" highlighter ("hl.method=unified"), the highlighting looks like:

  "highlighting":{
    "UniqueID1":{
      "MyStringField":[]},
    "UniqueID2":{
      "MyStringField":[]},
    "UniqueID3":{
      "MyStringField":[]},
... etc.

*** The highlighting now properly lists the matching field but still lists no useful values.

***************************************************************************
NOTE: If I change the query from a Range clause to a Wildcard query, q="MyStringField:A*",
the highlighting is correct in both Solr v7.5.0 and v7.7.1. These are GOOD results!

  "highlighting":{
    "UniqueID1": {
      "MyStringField": ["<em>Avery, Roy</em>"]},
    "UniqueID2": {
      "MyStringField": ["<em>Avery, Roy</em>"]},
    "UniqueID3": {
      "MyStringField": [
        "<em>American Institute of Biological Sciences</em>",
        "<em>Albritton, Errett C.</em>"
      ]
    },
... etc.

*** This makes me think there is some problem with the way a Range query
*** feeds the search results to the Solr Highlighter code.

***************************************************************************
None of my attempts to vary the hl parameters or any other query parameters solved the problem.
The wildcard query is my current workaround, but the problem with range queries remains.

In summary, there is some incompatibility among:

	1) A multi-valued string field AND
	2) A range query against that field AND
	3) The resulting highlighting, which is effectively empty.
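
Until the highlighter returns values again, the matching values can be recovered on the client. The following is a minimal sketch of that stopgap, not a fix for the highlighter. It assumes the range on a StrField is evaluated against the raw, unanalyzed values in Unicode code point order, which is how Python compares strings. The function names are illustrative only, and the sample document is UniqueID1 from the responses above.

```python
def in_half_open_range(value, lower, upper):
    """True when lower <= value < upper, mirroring the [lower TO upper} syntax."""
    # Python compares str objects by Unicode code point.
    return lower <= value < upper


def matching_values(doc, field, lower, upper):
    """Return the values of a multi-valued field that fall inside [lower TO upper}."""
    return [v for v in doc.get(field, []) if in_half_open_range(v, lower, upper)]


# Document UniqueID1 as returned in the "docs" section of the response.
doc = {
    "MyUniqueID": "UniqueID1",
    "MyStringField": ["Stanley, Wendell M.", "Avery, Roy"],
}

print(matching_values(doc, "MyStringField", "A", "B"))  # ['Avery, Roy']
```

This reproduces the v5.1.0 highlighting for UniqueID1 without the <em> markup, at the cost of having to request the field in "fl" and re-implementing the range test outside Solr.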

I don't know when this issue was first introduced. I have recently been updating from 5.1.0
to 7.5.0 in one big leap. I have attempted to read through the change logs for the intervening
versions but I gave up to save my sanity.

You should be able to reproduce this issue using any multi-valued, indexed and stored string field.
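
A minimal sketch of test data for such a reproduction follows. It assumes a core whose schema has the MyStringField definition shown earlier and MyUniqueID as its unique key, and it posts a JSON array of documents to the core's /update handler. The host and core name are hypothetical. The field values are only those visible in the responses above, so UniqueID3 may have held additional values in the original index.

```python
import json
import urllib.request

# Hypothetical host and core name; substitute your own.
UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"

# Values taken from the responses quoted in this report.
docs = [
    {"MyUniqueID": "UniqueID1",
     "MyStringField": ["Stanley, Wendell M.", "Avery, Roy"]},
    {"MyUniqueID": "UniqueID2",
     "MyStringField": ["Avery, Roy"]},
    {"MyUniqueID": "UniqueID3",
     "MyStringField": ["American Institute of Biological Sciences",
                       "Albritton, Errett C."]},
]

payload = json.dumps(docs).encode("utf-8")

# Build the POST request; it is deliberately not sent here.
req = urllib.request.Request(
    UPDATE_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# To index the documents, call urllib.request.urlopen(req) against a running Solr.
```

After indexing, running q=MyStringField:[A TO B} with hl=true and hl.fl=MyStringField should show whether the "highlighting" section comes back empty on the version under test.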



