Posted to issues@lucene.apache.org by "Peter Davie (Jira)" <ji...@apache.org> on 2019/10/12 04:39:00 UTC
[jira] [Created] (SOLR-13838) igain query parser generating invalid output

Peter Davie created SOLR-13838:
----------------------------------

             Summary: igain query parser generating invalid output
                 Key: SOLR-13838
                 URL: https://issues.apache.org/jira/browse/SOLR-13838
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: query parsers
    Affects Versions: 8.2
         Environment: The issue is a generic Java defect and therefore will be independent of the operating system or software platform.
            Reporter: Peter Davie
             Fix For: 8.3
         Attachments: IGainTermsQParserPlugin.java.patch

While investigating the output from the "features()" stream source, I found that terms are being returned with NaN for the score_f field:

    "docs": [
      {
        "featureSet_s": "business",
        "score_f": "NaN",
        "term_s": "1,011.15",
        "idf_d": "-Infinity",
        "index_i": 1,
        "id": "business_1"
      },
      {
        "featureSet_s": "business",
        "score_f": "NaN",
        "term_s": "10.3m",
        "idf_d": "-Infinity",
        "index_i": 2,
        "id": "business_2"
      },
      {
        "featureSet_s": "business",
        "score_f": "NaN",
        "term_s": "01",
        "idf_d": "-Infinity",
        "index_i": 3,
        "id": "business_3"
      },...

Looking into {{org/apache/solr/search/IGainTermsQParserPlugin.java}}, it seems that when a term does not occur in any of the positive or negative documents, the docFreq calculation (docFreq = xc + nc) yields 0, so the subsequent calculations divide by 0 and produce NaN.
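To illustrate why this shows up as NaN rather than as an exception, here is a minimal standalone Java sketch. It is not the plugin source: the class name and the `binaryEntropy` and `positiveFraction` helpers are hypothetical, written only to mirror the xc/nc/docFreq description above. Because the count is cast to double before dividing, 0/0 is evaluated in floating point and yields NaN (integer 0/0 would throw ArithmeticException instead). A typical "p == 0 || p == 1" shortcut in an entropy helper does not catch it either, since every comparison with NaN is false.

```java
public class NaNFromZeroDocFreq {

    // Hypothetical binary entropy helper (natural log), written for this sketch.
    // The 0/1 shortcut does NOT filter out NaN: every comparison with NaN is false.
    static double binaryEntropy(double p) {
        if (p == 0.0 || p == 1.0) {
            return 0.0;
        }
        return -p * Math.log(p) - (1 - p) * Math.log(1 - p);
    }

    // Fraction of the documents containing the term that are positive.
    // The cast makes this floating-point division, so 0/0 yields NaN
    // instead of throwing ArithmeticException as int division would.
    static double positiveFraction(int xc, int nc) {
        int docFreq = xc + nc;
        return (double) xc / docFreq;
    }

    public static void main(String[] args) {
        // A term that occurs in neither the positive nor the negative documents.
        double p = positiveFraction(0, 0);
        double h = binaryEntropy(p);
        System.out.println(p); // NaN
        System.out.println(h); // NaN
    }
}
```

Any score built from such a value (for example, an information gain that weights this entropy) is then NaN as well, which matches the score_f values above.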

Attached is a patch that skips terms whose docFreq is 0 in the finish() method of IGainTermsQParserPlugin; this resolves the NaN scores in the features() output.
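The guard can be sketched independently of Solr as follows. This is a hypothetical standalone example, not the attached patch: `TermCounts`, `informationGain` and `scoreTerms` are illustrative names, the information-gain formula is the textbook one (assumed here, not copied from IGainTermsQParserPlugin), numDocs is assumed to count only the labelled positive and negative documents, and the term "loan" is made-up sample data. The key line is the `continue` taken when docFreq is 0, so no 0/0 division is ever performed.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipZeroDocFreqSketch {

    // Hypothetical per-term counts: xc = positive docs containing the term,
    // nc = negative docs containing the term.
    static final class TermCounts {
        final String term;
        final int xc;
        final int nc;

        TermCounts(String term, int xc, int nc) {
            this.term = term;
            this.xc = xc;
            this.nc = nc;
        }
    }

    // Binary entropy in nats; returns 0 for the degenerate probabilities 0 and 1.
    static double binaryEntropy(double p) {
        if (p == 0.0 || p == 1.0) {
            return 0.0;
        }
        return -p * Math.log(p) - (1 - p) * Math.log(1 - p);
    }

    // Textbook information gain of splitting numDocs labelled docs on term presence.
    // Assumed formula for illustration; callers must pass docFreq > 0.
    static double informationGain(int xc, int docFreq, int numPositive, int numDocs) {
        double entropyC = binaryEntropy((double) numPositive / numDocs);
        double pTerm = (double) docFreq / numDocs;
        double hContains = binaryEntropy((double) xc / docFreq);
        int notContains = numDocs - docFreq;
        double hNotContains = notContains == 0
                ? 0.0 // every doc contains the term; that branch has zero weight
                : binaryEntropy((double) (numPositive - xc) / notContains);
        return entropyC - (pTerm * hContains + (1 - pTerm) * hNotContains);
    }

    // Scores each term, skipping terms found in neither document set.
    static Map<String, Double> scoreTerms(List<TermCounts> counts, int numPositive, int numDocs) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (TermCounts tc : counts) {
            int docFreq = tc.xc + tc.nc;
            if (docFreq == 0) {
                continue; // would otherwise compute 0/0 = NaN
            }
            scores.put(tc.term, informationGain(tc.xc, docFreq, numPositive, numDocs));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<TermCounts> counts = Arrays.asList(
                new TermCounts("loan", 3, 1),   // hypothetical: 3 positive, 1 negative doc
                new TermCounts("10.3m", 0, 0)); // absent from both sets
        Map<String, Double> scores = scoreTerms(counts, 4, 8);
        System.out.println(scores.keySet()); // [loan]
    }
}
```

Skipping the term, rather than assigning it a score of 0, keeps terms that carry no information about either class out of the ranked feature set entirely.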



--
This message was sent by Atlassian Jira
(v8.3.4#803005)