You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2019/10/08 22:44:53 UTC

[GitHub] [lucene-solr] treygrainger opened a new pull request #932: SOLR-13829: Stop converting continuous numbers to Longs in Streaming Expressions

treygrainger opened a new pull request #932: SOLR-13829: Stop converting continuous numbers to Longs in Streaming Expressions
URL: https://github.com/apache/lucene-solr/pull/932
 
 
   # Description
   
   In some cases, streaming expressions are taking continuous numeric values (double, float, etc.) and converting them to Longs if the scale of the number is zero (i.e. 1.0, 2.0, 3.0, etc.). This causes issues because steaming operations which try to compare different values may blow up because the types are incompatible.
   
   For example, In trying to use the "sort" streaming evaluator on float field (pfloat), I am getting casting errors back based upon which values are calculated based upon underlying values in a field. For example:
   
   **Docs:** (paste each into "Documents" pane in Solr Admin UI as type:"json")
   
   ```
   {"id": "1", "name":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]}
   
   {"id": "2", "name":"cheese pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]}
   ```
   
   **Streaming Expression:**
   ```
   sort(select(search(food_collection, q="*:*", fl="id,vector_fs", sort="id asc"), cosineSimilarity(vector_fs, array(5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as sim, id), by="sim desc")
   ```
   
   **Response:**
   ```
   { 
     "result-set": {
       "docs": [
         {
           "EXCEPTION": "class java.lang.Double cannot be cast to class java.lang.Long (java.lang.Double and java.lang.Long are in module java.base of loader 'bootstrap')",
           "EOF": true,
           "RESPONSE_TIME": 13
         }
       ]
     }
   }
   ```
   
   This is because in org.apache.solr.client.solrj.io.eval.RecursiveEvaluator, there is a line which examines a numeric (BigDecimal) value and - regardless of the type of the field the value originated from - converts it to a Long if it looks like a whole number. This is the code in question from that class:
   
   ```
   protected Object normalizeOutputType(Object value) {
       if(null == value){
         return null;
       } else if (value instanceof VectorFunction) {
         return value;
       } else if(value instanceof BigDecimal){
         BigDecimal bd = (BigDecimal)value;
         if(bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0){
           try{
             return bd.longValueExact();
           }
           catch(ArithmeticException e){
             // value was too big for a long, so use a double which can handle scientific notation
           }
         }
         
         return bd.doubleValue();
       }
   ... [other type conversions]
   ```
   
   Because of the `return bd.longValueExact();` line, the calculated value for "sim" in doc 1 is "Float(1)", whereas the calculated value for "sim" for doc 2 is "Double(0.88938313). These are coming back as incompatible data types, even though the source data is all of the same type and should be comparable.
   
   Thus when the sort evaluator streaming expression (and probably others) runs on these calculated values and the list should contain ["0.88938313", "1.0"], an exception is thrown because the it's trying to compare incompatible data types [Double("0.99"), Long(1)].
   
   This bug is occurring on master currently, but has probably existed in the codebase since at least August 2017.
   
   # Solution
   
   Removing the code that converts incoming BigDecimals to a Long fixes the issue, and all Solr tests still pass, as well as `ant precommit` once that is done.  In reviewing various responses, I haven't thusfar been able to identify any problem resulting from this change. This is the solution I've implemented, but would appreciate review.
   
   I'm not sure if this could potentially create any side effects (i.e. loss of precision on certain calculations that went through multiple data type transformations) that were intended to be avoided by that code. Perhaps @dennisgove or @joel-bernstein  may know the reason for that original conversion of BigDecimals to longs and could shed some light.
   
   I guess the other way to solve this would be to modify every streaming expression (like the sort evaluator) to explicitly do type checking and conversions anytime incompatible types are being compared, but this feels inefficient and I'm assuming probably not like the right approach unless doing it that way was an intentional design decision in how the streaming expressions framework is intending to handle data types.
   
   # Tests
   
   `ant test` passes
   `ant precommit` passes
   I can add a test validating this works once the solution approach has been validated.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I am authorized to contribute this code to the ASF and have removed any code I do not have a license to distribute.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
   
   I don't believe any documentation changes are needed, as this is just fixing a bug. See notes on unit tests under the `Tests` section.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org