Posted to issues@lucene.apache.org by "Trey Grainger (Jira)" <ji...@apache.org> on 2019/10/08 19:53:00 UTC

[jira] [Commented] (SOLR-13829) RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing mismatch

    [ https://issues.apache.org/jira/browse/SOLR-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947161#comment-16947161 ] 

Trey Grainger commented on SOLR-13829:
--------------------------------------

Removing the code that converts incoming BigDecimals to a Long fixes the issue, and all Solr tests pass once that is done.  I'll upload a patch shortly that makes this change.
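
To illustrate the effect of that change, here is a minimal, self-contained sketch. It is not the patch itself: `NormalizeSketch`, `normalizeCurrent`, and `normalizeProposed` are hypothetical names, and only the BigDecimal branch quoted in the issue description is reproduced.

```java
import java.math.BigDecimal;

class NormalizeSketch {

  // Mirrors the BigDecimal branch quoted in the issue description:
  // values that look like whole numbers are returned as Long.
  static Object normalizeCurrent(BigDecimal bd) {
    if (bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0) {
      try {
        return bd.longValueExact();
      } catch (ArithmeticException e) {
        // value was too big for a long, fall through to double
      }
    }
    return bd.doubleValue();
  }

  // Assumed shape of the change: skip the Long conversion and always return a Double.
  static Object normalizeProposed(BigDecimal bd) {
    return bd.doubleValue();
  }

  public static void main(String[] args) {
    BigDecimal sim1 = new BigDecimal("1.0");        // "sim" for doc 1
    BigDecimal sim2 = new BigDecimal("0.88938313"); // "sim" for doc 2

    // Current behavior: the two values end up with different Java types.
    System.out.println(normalizeCurrent(sim1).getClass().getSimpleName());
    System.out.println(normalizeCurrent(sim2).getClass().getSimpleName());

    // Without the Long conversion, both values share one type.
    System.out.println(normalizeProposed(sim1).getClass().getSimpleName());
    System.out.println(normalizeProposed(sim2).getClass().getSimpleName());
  }
}
```

With a single output type, downstream evaluators such as sort no longer see a mix of Long and Double for values calculated from the same field.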

 

I'm not sure whether this creates side effects that the original code was meant to avoid (e.g. loss of precision in calculations that pass through multiple data type transformations). Perhaps [~dpgove] or [~jbernste] know the reason for the original conversion of BigDecimals to Longs and could shed some light.
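
One concrete case where the two representations differ is a whole number too large for a double to hold exactly. The sketch below only demonstrates that general property of Long versus Double; whether it was the motivation for the original code is an assumption. `PrecisionSketch` and `roundTripsThroughDouble` are hypothetical names.

```java
import java.math.BigDecimal;

class PrecisionSketch {

  // True if converting the value to a double and back yields the same number.
  static boolean roundTripsThroughDouble(BigDecimal bd) {
    return new BigDecimal(bd.doubleValue()).compareTo(bd) == 0;
  }

  public static void main(String[] args) {
    // 2^53 + 1 is the smallest positive whole number a double cannot represent exactly.
    BigDecimal big = new BigDecimal("9007199254740993");

    long asLong = big.longValueExact();  // exact
    double asDouble = big.doubleValue(); // rounded to a nearby representable value

    System.out.println(asLong);
    System.out.println((long) asDouble);
    System.out.println(roundTripsThroughDouble(big));
    System.out.println(roundTripsThroughDouble(new BigDecimal("1.0")));
  }
}
```

Small whole numbers such as the 1.0 in this issue survive the conversion to Double unchanged, so the difference only matters for magnitudes above 2^53.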

The other way to solve this would be to modify every streaming expression (like the sort evaluator) to explicitly check and convert types whenever incompatible types are compared. That feels inefficient, and I assume it is not the right approach unless it was an intentional design decision in how the streaming expressions framework handles data types.
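
For comparison, this is roughly what that per-evaluator approach could look like. It is a sketch under stated assumptions, not code from Solr: `MixedNumberSortSketch`, `toBigDecimal`, and `sortDescending` are hypothetical names, and the real sort evaluator is structured differently.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MixedNumberSortSketch {

  // Converts any supported Number to a BigDecimal so mixed types can be compared.
  static BigDecimal toBigDecimal(Number n) {
    if (n instanceof BigDecimal) {
      return (BigDecimal) n;
    }
    if (n instanceof Long || n instanceof Integer) {
      return BigDecimal.valueOf(n.longValue());
    }
    // Throws NumberFormatException for NaN or infinite values.
    return BigDecimal.valueOf(n.doubleValue());
  }

  // Compares numbers by value regardless of their concrete Java type.
  static final Comparator<Number> BY_VALUE =
      (a, b) -> toBigDecimal(a).compareTo(toBigDecimal(b));

  // Returns a new list sorted from largest to smallest.
  static List<Number> sortDescending(List<Number> values) {
    List<Number> copy = new ArrayList<>(values);
    copy.sort(BY_VALUE.reversed());
    return copy;
  }

  public static void main(String[] args) {
    // The mixed types produced for "sim" in the issue: a Double and a Long.
    List<Number> sims = Arrays.<Number>asList(0.88938313, 1L);
    System.out.println(sortDescending(sims));
  }
}
```

Every comparison pays for a conversion here, and each evaluator that compares values would need the same treatment, which is the inefficiency mentioned above.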

 

> RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing mismatch
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-13829
>                 URL: https://issues.apache.org/jira/browse/SOLR-13829
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Trey Grainger
>            Priority: Major
>
> When using the "sort" streaming evaluator on a float field (pfloat), I get casting errors depending on which values are calculated from the underlying values in the field.
> Example:
> *Docs:* (paste each into "Documents" pane in Solr Admin UI as type:"json")
>  
> {code:java}
> {"id": "1", "name":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]}
> {"id": "2", "name":"cheese pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]}{code}
>  
> *Streaming Expression:*
>  
> {code:java}
> sort(select(search(food_collection, q="*:*", fl="id,vector_fs", sort="id asc"), cosineSimilarity(vector_fs, array(5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as sim, id), by="sim desc"){code}
>  
> *Response:*
>  
> {code:java}
> { 
>   "result-set": {
>     "docs": [
>       {
>         "EXCEPTION": "class java.lang.Double cannot be cast to class java.lang.Long (java.lang.Double and java.lang.Long are in module java.base of loader 'bootstrap')",
>         "EOF": true,
>         "RESPONSE_TIME": 13
>       }
>     ]
>   }
> }{code}
>  
>  
> This is because in org.apache.solr.client.solrj.io.eval.RecursiveEvaluator, there is a line which examines a numeric (BigDecimal) value and - regardless of the type of the field the value originated from - converts it to a Long if it looks like a whole number. This is the code in question from that class:
> {code:java}
> protected Object normalizeOutputType(Object value) {
>     if(null == value){
>       return null;
>     } else if (value instanceof VectorFunction) {
>       return value;
>     } else if(value instanceof BigDecimal){
>       BigDecimal bd = (BigDecimal)value;
>       if(bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0){
>         try{
>           return bd.longValueExact();
>         }
>         catch(ArithmeticException e){
>           // value was too big for a long, so use a double which can handle scientific notation
>         }
>       }
>       
>       return bd.doubleValue();
>     }
> ... [other type conversions]
> {code}
> Because of the *return bd.longValueExact();* line, the calculated value for "sim" in doc 1 is Long(1), whereas the calculated value for "sim" in doc 2 is Double(0.88938313). These come back as incompatible data types, even though the source data is all of the same type and should be comparable.
> Thus, when the *sort* streaming expression (and probably others) runs on these calculated values, where the list should contain ["0.88938313", "1.0"], an exception is thrown because it is trying to compare incompatible data types [Double(0.88938313), Long(1)].
> This bug is occurring on master currently, but has probably existed in the codebase since at least August 2017.


