You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Alex (JIRA)" <ji...@apache.org> on 2018/01/30 17:32:00 UTC

[jira] [Created] (DRILL-6120) NaN/Infinity: some functions don't work as expected

Alex created DRILL-6120:
---------------------------

             Summary: NaN/Infinity: some functions don't work as expected
                 Key: DRILL-6120
                 URL: https://issues.apache.org/jira/browse/DRILL-6120
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - JSON
    Affects Versions: 1.13.0
            Reporter: Alex
            Assignee: Volodymyr Tkach
         Attachments: PN_Inf_NaN.json, minMax.json, postgres.jpg

*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT

*AFFECTED_FUNCTIONS:*
- *sqrt*
- *trunc*
- *min/max*

*ISSUE_DESCRIPTION:* According to DRILL-5919, new json number literals were added: *NaN, Infinity, -Infinity*. The new data types must be processed properly by existing functions. There are a few issues:
*1. SQRT function*. Run the following test query: \{code}select  sqrt(Nan) NaN, sqrt(Positive_Infinity) POS_INF, sqrt(Negative_Infinity) NEG_INF  from dfs.tmp.`PN_Inf_NaN.json`\{code} 
- EXPECTED_RESULT: it was expected to get the following result: _NaN, Infinity, NaN_ (expected result is based on java Math.sqrt() method)
- ACTUAL_RESULT: the test query returned: _NaN, Infinity, Infinity_

*2. TRUNC function*. According to DRILL docs (https://drill.apache.org/docs/math-and-trig/): _TRUNC(x, y) : Truncates x to y decimal places. *Specifying y is optional. Default is 1*_. So, the function must work properly without specifying *y*
However an error message appears. Run test_query: \{code}select  trunc(Nan) NaN, trunc(Positive_Infinity) POS_INF, trunc(Negative_Infinity) NEG_INF  from dfs.tmp.`PN_Inf_NaN.json`\{code}
- EXPECTED_RESULT: it was expected to get the following result *NaN, NaN, NaN*
- ACTUAL_RESULT: it appears the following error message: *Query Failed: An Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException Fragment 0:0 [Error Id: 95e01fee-7433-4b0b-b913-32358b4a8f55 on node1:31010]*

*3. MIN / MAX functions*
*ISSUE_DESCRIPTION:* min/max aggregation functions return the same result if the selected column contains NaN value.
{code}SELECT name, max(attr4), min(attr4) from dfs.tmp.`minMax.json` group by name{code}
Result
{code}
name	Min	Max
obj1	NaN	NaN
obj2	NaN	NaN
obj3	NaN	NaN
obj4	NaN	NaN
{code}

As for me, this logic should be revised, current behavior is a bit confusing:
- Postgres considers NaN to be the largest value, so {{MAX(col_withNaN)}} will return NaN, {{MIN(col_withNaN)}} - will return other value, so DRILL's min/max logic can be adjusted to the Postgres' one, see postgres.jpg
- Or NAN can behave like NULL - DRILL's MIN/MAX functions ignore NULLs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)