You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Volodymyr Tkach (JIRA)" <ji...@apache.org> on 2018/02/13 14:24:00 UTC
[jira] [Created] (DRILL-6154) NaN, Infinity issues
Volodymyr Tkach created DRILL-6154:
--------------------------------------
Summary: NaN, Infinity issues
Key: DRILL-6154
URL: https://issues.apache.org/jira/browse/DRILL-6154
Project: Apache Drill
Issue Type: Bug
Reporter: Volodymyr Tkach
Assignee: Volodymyr Tkach
1.Issue
*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
*AFFECTED_FUNCTIONS:*
- *sqrt*
- *trunc*
*ISSUE_DESCRIPTION:* According to DRILL-5919/MD-2745, new json number literals were added: *NaN, Infinity, -Infinity*. The new data types must be processed properly by existing functions. There are a few issues:
*1. SQRT function*. Run the following test query: {code}select sqrt(Nan) NaN, sqrt(Positive_Infinity) POS_INF, sqrt(Negative_Infinity) NEG_INF from dfs.tmp.`PN_Inf_NaN.json`{code}
- EXPECTED_RESULT: it was expected to get the following result: _NaN, Infinity, NaN_ (expected result is based on java Math.sqrt() method)
- ACTUAL_RESULT: the test query returned: _NaN, Infinity, Infinity_
*2. TRUNC function*. According to DRILL docs (https://drill.apache.org/docs/math-and-trig/): _TRUNC(x, y) : Truncates x to y decimal places. *Specifying y is optional. Default is 1*_. So, the function must work properly without specifying *y*
However an error message appears. Run test_query: {code}select trunc(Nan) NaN, trunc(Positive_Infinity) POS_INF, trunc(Negative_Infinity) NEG_INF from dfs.tmp.`PN_Inf_NaN.json`{code}
- EXPECTED_RESULT: it was expected to get the following result *NaN, NaN, NaN*
- ACTUAL_RESULT: it appears the following error message: *Query Failed: An Error Occurred org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: NumberFormatException Fragment 0:0 [Error Id: 95e01fee-7433-4b0b-b913-32358b4a8f55 on node1:31010]*
Please investigate and fix, test file attached PN_Inf_NaN.json
2. Issue
*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
*AFFECTED_FUNCTIONALITY:* INNER JOIN
*ISSUE_DESCRIPTION:* There were added new Json data types in MD-2745/DRILL-5919: *NaN, Infinity, -Infinity*.
During testing activities, it was detected a bit strange behavior of INNER JOIN operator - different query results in almost the same queries.
*Query1* {code} select distinct t.name, tt.name from dfs.tmp.`ObjsX.json` t inner join dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
*Query2* {code} select distinct t.name from dfs.tmp.`ObjsX.json` t inner join dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
*Query1* differs from *Query2* by 1 columns only:
- In *Query1* - 2 columns are selected - t.name, tt.name
- In *Query2* - 1 column is selected - t.name
However *Query1*/*Query2* return completely different results:
- *Query1* returns
{code}
name name0
object2 object2
object2 object3
object2 object4
object3 object2
object3 object3
object3 object4
object4 object2
object4 object3
object4 object4
{code}
This result seems to be correct.
- *Query2* returns _*No result found*_, not expected:
*EXPECTED_RESULT:*
{code}
name
object2
object3
object4
{code}
*ACTUAL_RESULT*: {code}No result found{code}
*NB!:* the issue appears only if tables are _*JOINed by a column which contains newly-added data types (NaN, Infinity, -Infinity)*_. The issue is not reproducible is a user is JOINing tables by a column containing other data types
3. Issue
*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
*AFFECTED_FUNCTIONALITY:* ORDER BY, DESC
*THIS ISSUE REFERS TO:* MD-2745
*ISSUE_DESCRIPTION:* 'ORDER BY/DESC' clause behaves in different ways when sorting columns containing NaN values. In one case it considers NaN to be the largest value, in another - the smallest one.
*Steps:*
- Select from the attached test file (orderBy.json, attached)
{code}SELECT name, attr4 from dfs.tmp.`orderBy.json` order by name, attr4{code}
- Check the attached screen shot (orderByIssue.jpg):
*EXPECTED_RESULT:* It was expected the 'ORDER BY' clause to sort attr4 columns data in the same way (most probably NaN should be the largest, see *NB*)
*ACTUAL_RESULT:* attr4 column's values were sorted in different ways: for 'obj1'/'obj3' NaN is the largest, for 'obj2'/'obj4' NaN is the smallest.
*NB:* Postgres as well as Java's sorting (Collection.sort() / Arrays.sort() methods) treats NaN as the largest value
4. Issue
*AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
*AFFECTED_FUNCTIONALITY:* max(column_name), min(column_name),
*ISSUE_DESCRIPTION:* min/max aggregation functions return the same result if the selected column contains NaN value.
{code}SELECT name, max(attr4), min(attr4) from dfs.tmp.`minMax.json` group by name{code}
Result
{code}
name Min Max
obj1 NaN NaN
obj2 NaN NaN
obj3 NaN NaN
obj4 NaN NaN
{code}
As for me, this logic should be revised, current behavior is a bit confusing:
- Postgres considers NaN to be the largest value, so {{MAX(col_withNaN)}} will return NaN, {{MIN(col_withNaN)}} - will return other value, so DRILL's min/max logic can be adjusted to the Postgres' one, see postgres.jpg
- Or NAN can behave like NULL - DRILL's MIN/MAX functions ignore NULLs
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)