Posted to issues@drill.apache.org by "Alexander Malashevsky (JIRA)" <ji...@apache.org> on 2018/01/30 17:44:00 UTC

[jira] [Commented] (DRILL-6121) Nan/Inf data types: strange query result with INNER JOIN operator when selecting 1 column

    [ https://issues.apache.org/jira/browse/DRILL-6121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345508#comment-16345508 ] 

Alexander Malashevsky commented on DRILL-6121:
----------------------------------------------

Investigation showed that the issue most probably lies in *mergejoin*, because:
- the issue is not reproducible with {code}planner.enable_mergejoin/planner.enable_nestedloopjoin = false, planner.enable_hashjoin = true{code}
- the issue is not reproducible with {code}planner.enable_hashjoin/planner.enable_mergejoin = false, planner.enable_nestedloopjoin = true{code}
- the issue is reproducible with {code}planner.enable_hashjoin/planner.enable_nestedloopjoin = false, planner.enable_mergejoin = true{code}

So the issue is tied to merge join (planner.enable_mergejoin = true) when JOINing tables by a column which contains NaN/Infinity/-Infinity.
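
For reference, the reproducing configuration can be set per session roughly as follows. This is a sketch that assumes the usual ALTER SESSION SET syntax for these planner options; the query is Query2 from the issue description, and the EXPLAIN step is only a suggested way to check which join operator the planner picked.
{code}
-- Leave merge join as the only enabled join implementation
ALTER SESSION SET `planner.enable_hashjoin` = false;
ALTER SESSION SET `planner.enable_nestedloopjoin` = false;
ALTER SESSION SET `planner.enable_mergejoin` = true;

-- Optional: inspect the plan to see which join operator is used
EXPLAIN PLAN FOR
select distinct t.name from dfs.tmp.`ObjsX.json` t inner join dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4;

-- Query2 from the issue description
select distinct t.name from dfs.tmp.`ObjsX.json` t inner join dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4;
{code}
The other two configurations from the list are obtained by swapping which of the three options is set to true.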

> Nan/Inf data types: strange query result with INNER JOIN operator when selecting 1 column
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-6121
>                 URL: https://issues.apache.org/jira/browse/DRILL-6121
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>            Reporter: Alexander Malashevsky
>            Assignee: Volodymyr Tkach
>            Priority: Minor
>         Attachments: ObjsX.json
>
>
> *AFFECTED_VERSION:* drill-1.13.0-SNAPSHOT
> *AFFECTED_FUNCTIONALITY:* INNER JOIN
> *ISSUE_DESCRIPTION:* New JSON data types were added in DRILL-5919: *NaN, Infinity, -Infinity*. 
> During testing, somewhat strange behavior of the INNER JOIN operator was detected: almost identical queries return different results. 
> *Query1* {code} select distinct t.name, tt.name from dfs.tmp.`ObjsX.json` t inner join dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
> *Query2* {code} select distinct t.name from dfs.tmp.`ObjsX.json` t inner join dfs.tmp.`ObjsX.json` tt on t.attr4 = tt.attr4 {code}
> *Query1* differs from *Query2* by 1 column only:
> - In *Query1* - 2 columns are selected - t.name, tt.name
> - In *Query2* - 1 column is selected - t.name
> However *Query1*/*Query2* return completely different results:
> - *Query1* returns
> 	{code}
> 	name         name0
> 	object2 	object2
> 	object2 	object3
> 	object2 	object4
> 	object3 	object2
> 	object3 	object3
> 	object3 	object4
> 	object4 	object2
> 	object4 	object3
> 	object4 	object4
> 	{code}
> This result seems to be correct.
> - *Query2* returns _*No result found*_, which is not expected:
> 	*EXPECTED_RESULT:*
> 	{code}
> 	name
> 	object2
> 	object3
> 	object4
> 	{code}
> 	
> 	*ACTUAL_RESULT*: {code}No result found{code}
> *NB!:* the issue appears only if tables are _*JOINed by a column which contains newly-added data types (NaN, Infinity, -Infinity)*_. The issue is not reproducible if a user is JOINing tables by a column containing other data types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)