Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/04/04 01:14:53 UTC
[jira] [Updated] (DRILL-2398) IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter
[ https://issues.apache.org/jira/browse/DRILL-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha updated DRILL-2398:
------------------------------
Fix Version/s: 1.0.0 (was: 0.9.0)
> IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter
> ----------------------------------------------------------------------------------
>
> Key: DRILL-2398
> URL: https://issues.apache.org/jira/browse/DRILL-2398
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Victoria Markman
> Assignee: Aman Sinha
> Priority: Critical
> Fix For: 1.0.0
>
> Attachments: j1.parquet, j2.parquet
>
>
> count(*) should return 0, not NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > count(*)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > j1 INNER JOIN j2 ON
> . . . . . . . . . . . . > ( j1.c_double = j2.c_double)
> . . . . . . . . . . . . > where
> . . . . . . . . . . . . > j1.c_bigint IS NOT DISTINCT FROM j2.c_bigint
> . . . . . . . . . . . . > ;
> +------------+
> | EXPR$0 |
> +------------+
> +------------+
> {code}
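For reference, the expected behaviour can be reproduced outside Drill. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption made purely for illustration. The rows are hypothetical, and SQLite's IS operator is used as the null-safe equality that IS NOT DISTINCT FROM denotes. A count(*) without GROUP BY over an empty input should still produce exactly one row containing 0.

```python
import sqlite3

# In-memory SQLite database standing in for Drill (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE j1 (c_double REAL, c_bigint INTEGER);
    CREATE TABLE j2 (c_double REAL, c_bigint INTEGER);
    -- Hypothetical rows: c_double matches across tables, c_bigint never does.
    INSERT INTO j1 VALUES (1.5, 10), (2.5, 20), (3.5, NULL);
    INSERT INTO j2 VALUES (1.5, -10), (2.5, -20), (3.5, 30);
""")

# SQLite's IS operator behaves like = but treats NULLs as comparable,
# which matches the semantics of IS NOT DISTINCT FROM.
rows = conn.execute("""
    SELECT count(*)
    FROM j1 INNER JOIN j2 ON (j1.c_double = j2.c_double)
    WHERE j1.c_bigint IS j2.c_bigint
""").fetchall()

print(rows)  # one row holding the count, even though no joined row passes the filter
```

No joined row satisfies the filter here, yet the aggregate still emits a single row, which is the result the report expects from Drill.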
> These are the c_bigint values in the joined rows:
> {code}
> 0: jdbc:drill:schema=dfs> select j1.c_bigint, j2.c_bigint, count(*) from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double) group by j1.c_bigint, j2.c_bigint;
> +------------+------------+------------+
> | c_bigint | c_bigint1 | EXPR$1 |
> +------------+------------+------------+
> | 460194667 | -498749284 | 1 |
> | 464547172 | -498828740 | 1 |
> | 467451850 | -498966611 | 2 |
> | 471050029 | -499154096 | 3 |
> | 472873799 | -499233550 | 3 |
> | 475698977 | -499395929 | 2 |
> | 478986584 | -499564607 | 1 |
> | 488139464 | -499763274 | 3 |
> | 498214699 | -499871720 | 2 |
> +------------+------------+------------+
> 9 rows selected (0.339 seconds)
> {code}
> The IS DISTINCT FROM predicate returns the correct result:
> {code}
> select
> count(*)
> from
> j1 INNER JOIN j2 ON
> ( j1.c_double = j2.c_double)
> where
> j1.c_bigint IS DISTINCT FROM j2.c_bigint
> {code}
> Explain plan for query that returns incorrect result:
> {code}
> 00-01 StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-02 Project($f0=[0])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[CAST(CASE(IS NULL($1), IS NULL($3), IS NULL($3), IS NULL($1), =($1, $3))):BOOLEAN NOT NULL])
> 00-05 HashJoin(condition=[=($0, $2)], joinType=[inner])
> 00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j1]], selectionRoot=/joins/j1, numFiles=1, columns=[`c_double`, `c_bigint`]]])
> 00-06 Project(c_double0=[$0], c_bigint0=[$1])
> 00-08 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j2]], selectionRoot=/joins/j2, numFiles=1, columns=[`c_double`, `c_bigint`]]])
> {code}
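The Filter condition in the plan appears to be the usual CASE expansion of IS NOT DISTINCT FROM: when $1 is NULL return IS NULL($3), when $3 is NULL return IS NULL($1), otherwise return $1 = $3. The following Python sketch models that expression with None standing in for NULL and applies it to the nine c_bigint pairs from the grouped output. Reading $1 and $3 as j1.c_bigint and j2.c_bigint is an assumption based on the column order of the two scans, and case_filter is a hypothetical helper name.

```python
from typing import Optional


def case_filter(a: Optional[int], b: Optional[int]) -> bool:
    """Model of CASE WHEN a IS NULL THEN b IS NULL
                    WHEN b IS NULL THEN a IS NULL
                    ELSE a = b END, with None as NULL."""
    if a is None:
        return b is None
    if b is None:
        return a is None  # always False here, because a is known to be non-NULL
    return a == b


# (j1.c_bigint, j2.c_bigint, number of joined rows) from the grouped output in the report.
groups = [
    (460194667, -498749284, 1),
    (464547172, -498828740, 1),
    (467451850, -498966611, 2),
    (471050029, -499154096, 3),
    (472873799, -499233550, 3),
    (475698977, -499395929, 2),
    (478986584, -499564607, 1),
    (488139464, -499763274, 3),
    (498214699, -499871720, 2),
]

# Joined rows that pass the filter (IS NOT DISTINCT FROM) and its negation (IS DISTINCT FROM).
not_distinct = sum(n for a, b, n in groups if case_filter(a, b))
distinct = sum(n for a, b, n in groups if not case_filter(a, b))

print(not_distinct, distinct)  # 0 18
```

Under these assumptions the filter itself rejects every joined row, so the expression looks correct; the missing piece would be the single row with 0 that the aggregate above the filter is expected to emit for empty input.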