Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/04/04 01:14:53 UTC

[jira] [Updated] (DRILL-2398) IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter

     [ https://issues.apache.org/jira/browse/DRILL-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated DRILL-2398:
------------------------------
    Fix Version/s:     (was: 0.9.0)
                   1.0.0

> IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-2398
>                 URL: https://issues.apache.org/jira/browse/DRILL-2398
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Victoria Markman
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: j1.parquet, j2.parquet
>
>
> The following query should return a count(*) of 0, not NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         count(*)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         j1 INNER JOIN j2 ON
> . . . . . . . . . . . . >         ( j1.c_double = j2.c_double)
> . . . . . . . . . . . . > where
> . . . . . . . . . . . . >         j1.c_bigint IS NOT DISTINCT FROM j2.c_bigint
> . . . . . . . . . . . . > ;
> +------------+
> |   EXPR$0   |
> +------------+
> +------------+
> {code}
> These are the c_bigint values produced by the same join, grouped by both columns:
> {code}
> 0: jdbc:drill:schema=dfs> select j1.c_bigint, j2.c_bigint, count(*) from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double) group by j1.c_bigint, j2.c_bigint;
> +------------+------------+------------+
> |  c_bigint  | c_bigint1  |   EXPR$1   |
> +------------+------------+------------+
> | 460194667  | -498749284 | 1          |
> | 464547172  | -498828740 | 1          |
> | 467451850  | -498966611 | 2          |
> | 471050029  | -499154096 | 3          |
> | 472873799  | -499233550 | 3          |
> | 475698977  | -499395929 | 2          |
> | 478986584  | -499564607 | 1          |
> | 488139464  | -499763274 | 3          |
> | 498214699  | -499871720 | 2          |
> +------------+------------+------------+
> 9 rows selected (0.339 seconds)
> {code}
> The IS DISTINCT FROM predicate returns the correct result:
> {code}
> select
>         count(*)
> from
>         j1 INNER JOIN j2 ON
>         ( j1.c_double = j2.c_double)
> where
>         j1.c_bigint IS DISTINCT FROM j2.c_bigint
> {code}
> Explain plan for the query that returns the incorrect result:
> {code}
> 00-01      StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-02        Project($f0=[0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[CAST(CASE(IS NULL($1), IS NULL($3), IS NULL($3), IS NULL($1), =($1, $3))):BOOLEAN NOT NULL])
> 00-05              HashJoin(condition=[=($0, $2)], joinType=[inner])
> 00-07                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j1]], selectionRoot=/joins/j1, numFiles=1, columns=[`c_double`, `c_bigint`]]])
> 00-06                Project(c_double0=[$0], c_bigint0=[$1])
> 00-08                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j2]], selectionRoot=/joins/j2, numFiles=1, columns=[`c_double`, `c_bigint`]]])
> {code}
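
For reference, here is a small sketch of the semantics the report expects. It is an illustration in Python, not Drill code. It assumes the CASE expression in the Filter at 00-04 reads as "CASE WHEN $1 IS NULL THEN $3 IS NULL WHEN $3 IS NULL THEN $1 IS NULL ELSE $1 = $3 END", and it reuses the (j1.c_bigint, j2.c_bigint, count) rows from the GROUP BY output above. The function names and the JOINED constant are hypothetical helpers introduced only for this example.

```python
from typing import Callable, Iterable, Optional, Tuple

Value = Optional[int]  # None stands in for SQL NULL


def is_not_distinct_from(a: Value, b: Value) -> bool:
    """NULL-safe equality: two NULLs match, NULL never matches a value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def case_expansion(a: Value, b: Value) -> bool:
    """Assumed reading of the plan's CASE expression:
    CASE WHEN a IS NULL THEN b IS NULL
         WHEN b IS NULL THEN a IS NULL
         ELSE a = b END
    """
    if a is None:
        return b is None
    if b is None:
        return a is None
    return a == b


# (j1.c_bigint, j2.c_bigint, number of joined rows), copied from the
# GROUP BY output in the report.
JOINED = [
    (460194667, -498749284, 1),
    (464547172, -498828740, 1),
    (467451850, -498966611, 2),
    (471050029, -499154096, 3),
    (472873799, -499233550, 3),
    (475698977, -499395929, 2),
    (478986584, -499564607, 1),
    (488139464, -499763274, 3),
    (498214699, -499871720, 2),
]


def count_matching(
    rows: Iterable[Tuple[Value, Value, int]],
    predicate: Callable[[Value, Value], bool],
) -> int:
    """count(*) over the joined rows that pass the filter.

    Summing over zero rows gives 0, which mirrors count(*) without
    GROUP BY returning one row containing 0 rather than no value.
    """
    return sum(n for a, b, n in rows if predicate(a, b))


if __name__ == "__main__":
    print(count_matching(JOINED, is_not_distinct_from))
    print(count_matching(JOINED, lambda a, b: not is_not_distinct_from(a, b)))
```

Under these assumptions no joined pair has equal c_bigint values, so the IS NOT DISTINCT FROM filter passes no rows and the expected count is 0. The CASE expansion agrees with the NULL-safe definition for every combination of NULL and non-NULL inputs, which suggests the predicate rewrite itself is sound and the missing row comes from elsewhere in the plan.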


