You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Anton Gozhiy (JIRA)" <ji...@apache.org> on 2018/11/26 10:31:00 UTC

[jira] [Comment Edited] (DRILL-6865) Query returns wrong result when filter pruning happens

    [ https://issues.apache.org/jira/browse/DRILL-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697079#comment-16697079 ] 

Anton Gozhiy edited comment on DRILL-6865 at 11/26/18 10:30 AM:
----------------------------------------------------------------

The issue is also reproduced with the following case:
{code:sql}
select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey/2 < 5
{code}


was (Author: angozhiy):
The issue is also reproduced with the following case:
{code:sql}
select * from dfs.tmp.`multy` where n_nationkey > 5 and n_nationkey/2 < 5
{code}


> Query returns wrong result when filter pruning happens
> ------------------------------------------------------
>
>                 Key: DRILL-6865
>                 URL: https://issues.apache.org/jira/browse/DRILL-6865
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.14.0
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Blocker
>             Fix For: 1.15.0
>
>
> In DRILL-5796 was implemented removing the filter from the plan when some (or all) row groups of parquet table fully match the filter.
> For the case when filter has some predicates which parquet filter predicate does not support, they can be omitted for some cases from the resulting filter predicate. When row groups fully match predicates which left in the filter, the whole filter is removed from the plan and the wrong result is returned.
> Example of the query for reproducing this bug:
> {code:sql}
> create table dfs.tmp.`multi/t1` as select * from cp.`tpch/nation.parquet` where n_nationkey > 5;
> create table dfs.tmp.`multi/t2` as select * from cp.`tpch/nation.parquet` where n_nationkey < 5;
> select * from dfs.tmp.`multi` where n_nationkey > 5 and n_nationkey like '%10%';
> {code}
> returns
> {noformat}
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | dir0  | n_nationkey  |     n_name      | n_regionkey  |                                                      n_comment                                                      |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> | t1    | 6            | FRANCE          | 3            | refully final requests. regular, ironi                                                                              |
> | t1    | 7            | GERMANY         | 3            | l platelets. regular accounts x-ray: unusual, regular acco                                                          |
> | t1    | 8            | INDIA           | 2            | ss excuses cajole slyly across the packages. deposits print aroun                                                   |
> | t1    | 9            | INDONESIA       | 2            |  slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull  |
> | t1    | 10           | IRAN            | 4            | efully alongside of the slyly final dependencies.                                                                   |
> | t1    | 11           | IRAQ            | 4            | nic deposits boost atop the quickly final requests? quickly regula                                                  |
> | t1    | 12           | JAPAN           | 2            | ously. final, express gifts cajole a                                                                                |
> | t1    | 13           | JORDAN          | 4            | ic deposits are blithely about the carefully regular pa                                                             |
> | t1    | 14           | KENYA           | 0            |  pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                       |
> | t1    | 15           | MOROCCO         | 0            | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                          |
> | t1    | 16           | MOZAMBIQUE      | 0            | s. ironic, unusual asymptotes wake blithely r                                                                       |
> | t1    | 17           | PERU            | 1            | platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun          |
> | t1    | 18           | CHINA           | 2            | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos                         |
> | t1    | 19           | ROMANIA         | 3            | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account     |
> | t1    | 20           | SAUDI ARABIA    | 4            | ts. silent requests haggle. closely express packages sleep across the blithely                                      |
> | t1    | 21           | VIETNAM         | 2            | hely enticingly express accounts. even, final                                                                       |
> | t1    | 22           | RUSSIA          | 3            |  requests against the platelets use never according to the quickly regular pint                                     |
> | t1    | 23           | UNITED KINGDOM  | 3            | eans boost carefully special requests. accounts are. carefull                                                       |
> | t1    | 24           | UNITED STATES   | 1            | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be      |
> +-------+--------------+-----------------+--------------+---------------------------------------------------------------------------------------------------------------------+
> {noformat}
> but single row should be returned:
> {noformat}
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | dir0  | n_nationkey  | n_name  | n_regionkey  |                      n_comment                      |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> | t1    | 10           | IRAN    | 4            | efully alongside of the slyly final dependencies.   |
> +-------+--------------+---------+--------------+-----------------------------------------------------+
> {noformat}
> Filter is removed from the plan, but it contains a predicate which wasn't applied:
> {noformat}
> 00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0, cumulative cost = {77.9 rows, 115.9 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 400
> 00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 19.0, cumulative cost = {76.0 rows, 114.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 399
> 00-02        Project(T1¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR T1¦¦**): rowcount = 19.0, cumulative cost = {57.0 rows, 95.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 398
> 00-03          Project(T1¦¦**=[$0], n_nationkey=[$1]) : rowType = RecordType(DYNAMIC_STAR T1¦¦**, ANY n_nationkey): rowcount = 19.0, cumulative cost = {38.0 rows, 76.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 397
> 00-04            Scan(table=[[dfs, tmp, multi]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/multi/t1/0_0_0.parquet]], selectionRoot=file:/tmp/multi, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 19.0, cumulative cost = {19.0 rows, 38.0 cpu, 38.0 io, 0.0 network, 0.0 memory}, id = 396
> {noformat}
> ----
> Additionally, a filter is not removed from the plan when parquet table with single row group is queried:
> {code:sql}
> create table dfs.tmp.`singleRowGroupTable` as select * from cp.`tpch/nation.parquet`;
> explain plan for select * from dfs.tmp.`singleRowGroupTable` where n_nationkey > -1;
> {code}
> returns plan
> {noformat}
> 00-00    Screen
> 00-01      Project(**=[$0])
> 00-02        Project(T0¦¦**=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[>($1, -1)])
> 00-05              Project(T0¦¦**=[$0], n_nationkey=[$1])
> 00-06                Scan(table=[[dfs, tmp, singleRowGroupTable]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/singleRowGroupTable]], selectionRoot=file:/tmp/singleRowGroupTable, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `n_nationkey`]]])
> {noformat}
> *Also, for the case when a table has multiple files, and filter matches all the table, it is not removed from the plan:*
> {code:sql}
> select * from dfs.tmp.`multi` where n_nationkey > -1;
> {code}
> has plan
> {noformat}
> 00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 12.0, cumulative cost = {109.2 rows, 277.2 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 196
> 00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 12.0, cumulative cost = {108.0 rows, 276.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 195
> 00-02        Project(T0¦¦**=[$0]) : rowType = RecordType(DYNAMIC_STAR T0¦¦**): rowcount = 12.0, cumulative cost = {96.0 rows, 264.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 194
> 00-03          SelectionVectorRemover : rowType = RecordType(DYNAMIC_STAR T0¦¦**, ANY n_nationkey): rowcount = 12.0, cumulative cost = {84.0 rows, 252.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 193
> 00-04            Filter(condition=[>($1, -1)]) : rowType = RecordType(DYNAMIC_STAR T0¦¦**, ANY n_nationkey): rowcount = 12.0, cumulative cost = {72.0 rows, 240.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 192
> 00-05              Project(T0¦¦**=[$0], n_nationkey=[$1]) : rowType = RecordType(DYNAMIC_STAR T0¦¦**, ANY n_nationkey): rowcount = 24.0, cumulative cost = {48.0 rows, 96.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 191
> 00-06                Scan(table=[[dfs, tmp, multi]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/multi/t2/0_0_0.parquet], ReadEntryWithPath [path=file:/tmp/multi/t1/0_0_0.parquet]], selectionRoot=file:/tmp/multi, numFiles=2, numRowGroups=2, usedMetadataFile=false, columns=[`**`, `n_nationkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY n_nationkey): rowcount = 24.0, cumulative cost = {24.0 rows, 48.0 cpu, 48.0 io, 0.0 network, 0.0 memory}, id = 190
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)