You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Rahul Challapalli (JIRA)" <ji...@apache.org> on 2015/06/29 21:16:04 UTC

[jira] [Created] (DRILL-3418) Partition Pruning : We are over-pruning and this leads to wrong results

Rahul Challapalli created DRILL-3418:
----------------------------------------

             Summary: Partition Pruning : We are over-pruning and this leads to wrong results
                 Key: DRILL-3418
                 URL: https://issues.apache.org/jira/browse/DRILL-3418
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
            Reporter: Rahul Challapalli
            Assignee: Steven Phillips
            Priority: Critical
             Fix For: 1.1.0


git.commit.id.abbrev=c199860

We are over-pruning based on the below plan.
{code}
explain plan for select * from `existing_partition_pruning/lineitem_hierarchical_intstring` where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000);
00-00    Screen
00-01      Project(*=[$0])
00-02        Project(T17¦¦*=[$0])
00-03          SelectionVectorRemover
00-04            Filter(condition=[AND(OR(=($1, 1993), =($2, 'jun')), OR(=($1, 1991), =($2, 'aug'), >(ITEM($3, 0), 5000)))])
00-05              Project(T17¦¦*=[$0], dir0=[$1], dir1=[$2], columns=[$3])
00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring/0_0_26.parquet], ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring/0_0_7.parquet]], selectionRoot=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitem_hierarchical_intstring, numFiles=2, columns=[`*`]]])
{code}

We have 7 year partitions and each year has 12 further month partitions.

I executed a count(*) query based on the above filters with & without partitioning in place and the values were different

Without Partitions :
{code}
select count(*) from tbl_nopartitions where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000);
+---------+
| EXPR$0  |
+---------+
| 14515   |
+---------+
1 row selected (0.903 seconds)
{code}

With Partitions :
{code}
select count(*) from `existing_partition_pruning/lineitem_hierarchical_intstring` where (dir0=1993 or dir1='jun') and (dir0=1991 or dir1='aug' or columns[0] > 5000);
+---------+
| EXPR$0  |
+---------+
| 1800    |
+---------+
1 row selected (0.49 seconds)
{code}

The data is larger than 10 MB to upload here



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)