Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/06/30 01:01:05 UTC
[jira] [Commented] (DRILL-3419) Ambiguity in query plan when we do partition pruning

    [ https://issues.apache.org/jira/browse/DRILL-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606610#comment-14606610 ] 

Steven Phillips commented on DRILL-3419:
----------------------------------------

We are actually pruning in case 3. The problem is that every file gets pruned out. We currently don't handle this case well, because there is no "Empty Scan" operator. The quick solution was to scan just one of the files and keep the filter in the plan, so the query still returns zero rows. We should figure out a better way to handle this.
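
To make the fallback concrete, here is a rough sketch of the behaviour described above. It is not Drill's planner code. The names `ScanPlan` and `prune`, the file names, and the partition values are all made up for illustration, and it assumes each file holds exactly one value of the partition column.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScanPlan:
    files: List[str]   # files the scan will read
    keep_filter: bool  # True if a Filter has to stay above the scan


def prune(partitions: Dict[str, str], matches: Callable[[str], bool]) -> ScanPlan:
    """Hypothetical pruning step.

    `partitions` maps a file path to the single partition value stored
    in that file (an assumption of this sketch).
    """
    survivors = [path for path, value in partitions.items() if matches(value)]
    if survivors:
        # Every row in a surviving file satisfies the predicate,
        # so the Filter can be dropped from the plan.
        return ScanPlan(files=survivors, keep_filter=False)
    if not partitions:
        return ScanPlan(files=[], keep_filter=False)
    # Everything was pruned. Without an empty-scan operator, scan one
    # arbitrary file and keep the Filter so that no rows come back.
    first = next(iter(partitions))
    return ScanPlan(files=[first], keep_filter=True)


# Made-up partition values: no file holds a value starting with 'Z'.
partitions = {"1_0_1.parquet": "AB", "1_1_48.parquet": "QX"}
all_pruned = prune(partitions, lambda v: v.startswith("Z"))
some_kept = prune(partitions, lambda v: v.startswith("A"))
```

In this sketch `all_pruned` ends up with a single file and `keep_filter=True`, which is the same shape as the case 3 plan (numFiles=1 plus a Filter), while `some_kept` has no filter, like cases 1 and 2.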

> Ambiguity in query plan when we do partition pruning
> ----------------------------------------------------
>
>                 Key: DRILL-3419
>                 URL: https://issues.apache.org/jira/browse/DRILL-3419
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>            Reporter: Khurram Faraaz
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>
> Note that in case (1) and case (2) we prune; however, it is not clear whether we prune in case (3), because we see a FILTER in the query plan in case (3).
> CTAS 
> {code}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE CTAS_ONE_MILN_RWS_PER_GROUP(col1, col2) PARTITION BY (col2) AS select cast(columns[0] as bigint) col1, cast(columns[1] as char(2)) col2 from `millionValGroup.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 1_1       | 21932064                   |
> | 1_0       | 28067936                   |
> +-----------+----------------------------+
> 2 rows selected (73.661 seconds)
> {code}
> case 1)
> {code}
> explain plan for select col1, col2 from CTAS_ONE_MILN_RWS_PER_GROUP where col2 LIKE '%Z%';
> | 00-00    Screen
> 00-01      Project(col1=[$0], col2=[$1])
> 00-02        UnionExchange
> 01-01          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_0_3.parquet], ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_3.parquet]], selectionRoot=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP, numFiles=2, columns=[`col2`, `col1`]]])
> {code}
> case 2)
> {code}
> explain plan for select col1, col2 from CTAS_ONE_MILN_RWS_PER_GROUP where col2 LIKE 'A%';
> | 00-00    Screen
> 00-01      Project(col1=[$0], col2=[$1])
> 00-02        UnionExchange
> 01-01          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_0_3.parquet], ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_0_2.parquet], ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_1.parquet], ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_2.parquet], ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_3.parquet], ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_0_1.parquet]], selectionRoot=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP, numFiles=6, columns=[`col2`, `col1`]]])
> {code}
> case 3) we are NOT pruning here.
> {code}
> explain plan for select col1, col2 from CTAS_ONE_MILN_RWS_PER_GROUP where col2 LIKE 'Z%';
> | 00-00    Screen
> 00-01      Project(col1=[$1], col2=[$0])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[LIKE($0, 'Z%')])
> 00-04            Project(col2=[$1], col1=[$0])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP/1_1_48.parquet]], selectionRoot=/tmp/CTAS_ONE_MILN_RWS_PER_GROUP, numFiles=1, columns=[`col2`, `col1`]]])
> {code}


