Posted to issues@drill.apache.org by "Robert Hou (JIRA)" <ji...@apache.org> on 2017/03/22 00:48:41 UTC

[jira] [Commented] (DRILL-5374) Parquet filter pushdown does not prune partition with nulls when predicate uses float column

    [ https://issues.apache.org/jira/browse/DRILL-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935591#comment-15935591 ] 

Robert Hou commented on DRILL-5374:
-----------------------------------

This is the Scan step from the explain plan:

{code}
00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_1.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_4.parquet], ReadEntryWithPath [path=/drill/testdata/filter/orders_parts_metadata/0_0_2.parquet]], selectionRoot=/drill/testdata/filter/orders_parts_metadata, numFiles=3, usedMetadataFile=true, cacheFileRoot=/drill/testdata/filter/orders_parts_metadata, columns=[`float_id`]]])
{code}

Partition /drill/testdata/filter/orders_parts_metadata/0_0_4.parquet should not be scanned: it contains only null values for the float_id column, so no row in it can satisfy float_id < 1100.0.
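The pruning argument can be sketched as a check over per-file column statistics. This is a minimal illustration, not Drill's implementation: `ColumnStats` and `canPruneLessThan` are hypothetical names, and the sketch assumes the metadata cache records min, max, null count and row count for each file. A comparison with NULL evaluates to UNKNOWN, which a WHERE clause treats as false, so a file whose column is entirely null can never contribute rows to `float_id < 1100.0`.

```java
// Hypothetical sketch of statistics-based pruning for `col < literal`.
// ColumnStats and canPruneLessThan are illustrative names, not Drill classes.
public class NullPruningSketch {

    /** Stand-in for per-file column statistics (assumed fields). */
    record ColumnStats(Float min, Float max, long nullCount, long rowCount) {}

    /** Returns true if no row in the file can satisfy `col < literal`. */
    static boolean canPruneLessThan(ColumnStats stats, double literal) {
        // Every value is null (or the file is empty): NULL < x is UNKNOWN,
        // which the filter rejects, so the file can be skipped.
        if (stats.nullCount() == stats.rowCount()) {
            return true;
        }
        // Without a usable min statistic we cannot prove anything: scan it.
        if (stats.min() == null) {
            return false;
        }
        // Widen the float statistic to double before comparing with the
        // double literal, as in `float_id < 1100.0`.
        return stats.min().doubleValue() >= literal;
    }

    public static void main(String[] args) {
        ColumnStats allNull = new ColumnStats(null, null, 100, 100);
        ColumnStats overlapping = new ColumnStats(500.0f, 1500.0f, 0, 100);
        ColumnStats aboveRange = new ColumnStats(2000.0f, 3000.0f, 0, 100);

        System.out.println(canPruneLessThan(allNull, 1100.0));     // true
        System.out.println(canPruneLessThan(overlapping, 1100.0)); // false
        System.out.println(canPruneLessThan(aboveRange, 1100.0));  // true
    }
}
```

Under these assumptions, a file like 0_0_4.parquet falls into the first branch and would be dropped from the Scan entries regardless of the float/double type mismatch.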

> Parquet filter pushdown does not prune partition with nulls when predicate uses float column
> --------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5374
>                 URL: https://issues.apache.org/jira/browse/DRILL-5374
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.9.0
>            Reporter: Robert Hou
>            Assignee: Jinfeng Ni
>         Attachments: 0_0_1.parquet, 0_0_2.parquet, 0_0_3.parquet, 0_0_4.parquet, 0_0_5.parquet, drill.parquet_metadata
>
>
> Drill does not prune enough partitions for this query when filter pushdown is used with metadata caching. The float column (float_id) is being compared with a double value (1100.0).
> {code}
> 0: jdbc:drill:zk=10.10.100.186:5181/drill/rho> select count(*) from orders_parts_metadata where float_id < 1100.0;
> {code}
> To reproduce the problem, put the attached files into a directory. Then create the metadata:
> {code}
> refresh table metadata dfs.`path_to_directory`;
> {code}
> For example, if you put the files in /drill/testdata/filter/orders_parts_metadata, then run this SQL command:
> {code}
> refresh table metadata dfs.`/drill/testdata/filter/orders_parts_metadata`;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)