Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/01/11 04:48:34 UTC
[jira] [Comment Edited] (DRILL-1500) Partition filtering might lead to an unnecessary column in the result set.
[ https://issues.apache.org/jira/browse/DRILL-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272794#comment-14272794 ]
Aman Sinha edited comment on DRILL-1500 at 1/11/15 3:47 AM:
------------------------------------------------------------
This is not related to partition filters. I can reproduce the extra column with the query below, which does not involve partition columns. I believe this has to do with how we handle the '*' column in the planning and/or execution phase. In the planning phase, if a Project above a Scan produces both * and another column, we should remove the other column, since * subsumes all columns. In the execution phase, there is a known issue with duplicates (DRILL-1778) that seems related to this.
{code}
// Wrong number of columns
jdbc:drill:zk=local> select * from cp.`tpch/nation.parquet` where n_nationkey < 2 order by n_regionkey;
+-------------+-----------+-------------+------------------------------------------------------------------------------+--------------+
| n_nationkey | n_name    | n_regionkey | n_comment                                                                    | n_regionkey0 |
+-------------+-----------+-------------+------------------------------------------------------------------------------+--------------+
| 0           | ALGERIA   | 0           | haggle. carefully final deposits detect slyly agai                           | 0            |
| 1           | ARGENTINA | 1           | al foxes promise slyly according to the regular accounts. bold requests alon | 1            |
+-------------+-----------+-------------+------------------------------------------------------------------------------+--------------+
{code}
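The planning-side idea above could be sketched roughly as follows. This is only an illustration in Python, not Drill's planner code: the function name and the list-of-strings representation of a Project's output are hypothetical, and it assumes every entry is either '*' or a plain column reference (an expression such as n_regionkey + 1 would not be subsumed by '*' and would need separate handling).

```python
from typing import List

STAR = "*"


def prune_columns_subsumed_by_star(project_columns: List[str]) -> List[str]:
    """Drop plain column references from a projection list that also has '*'.

    Hypothetical helper for illustration only; it is not a Drill API.
    Assumes each entry is either '*' or a plain column name.
    """
    if STAR not in project_columns:
        # No star column: every explicit column is still needed.
        return list(project_columns)
    # '*' already expands to every column of the Scan, so the explicit
    # references would only show up again as duplicates (e.g. n_regionkey0).
    return [STAR]


# The ORDER BY key ends up next to '*' in the Project above the Scan.
print(prune_columns_subsumed_by_star(["*", "n_regionkey"]))  # ['*']
```

With a rule like this applied before execution, the query above would presumably return only the four columns of nation.parquet; whether the execution-side handling in DRILL-1778 also needs to change is not something this sketch addresses.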
> Partition filtering might lead to an unnecessary column in the result set.
> ---------------------------------------------------------------------------
>
> Key: DRILL-1500
> URL: https://issues.apache.org/jira/browse/DRILL-1500
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Aman Sinha
> Priority: Critical
> Fix For: 0.8.0
>
>
> When partition filtering is used together with a select * query, Drill might return the partitioning column twice.
> Q1:
> {code}
> select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` where dir0=1994 and dir1='Q1' order by dir0 limit 1;
> +-------+------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> | dir00 | dir0 | dir1 | o_clerk         | o_comment                    | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +-------+------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994  | 1994 | Q1   | Clerk#000000743 | y pending requests integrate | 1292      | 1994-01-20  | 66         | 5-LOW           | F             | 0              | 104190.66    |
> +-------+------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (2.097 seconds)
> {code}
> We can see that column "dir0" appears twice in the result set (as dir00 and dir0). For comparison, here is the same query without partition filtering, and its result:
> Q2:
> {code}
> select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` order by dir0 limit 1;
> +------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> | dir0 | dir1 | o_clerk         | o_comment                    | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994 | Q1   | Clerk#000000743 | y pending requests integrate | 1292      | 1994-01-20  | 66         | 5-LOW           | F             | 0              | 104190.66    |
> +------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (0.761 seconds)
> {code}