Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/05 23:49:00 UTC
[jira] [Commented] (DRILL-4139) Fix parquet partition pruning for BIT, INTERVAL and DECIMAL types
[ https://issues.apache.org/jira/browse/DRILL-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075638#comment-16075638 ]
ASF GitHub Bot commented on DRILL-4139:
---------------------------------------
Github user jinfengni commented on a diff in the pull request:
https://github.com/apache/drill/pull/805#discussion_r125784066
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java ---
@@ -1008,8 +1008,24 @@ public void setMax(Object max) {
return nulls;
}
- @Override public boolean hasSingleValue() {
- return (max != null && min != null && max.equals(min));
+ /**
+ * Checks that the column chunk has single value.
+ * Returns true if min and max are the same, but not null.
+ * Returns true if min and max are null and the number of null values
+ * in the column chunk is greater than 0.
+ *
+ * @return true if column has single value
--- End diff --
My understanding is that hasSingleValue() should return true only if the column metadata shows exactly one value. A null counts as a value distinct from any non-null value.
Therefore, when a column has min != null && max != null && min.equals(max) && nulls != null && nulls > 0, it should return false. However, both the v1 and the v3 implementations return true in that case.
That leads to wrong query results. A simple reproduction:
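To make the expected semantics concrete, here is a minimal sketch of a null-aware check. It is a standalone helper with hypothetical names (`SingleValueCheck`, and a `Long nulls` where a null count of null means "unknown"), not the actual `Metadata.java` code:

```java
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for a column chunk's statistics check
 * (not Drill's ColumnMetadata). A null 'nulls' means the null count is unknown.
 */
final class SingleValueCheck {

  /** True only when every row of the chunk is provably the same value; null is its own value. */
  static boolean hasSingleValue(Object min, Object max, Long nulls) {
    if (nulls == null) {
      // Null count unknown: a mix of nulls and non-nulls cannot be ruled out.
      return false;
    }
    if (min == null && max == null) {
      // No non-null statistics and at least one null: treat the chunk as all nulls.
      return nulls > 0;
    }
    // Objects.deepEquals also compares byte[] min/max by content.
    return min != null && max != null && Objects.deepEquals(min, max) && nulls == 0;
  }

  public static void main(String[] args) {
    System.out.println(hasSingleValue(100, 100, 0L));   // true: one non-null value, no nulls
    System.out.println(hasSingleValue(100, 100, 5L));   // false: 100 plus nulls
    System.out.println(hasSingleValue(null, null, 5L)); // true: all nulls
    System.out.println(hasSingleValue(100, 200, 0L));   // false: two distinct values
  }
}
```

Using `Objects.deepEquals` instead of `equals` assumes min/max may be stored as `byte[]` for binary columns; for boxed numeric statistics it behaves the same as `equals`.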
```
create table dfs.tmp.`t5/a` as select 100 as mykey from cp.`tpch/nation.parquet` union all select col_notexist from cp.`tpch/region.parquet`;
create table dfs.tmp.`t5/b` as select 200 as mykey from cp.`tpch/nation.parquet` union all select col_notexist from cp.`tpch/region.parquet`;
```
This gives two files, each containing a single distinct non-null value plus some null values. Now query the two files:
```
select mykey from dfs.tmp.`t5` where mykey = 100;
+--------+
| mykey |
+--------+
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| 100 |
| null |
| null |
| null |
| null |
| null |
+--------+
30 rows selected (0.246 seconds)
```
Clearly, those 5 nulls should not be returned.
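To show why the nulls leak through, here is a small simulation of the repro. It is a sketch under stated assumptions, not Drill's planner code: it assumes a column is used for pruning only when every file reports a single value for it, and that the filter is then dropped, so every row of a kept file is returned. `PruningRepro` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical simulation of the t5 repro; not Drill's planner code.
 * Assumes pruning is used only if every file has a single value for the column,
 * and that the filter is dropped after pruning.
 */
final class PruningRepro {

  // Pre-fix behavior: ignores the null count.
  static boolean buggyHasSingleValue(Integer min, Integer max, long nulls) {
    return min != null && max != null && min.equals(max);
  }

  // Null-aware behavior: a null is a value distinct from any non-null value.
  static boolean fixedHasSingleValue(Integer min, Integer max, long nulls) {
    if (min == null && max == null) {
      return nulls > 0; // all-null chunk
    }
    return min != null && min.equals(max) && nulls == 0;
  }

  // One repro file: 25 copies of value (nation rows) plus 5 nulls (region rows).
  static List<Integer> reproFile(int value) {
    List<Integer> rows = new ArrayList<>(Collections.nCopies(25, value));
    rows.addAll(Collections.nCopies(5, (Integer) null));
    return rows;
  }

  static List<List<Integer>> reproFiles() {
    List<List<Integer>> files = new ArrayList<>();
    files.add(reproFile(100)); // t5/a
    files.add(reproFile(200)); // t5/b
    return files;
  }

  // Rows returned by "WHERE mykey = literal" under the assumptions above.
  static int rowsReturned(List<List<Integer>> files, int literal, boolean useFix) {
    boolean prunable = true;
    List<Integer> fileValues = new ArrayList<>();
    for (List<Integer> rows : files) {
      Integer min = null;
      Integer max = null;
      long nulls = 0;
      for (Integer v : rows) {
        if (v == null) {
          nulls++;
        } else {
          if (min == null || v < min) min = v;
          if (max == null || v > max) max = v;
        }
      }
      prunable &= useFix ? fixedHasSingleValue(min, max, nulls)
                         : buggyHasSingleValue(min, max, nulls);
      fileValues.add(min); // the file's single value when prunable
    }
    int total = 0;
    for (int i = 0; i < files.size(); i++) {
      if (prunable) {
        // Keep files whose metadata value matches; filter dropped, so all rows come back.
        if (Objects.equals(fileValues.get(i), literal)) {
          total += files.get(i).size();
        }
      } else {
        // No pruning: the filter runs per row, and NULL = literal is never true.
        for (Integer v : files.get(i)) {
          if (v != null && v == literal) total++;
        }
      }
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(rowsReturned(reproFiles(), 100, false)); // 30
    System.out.println(rowsReturned(reproFiles(), 100, true));  // 25
  }
}
```

Under these assumptions the pre-fix check keeps t5/a and returns all 30 of its rows (25 rows of 100 plus 5 nulls), while the null-aware check disables pruning on `mykey` and returns only the 25 matching rows.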
I applied the 3 commits in this PR on top of today's master branch.
```
select * from sys.version;
+------------------+-------------------------------------------+-------------------------------------------------------------------------------+----------------------------+-----------------+----------------------------+
| version | commit_id | commit_message | commit_time | build_email | build_time |
+------------------+-------------------------------------------+-------------------------------------------------------------------------------+----------------------------+-----------------+----------------------------+
| 1.11.0-SNAPSHOT | cad6e4dc950aa4a95ad20515ce5abd9c546d3e5d | DRILL-4139: Fix loss of scale value for DECIMAL in parquet partition pruning | 05.07.2017 @ 12:05:25 PDT | jni@apache.org | 05.07.2017 @ 12:06:07 PDT |
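+------------------+-------------------------------------------+-------------------------------------------------------------------------------+----------------------------+-----------------+----------------------------+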
```
> Fix parquet partition pruning for BIT, INTERVAL and DECIMAL types
> -----------------------------------------------------------------
>
> Key: DRILL-4139
> URL: https://issues.apache.org/jira/browse/DRILL-4139
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.3.0
> Environment: 4 node cluster on CentOS
> Reporter: Khurram Faraaz
> Assignee: Volodymyr Vysotskyi
>
> Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> is seen in drillbit.log after a functional test run on a 4-node cluster.
> Drill 1.3.0 sys.version => d61bb83a8
> {code}
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
> 2015-11-27 03:12:19,809 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze filter tree: 0 ms
> 2015-11-27 03:12:19,810 [29a835ec-3c02-0fb6-d3c1-bae276ef7385:foreman] WARN o.a.d.e.p.l.partition.PruneScanRule - Exception while trying to prune partition.
> java.lang.UnsupportedOperationException: Unsupported type: BIT
> at org.apache.drill.exec.store.parquet.ParquetGroupScan.populatePruningVector(ParquetGroupScan.java:479) ~[drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.ParquetPartitionDescriptor.populatePartitionVectors(ParquetPartitionDescriptor.java:96) ~[drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.logical.partition.PruneScanRule.doOnMatch(PruneScanRule.java:235) ~[drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2.onMatch(ParquetPruneScanRule.java:87) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:303) [calcite-core-1.4.0-drill-r8.jar:1.4.0-drill-r8]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.logicalPlanningVolcanoAndLopt(DefaultSqlHandler.java:545) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:213) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:248) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:184) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:905) [drill-java-exec-1.3.0.jar:1.3.0]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:244) [drill-java-exec-1.3.0.jar:1.3.0]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)