Posted to dev@drill.apache.org by Adam Gilmore <dr...@gmail.com> on 2015/07/15 06:41:57 UTC

Re: Review Request 33836: DRILL-1950: Parquet pushdown filtering

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33836/
-----------------------------------------------------------

(Updated July 15, 2015, 4:41 a.m.)


Review request for drill and Jacques Nadeau.


Repository: drill-git


Description
-------

An implementation of Parquet pushdown filtering for Drill.  More details can be found in the JIRA item (DRILL-1950).


Diffs (updated)
-----

  contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseStoragePlugin.java 7737f69 
  contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveStoragePlugin.java fb827cc 
  contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePlugin.java 093df57 
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 140e9a8 
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java 6bf1280 
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java 2d1bac2 
  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 2d41740 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/AbstractStoragePlugin.java 58c8622 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePlugin.java b60c16f 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistry.java 80a0876 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java 4ae0cc8 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FormatPlugin.java 14f1441 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java 3c2b806 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaStoragePlugin.java 597d24c 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/EmptyRowGroupScan.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/EmptyScanBatchCreator.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetCompareFunctionProcessor.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFilterBuilder.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java 56a1f00 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java 845bce9 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java PRE-CREATION 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRowGroupScan.java 987f792 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java 441a707 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java 4e7d628 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetRecordMaterializer.java a80eb57 
  exec/java-exec/src/main/java/parquet/hadoop/FilterPredicateSerializer.java PRE-CREATION 
  exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestFilterPushdown.java PRE-CREATION 
  exec/java-exec/src/test/java/parquet/hadoop/TestFilterPredicateSerializer.java PRE-CREATION 
  exec/java-exec/src/test/resources/parquet/pushdown/0_0_0.parquet PRE-CREATION 
  exec/java-exec/src/test/resources/parquet/pushdown/0_0_1.parquet PRE-CREATION 
  protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java e76d748 

Diff: https://reviews.apache.org/r/33836/diff/


Testing
-------

I have added a number of test cases that verify the filter is pushed down correctly in various scenarios.  These tests also confirm that the pushdown filter itself works: the plan must run through row group filtering to estimate fewer scanned rows, and only then will the optimizer pick it as the better plan.
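
For illustration only (this is not code from the patch), the idea that row group filtering lowers the estimated row count can be sketched as below. The class and method names (RowGroupPruningSketch, RowGroupStats, canDropForEquals, prune, estimatedRows) are hypothetical, and the sketch assumes each row group exposes min/max statistics for a single integer column and that the filter is a simple equality.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupPruningSketch {

    /** Hypothetical per-row-group statistics for one integer column. */
    static final class RowGroupStats {
        final long min;
        final long max;
        final long rowCount;

        RowGroupStats(long min, long max, long rowCount) {
            this.min = min;
            this.max = max;
            this.rowCount = rowCount;
        }
    }

    /** True when no row in the group can satisfy "column = value". */
    static boolean canDropForEquals(RowGroupStats stats, long value) {
        return value < stats.min || value > stats.max;
    }

    /** Keeps only the row groups that might contain a matching row. */
    static List<RowGroupStats> prune(List<RowGroupStats> groups, long value) {
        List<RowGroupStats> kept = new ArrayList<>();
        for (RowGroupStats g : groups) {
            if (!canDropForEquals(g, value)) {
                kept.add(g);
            }
        }
        return kept;
    }

    /** Row estimate a planner could use: the sum of the surviving row counts. */
    static long estimatedRows(List<RowGroupStats> groups) {
        long total = 0;
        for (RowGroupStats g : groups) {
            total += g.rowCount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = new ArrayList<>();
        groups.add(new RowGroupStats(0, 99, 1000));
        groups.add(new RowGroupStats(100, 199, 1000));

        List<RowGroupStats> kept = prune(groups, 150);
        // Prints "2000 -> 1000": one of the two row groups is skipped.
        System.out.println(estimatedRows(groups) + " -> " + estimatedRows(kept));
    }
}
```

Under these assumptions, a scan with the filter applied reports fewer estimated rows than the unfiltered scan, which is the signal a cost-based optimizer needs to prefer the pushed-down plan.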


Thanks,

Adam Gilmore