Posted to commits@drill.apache.org by br...@apache.org on 2016/08/30 22:29:24 UTC

[09/17] drill git commit: Update partition pruning intro for 1.8 - pp on parquet metadata cache

Update partition pruning intro for 1.8 - pp on parquet metadata cache


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/bb118573
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/bb118573
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/bb118573

Branch: refs/heads/gh-pages
Commit: bb11857308c2b0e05d758288973bf63fa65b73ea
Parents: 2347455
Author: Bridget Bevens <bb...@maprtech.com>
Authored: Mon Aug 8 11:42:18 2016 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Mon Aug 8 11:42:18 2016 -0700

----------------------------------------------------------------------
 .../010-partition-pruning-introduction.md       | 47 +++++++++++---------
 1 file changed, 25 insertions(+), 22 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/bb118573/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
index 20e314b..315b062 100644
--- a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
+++ b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
@@ -1,22 +1,25 @@
----
-title: "Partition Pruning Introduction"
-date:  
-parent: "Partition Pruning"
---- 
-
-Partition pruning is a performance optimization that limits the number of files and partitions that Drill reads when querying file systems and Hive tables. When you partition data, Drill only reads a subset of the files that reside in a file system or a subset of the partitions in a Hive table when a query matches certain filter criteria.
-
-The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
-
-## Using Partitioned Drill Data
-Before using Parquet data created by Drill 1.2 or earlier in later releases, you need to migrate the data. Migrate Parquet data as described in ["Migrating Parquet Data"]({{site.baseurl}}/docs/migrating-parquet-data/). 
-
-{% include startimportant.html %}Migrate only Parquet files that Drill generated.{% include endimportant.html %}
-
-## Partitioning Data
-In early versions of Drill, partition pruning involved time-consuming manual setup tasks. Using the PARTITION BY clause in the CTAS command simplifies the process.
-
-
-
-
-
+---
+title: "Partition Pruning Introduction"
+date: 2016-08-08 18:42:19 UTC
+parent: "Partition Pruning"
+--- 
+
+Partition pruning is a performance optimization that limits the number of files and partitions that Drill reads when querying file systems and Hive tables. When you partition data, Drill only reads a subset of the files that reside in a file system or a subset of the partitions in a Hive table when a query matches certain filter criteria.
+
+As of Drill 1.8, partition pruning also applies to the Parquet metadata cache. See [Optimizing Parquet Metadata Reading]({{site.baseurl}}/docs/optimizing-parquet-metadata-reading/) for how to create a Parquet metadata cache. When data is partitioned in a directory hierarchy, Drill attempts to read the metadata cache file from the sub-partition that matches the filter criteria, rather than from the top-level partition, which reduces the amount of metadata read during query planning.
+
+
+The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
+
+## Using Partitioned Drill Data
+Before using Parquet data created by Drill 1.2 or earlier in later releases, you need to migrate the data. Migrate Parquet data as described in ["Migrating Parquet Data"]({{site.baseurl}}/docs/migrating-parquet-data/). 
+
+{% include startimportant.html %}Migrate only Parquet files that Drill generated.{% include endimportant.html %}
+
+## Partitioning Data
+In early versions of Drill, partition pruning involved time-consuming manual setup tasks. Using the PARTITION BY clause in the CTAS command simplifies the process.
+
+
+
+
+