Posted to commits@drill.apache.org by br...@apache.org on 2016/08/30 22:29:26 UTC

[11/17] drill git commit: update to partition pruning intro to include refresh command for metadata cache file

update to partition pruning intro to include refresh command for metadata cache file


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/2bc38da0
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/2bc38da0
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/2bc38da0

Branch: refs/heads/gh-pages
Commit: 2bc38da0e9ff9159b2337f3285aaaae05e5979aa
Parents: 21c41f5
Author: Bridget Bevens <bb...@maprtech.com>
Authored: Thu Aug 11 12:02:19 2016 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Thu Aug 11 12:02:19 2016 -0700

----------------------------------------------------------------------
 .../partition-pruning/010-partition-pruning-introduction.md     | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/2bc38da0/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
index 315b062..e5f4e5f 100644
--- a/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
+++ b/_docs/performance-tuning/partition-pruning/010-partition-pruning-introduction.md
@@ -1,13 +1,12 @@
 ---
 title: "Partition Pruning Introduction"
-date: 2016-08-08 18:42:19 UTC
+date: 2016-08-11 19:02:20 UTC
 parent: "Partition Pruning"
 --- 
 
 Partition pruning is a performance optimization that limits the number of files and partitions that Drill reads when querying file systems and Hive tables. When you partition data, Drill only reads a subset of the files that reside in a file system or a subset of the partitions in a Hive table when a query matches certain filter criteria.
 
-As of Drill 1.8, partition pruning also applies to the parquet metadata cache. See [Optimizing Parquet Metadata Reading]({{site.baseurl}}/docs/optimizing-parquet-metadata-reading/) to see how to create a parquet metadata cache. When data is partitioned in a directory hierarchy, Drill attempts to read the metadata cache file from a sub-partition, based on matching filter criteria instead of reading from the top level partition, to reduce the amount of metadata read during the query planning time. 
-
+As of Drill 1.8, partition pruning also applies to the Parquet metadata cache. When data is partitioned in a directory hierarchy, Drill attempts to read the metadata cache file from the sub-partition that matches the filter criteria, instead of from the top-level partition, to reduce the amount of metadata read during query planning. If you created a metadata cache file in a previous version of Drill, you must issue the REFRESH TABLE METADATA command to regenerate the metadata cache file before partition pruning can apply to the cache. See [Optimizing Parquet Metadata Reading]({{site.baseurl}}/docs/optimizing-parquet-metadata-reading/) for more information.  
 
 The query planner in Drill performs partition pruning by evaluating the filters. If no partition filters are present, the underlying Scan operator reads all files in all directories and then sends the data to operators, such as Filter, downstream. When partition filters are present, the query planner pushes the filters down to the Scan if possible. The Scan reads only the directories that match the partition filters, thus reducing disk I/O.
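
For illustration only (not part of the commit above), the behavior described in the changed page might be exercised roughly as follows. This is a minimal sketch: the `dfs` storage plugin, the `/data/logs` path, and the year/quarter directory layout are assumptions chosen for the example, not taken from the commit. `dir0` and `dir1` are Drill's implicit columns for the first and second directory levels under the queried path.

```sql
-- Hypothetical layout: /data/logs/<year>/<quarter>/*.parquet

-- Regenerate the Parquet metadata cache. The page says this is needed
-- when the cache file was created by an earlier version of Drill.
REFRESH TABLE METADATA dfs.`/data/logs`;

-- Filters on dir0/dir1 act as partition filters, so the planner can
-- push them down to the Scan and skip non-matching directories.
SELECT COUNT(*)
FROM dfs.`/data/logs`
WHERE dir0 = '2016' AND dir1 = 'Q3';

-- Inspect the plan to see which files the Scan is set to read.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM dfs.`/data/logs`
WHERE dir0 = '2016' AND dir1 = 'Q3';
```

If pruning applies, the Scan in the plan output should list only files under the matching directories rather than every file under the table root.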