Posted to commits@drill.apache.org by br...@apache.org on 2019/04/30 22:15:09 UTC

[drill] branch gh-pages updated: edits to stats and metadata cache

This is an automated email from the ASF dual-hosted git repository.

bridgetb pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/drill.git


The following commit(s) were added to refs/heads/gh-pages by this push:
     new b5bd35e  edits to stats and metadata cache
b5bd35e is described below

commit b5bd35edfb27db60a64d7f887100b57e63c5b5f3
Author: Bridget Bevens <bb...@maprtech.com>
AuthorDate: Tue Apr 30 15:14:14 2019 -0700

    edits to stats and metadata cache
---
 _docs/img/histogram.png                            | Bin 0 -> 17523 bytes
 .../sql-commands/009-analyze-table.md              | 293 +++++++++++----------
 .../sql-commands/011-refresh-table-metadata.md     |   7 +-
 3 files changed, 157 insertions(+), 143 deletions(-)

diff --git a/_docs/img/histogram.png b/_docs/img/histogram.png
new file mode 100644
index 0000000..402c1c1
Binary files /dev/null and b/_docs/img/histogram.png differ
diff --git a/_docs/sql-reference/sql-commands/009-analyze-table.md b/_docs/sql-reference/sql-commands/009-analyze-table.md
index 9aa65f9..2f9e3bc 100644
--- a/_docs/sql-reference/sql-commands/009-analyze-table.md
+++ b/_docs/sql-reference/sql-commands/009-analyze-table.md
@@ -1,12 +1,12 @@
 ---
 title: "ANALYZE TABLE"
-date: 2019-04-23
+date: 2019-04-30
 parent: "SQL Commands"
 ---  
 
-Starting in Drill 1.16, Drill supports the ANALYZE TABLE statement. The ANALYZE TABLE statement computes statistics on Parquet data stored in tables and directories. ANALYZE TABLE writes statistics to a JSON file in the `.stats.drill` directory, for example `/user/table1/.stats.drill/0_0.json`. The optimizer in Drill uses these statistics to estimate filter, aggregation, and join cardinalities to create more efficient query plans. 
+Drill 1.16 and later supports the ANALYZE TABLE statement. The ANALYZE TABLE statement computes statistics on Parquet data stored in tables and directories. ANALYZE TABLE writes statistics to a JSON file in the `.stats.drill` directory, for example `/user/table1/.stats.drill/0_0.json`. The optimizer in Drill uses these statistics to estimate filter, aggregation, and join cardinalities to create more efficient query plans. 
 
-You can run the ANALYZE TABLE statement to calculate statistics for tables, columns, and directories with Parquet data; however, Drill will not use the statistics for query planning unless you enable the `planner.statistics.use` option, as shown:  
+You can run the ANALYZE TABLE statement to calculate statistics for tables, columns, and directories with Parquet data; however, Drill will not use the statistics for query planning unless you enable the `planner.statistics.use` option, as shown:
 
 	SET `planner.statistics.use` = true;
 
@@ -49,28 +49,34 @@ If you want to remove statistics for a table (and keep the table), you must remo
 
 	DROP TABLE [IF EXISTS] [workspace.]name/.stats.drill  
 
-If you have already issued the ANALYZE TABLE statement against specific columns, a table, or directory, you must run the DROP TABLE statement with `/.stats.drill` before you can successfully run the ANALYZE TABLE statement against the data source again:  
+If you have already issued the ANALYZE TABLE statement against specific columns, a table, or directory, you must run the DROP TABLE statement with `/.stats.drill` before you can successfully run the ANALYZE TABLE statement against the data source again, for example:
+
+	DROP TABLE `table_stats/Tpch0.01/parquet/customer/.stats.drill`;
 
-	DROP TABLE dfs.samples.`nation1/.stats.drill`;
 
 Note that `/.stats.drill` is the directory to which the JSON file with statistics is written.   
 
 ## Usage Notes  
-- The ANALYZE TABLE statement can compute statistics for Parquet data stored in tables, columns, and directories.  
+
+
+- The ANALYZE TABLE statement can compute statistics for Parquet data stored in tables, columns, and directories within dfs storage plugins only.  
 - The user running the ANALYZE TABLE statement must have read and write permissions on the data source.  
-- The optimizer in Drill computes the following types of statistics for each column: 
+- The optimizer in Drill computes the following types of statistics for each column:  
 	- Rowcount (total number of entries in the table)  
 	- Nonnullrowcount (total number of non-null entries in the table)  
 	- NDV (total distinct values in the table)  
-	- Avgwidth (average width of columns/average number of characters in a column)   
+	- Avgwidth (average width of a column/average number of characters in a column)  
 	- Majortype (data type of the column values)  
-	- Histogram (represents the frequency distribution of values (numeric data) in a column; designed for estimations on data with skewed distribution; sorts data into “buckets” such that each bucket contains the same number of rows determined by ceiling(num_rows/n) where n is the number of buckets; the number of distinct values in each bucket depends on the distribution of the column's values)  
-
-- ANALYZE TABLE can compute statistics on nested scalar columns; however, you must explicitly state the columns, for example:  
+	- Histogram (represents the frequency distribution of values (numeric data) in a column). See Histograms below.  
+	- When you look at the statistics file, statistics for each column display in the following format (c_nationkey is used as an example column):  
+	
+			{"column":"`c_nationkey`","majortype":{"type":"INT","mode":"REQUIRED"},"schema":1.0,"rowcount":1500.0,"nonnullrowcount":1500.0,"ndv":25,"avgwidth":4.0,"histogram":{"category":"numeric-equi-depth","numRowsPerBucket":150,"buckets":[0.0,2.0,4.0,7.0,9.0,12.0,15.199999999999978,17.0,19.0,22.0,24.0]}}  
 
-		ANALYZE TABLE employee_table COMPUTE STATISTICS (name.firstname, name.lastname);  
-- ANALYZE TABLE can compute statistics at the root directory level, but not at the partition level.  
-- Drill does not compute statistics for complex types (maps, arrays).   
+- ANALYZE TABLE can compute statistics on nested scalar columns; however, you must explicitly state the columns, for example:    
+		 `ANALYZE TABLE employee_table COMPUTE STATISTICS (name.firstname, name.lastname);`  
+- ANALYZE TABLE can compute statistics at the root directory level, but not at the partition level.  
+- Drill does not compute statistics for complex types (maps, arrays).  
+ 
 
 ## Related Options
 You can set the following options related to the ANALYZE TABLE statement at the system or session level with the SET (session level) or ALTER SYSTEM SET (system level) statements, or through the Drill Web UI at `http://<drill-hostname-or-ip>:8047/options`:  
@@ -109,10 +115,43 @@ If you use any of these words in a Drill query, you must enclose the word in bac
 
 - After you run the ANALYZE TABLE statement, you can view the profile for ANALYZE in the Drill Web UI. Go to `http://<drill-hostname-or-ip>:8047/profiles`, and click the ANALYZE TABLE statement for which you want to view the profile.  
 - Should you notice any performance issues, you may want to decrease the value of the `planner.slice_target` option.   
-- Generating statistics on large data sets can unnecessarily consume time and resources, such as memory and CPU. ANALYZE TABLE can compute statistics on a sample (subset of the data indicated as a percentage) to limit the amount of resources needed for computation. Drill still scans the entire data set, but only computes on the rows selected for sampling. Rows are randomly selected for the sample. Note that the quality of statistics increases with the sample size.  
+- Generating statistics on large data sets can unnecessarily consume time and resources, such as memory and CPU. ANALYZE TABLE can compute statistics on a sample (subset of the data indicated as a percentage) to limit the amount of resources needed for computation. Drill still scans the entire data set, but only computes on the rows selected for sampling. Rows are randomly selected for the sample. Note that the quality of statistics increases with the sample size.    
+ 
+## Queries that Benefit from Statistics
+Typically, the types of queries that benefit from statistics are those that include the following (a sample query appears after this list):
 
-## Limitations  
+- Grouping  
+- Multi-table joins  
+- Equality predicates on scalar columns   
+- Range predicates (filters) on numeric columns
+  
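+For instance, a query of the following shape combines grouping, a multi-table join, and a numeric range predicate, so row-count, NDV, and histogram statistics all inform the plan (the `orders` table is hypothetical and appears here only for illustration):
+
+	-- illustrative query; the orders table and its columns are hypothetical
+	SELECT c.c_mktsegment, COUNT(*)
+	FROM `table_stats/Tpch0.01/parquet/customer` c
+	JOIN `table_stats/Tpch0.01/parquet/orders` o ON c.c_custkey = o.o_custkey
+	WHERE c.c_acctbal > 1000
+	GROUP BY c.c_mktsegment;
+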
+## Histograms
+Histograms show the distribution of data, indicating whether the data is skewed or normally distributed. Histogram statistics improve the selectivity estimates that the optimizer uses to create the most efficient query plans possible. Histogram statistics are useful for range predicates to help determine how many rows belong to a particular range.   
+ 
+Running the ANALYZE TABLE statement generates equi-depth histogram statistics on each column in a table. Equi-depth histograms distribute distinct column values across buckets of varying widths, with all buckets having approximately the same number of rows. The fixed number of rows per bucket is predetermined by `ceil(number_rows/n)`, where `n` is the number of buckets. The number of distinct values in each bucket depends on the distribution of the values in a column. Equi-depth histogra [...]
+ 
+The following diagram shows the column values on the horizontal axis and the individual frequencies (dark blue) and total frequency of a bucket (light blue). In this example, the total number of rows = 64, hence the number of rows per bucket = `ceil(64/4) = 16`.  
+
+![](https://i.imgur.com/imchEyg.png)  
+
+The following steps are used to determine bucket boundaries:  
+1. Determine the number of rows per bucket: `ceil(N/m)`, where `m` is the number of buckets.  
+2. Sort the data on the column.  
+3. Determine bucket boundaries: the start of bucket 0 = min(column); continue adding individual frequencies until the row limit is reached, which is the end point of the bucket. Continue to the next bucket and repeat the process. The same column value can potentially be the end point of one bucket and the start point of the next bucket. Also, the last bucket could have slightly fewer values than other buckets.  
 
+For the predicate `"WHERE a = 5"` in the example histogram above, you can see that 5 is in the first bucket, which has a range of [1, 7]. Using the ‘continuous variable’ nature of histograms, and assuming a uniform distribution within a bucket, we get an estimate of 16/7 = 2 (approximately), which is close to the actual value of 1.
+ 
+Next, consider the range predicate `"WHERE a > 5 AND a <= 16"`.  The range spans part of bucket [1, 7] and entire buckets [8, 9], [10, 11] and [12, 16].  The total estimate = (7-5)/7 * 16 + 16 + 16 + 16 = 53 (approximately).  The actual count is 59.
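+
+The arithmetic in these two examples follows a general pattern (stated here as a summary of the calculations above, not as code from the Drill optimizer). For an equality predicate `a = x`, where `x` falls in a bucket of width `w`, and for a range predicate covering part of one bucket plus `k` full buckets:
+
+	equality estimate = rows_per_bucket / w
+	range estimate    = (covered_width_of_partial_bucket / w) * rows_per_bucket + k * rows_per_bucket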
+
+**Viewing Histogram Statistics for a Column**  
+Histogram statistics are generated for each column, as shown:  
+
+	"histogram":{"category":"numeric-equi-depth","numRowsPerBucket":150,"buckets":[0.0,2.0,4.0,7.0,9.0,12.0,15.199999999999978,17.0,19.0,22.0,24.0]}
+
+In this example, there are 10 buckets (the “buckets” list contains 11 boundary values). Each bucket contains 150 rows, which is the number of rows (1500) divided by the number of buckets (10). The numbers in the “buckets” list are the values at which buckets start and end: the first number (0.0) is the start of the first bucket, the second number (2.0) is the end of the first bucket and the start of the second bucket, and so on.  
+  
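+As an illustration of how the planner can use these boundaries, consider a hypothetical range filter on the example column (the estimate shown is the histogram-based approximation, not actual optimizer output):
+
+	-- hypothetical filter on the example column
+	SELECT COUNT(*) FROM `table_stats/Tpch0.01/parquet/customer` WHERE c_nationkey <= 4;
+	-- c_nationkey <= 4 covers the first two buckets, [0.0, 2.0] and (2.0, 4.0],
+	-- so the histogram-based estimate is 2 * 150 = 300 rows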
+
+## Limitations  
 
 - Drill does not cache statistics. 
 - ANALYZE TABLE runs only on directory-based Parquet tables. 
@@ -127,162 +166,136 @@ If you use any of these words in a Drill query, you must enclose the word in bac
  
 		//If you encounter this error, run the ANALYZE TABLE statement on each file with null values individually instead of running the statement against all the files at once.  
 
--  Running the ANALYZE TABLE statement against a table with a metadata cache file inadvertently updates the timestamp on the metadata cache file, which automatically triggers the REFRESH TABLE METADATA command.  
+- Running the ANALYZE TABLE statement creates the stats file, which changes the directory timestamp. The change of the timestamp automatically triggers the REFRESH TABLE METADATA command, even when the underlying data has not changed.  
 
 ## EXAMPLES  
 
-These examples use a schema, `dfs.samples`, which points to the `/home` directory. The `/home` directory contains a subdirectory, `parquet`, which contains the `nation.parquet` and `region.parquet` files. You can access these Parquet files in the `sample-data` directory of your Drill installation.  
+These examples use a schema, `dfs.drilltestdir`, which points to the `/drill/testdata` directory in the MapR File System. The `/drill/testdata` directory contains the following subdirectory:
 
-	[root@doc23 parquet]# pwd
-	/home/parquet
-	
-	[root@doc23 parquet]# ls
-	nation.parquet  region.parquet  
+    /drill/testdata/table_stats/Tpch0.01/parquet
 
-Change schemas to use `dfs.samples`:
+The `/parquet` directory contains a table named “customer.”
 
-	use dfs.samples;
-	+-------+------------------------------------------+
-	|  ok   |                 summary                  |
-	+-------+------------------------------------------+
-	| true  | Default schema changed to [dfs.samples]  |
-	+-------+------------------------------------------+  
+Switch schema to `dfs.drilltestdir`:
+ 
+	use dfs.drilltestdir;
+	+------+----------------------------------------------+
+	|  ok  |                   summary                    |
+	+------+----------------------------------------------+
+	| true | Default schema changed to [dfs.drilltestdir] |
+	+------+----------------------------------------------+
+ 
+The following query shows the columns and types of data in the “customer” table:  
 
-### Enabling Statistics for Query Planning 
+	apache drill (dfs.drilltestdir)> select * from `table_stats/Tpch0.01/parquet/customer` limit 2;
+	+-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+-----------------------------------------------------------------+
+	| c_custkey |       c_name       |           c_address            | c_nationkey |     c_phone     | c_acctbal | c_mktsegment |                            c_comment                            |
+	+-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+-----------------------------------------------------------------+
+	| 1         | Customer#000000001 | IVhzIApeRb ot,c,E              | 15          | 25-989-741-2988 | 711.56    | BUILDING     | to the even, regular platelets. regular, ironic epitaphs nag e  |
+	| 2         | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak | 13          | 23-768-687-3665 | 121.65    | AUTOMOBILE   | l accounts. blithely ironic theodolites integrate boldly: caref |
+	+-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+-----------------------------------------------------------------+
 
+ 
+### Enabling Statistics for Query Planning
 You can run the ANALYZE TABLE statement at any time to compute statistics; however, you must enable the following option if you want Drill to use statistics during query planning:
-
+ 
 	set `planner.statistics.use`=true;
-	+-------+----------------------------------+
-	|  ok   |             summary              |
-	+-------+----------------------------------+
-	| true  | planner.statistics.use updated.  |
-	+-------+----------------------------------+  
-
-### Computing Statistics on a Directory 
-
-If you want to compute statistics for all Parquet data in a directory, you can run the ANALYZE TABLE statement against the directory, as shown:
-
-	ANALYZE TABLE `/parquet` COMPUTE STATISTICS;
-	+-----------+----------------------------+
-	| Fragment  | Number of records written  |
-	+-----------+----------------------------+
-	| 0_0       | 4                          |
-	+-----------+----------------------------+
-
-### Computing Statistics on a Table 
-
-You can create a table from the data in the `nation.parquet` file, as shown:
-
-	CREATE TABLE nation1 AS SELECT * from `parquet/nation.parquet`;
-	+-----------+----------------------------+
-	| Fragment  | Number of records written  |
-	+-----------+----------------------------+
-	| 0_0       | 25                         |
-	+-----------+----------------------------+
-
-Drill writes the table to the `/home` directory, which is where the `dfs.samples` workspace points: 
-
-	[root@doc23 home]# ls
-	nation1  parquet  
-
-Changing to the `nation1` directory, you can see that the table is written as a parquet file:  
-
-	[root@doc23 home]# cd nation1
-	[root@doc23 nation1]# ls
-	0_0_0.parquet
-
-You can run the ANALYZE TABLE statement on a subset of columns in the table to generate statistics for those columns only, as shown:
-
-	ANALYZE TABLE dfs.samples.nation1 COMPUTE STATISTICS (N_NATIONKEY, N_REGIONKEY);
-	+-----------+----------------------------+
-	| Fragment  | Number of records written  |
-	+-----------+----------------------------+
-	| 0_0       | 2                          |
-	+-----------+----------------------------+
-
-Or, you can run the ANALYZE TABLE statement on the entire table if you want statistics generated for all columns in the table:
-
-	ANALYZE TABLE dfs.samples.nation1 COMPUTE STATISTICS;
-	+-----------+----------------------------+
-	| Fragment  | Number of records written  |
-	+-----------+----------------------------+
-	| 0_0       | 4                          |
-	+-----------+----------------------------+  
+	+------+---------------------------------+
+	|  ok  |             summary             |
+	+------+---------------------------------+
+	| true | planner.statistics.use updated. |
+	+------+---------------------------------+
+ 
+### Computing Statistics
+You can compute statistics on directories with Parquet data or on Parquet tables.
+ 
+You can run the ANALYZE TABLE statement on a subset of columns to generate statistics for those columns only, as shown:
+ 
+	analyze table `table_stats/Tpch0.01/parquet/customer` compute statistics (c_custkey, c_nationkey, c_acctbal);
+	+----------+---------------------------+
+	| Fragment | Number of records written |
+	+----------+---------------------------+
+	| 0_0      | 3                         |
+	+----------+---------------------------+
+ 
+Or, you can run the ANALYZE TABLE statement on the entire table/directory if you want statistics generated for all the columns:
+ 
+	analyze table `table_stats/Tpch0.01/parquet/customer` compute statistics;
+	+----------+---------------------------+
+	| Fragment | Number of records written |
+	+----------+---------------------------+
+	| 0_0      | 8                         |
+	+----------+---------------------------+
 
-### Computing Statistics on a SAMPLE
-You can also run ANALYZE TABLE on a percentage of the data in a table using the SAMPLE command, as shown:
 
-	ANALYZE TABLE dfs.samples.nation1 COMPUTE STATISTICS SAMPLE 50 PERCENT;
-	+-----------+----------------------------+
-	| Fragment  | Number of records written  |
-	+-----------+----------------------------+
-	| 0_0       | 4                          |
-	+-----------+----------------------------+  
+ 
+### Computing Statistics on a SAMPLE
+You can also run ANALYZE TABLE on a percentage of the data using the SAMPLE command, as shown:
+ 
+	ANALYZE TABLE `table_stats/Tpch0.01/parquet/customer` COMPUTE STATISTICS SAMPLE 50 PERCENT;
+	+----------+---------------------------+
+	| Fragment | Number of records written |
+	+----------+---------------------------+
+	| 0_0      | 8                         |
+	+----------+---------------------------+
 
-### Storing Statistics
+ 
+### Storing Statistics
 When you generate statistics, a statistics directory (`.stats.drill`) is created with a JSON file that contains the statistical data.
+ 
+For tables, the `.stats.drill` directory is nested within the table directory. For example, if you ran ANALYZE TABLE against a table named “customer,” you could access the statistics JSON file in `/customer/.stats.drill`.
+ 
+For directories, a new directory is written with the same name as the directory on which you ran ANALYZE TABLE, appended by `.stats.drill`. For example, if you ran ANALYZE TABLE against a directory named “customer,” you could access the JSON statistics file in the new `customer.stats.drill` directory.
+ 
+You can query the statistics file to see the statistics generated for each column, as shown in the following two examples:
 
-For tables, the `.stats.drill` directory is nested within the table directory. For example, if you ran ANALYZE TABLE against a table named “nation1,” you could access the statistic file in:  
-	
-	[root@doc23 home]# cd nation1/.stats.drill
-	[root@doc23 .stats.drill]# ls
-	0_0.json
-
-For directories, a new directory is written with the same name as the directory on which you ran ANALYZE TABLE and appended by `.stats.drill`. For example, if you ran ANALYZE TABLE against a directory named “parquet,” you could access the statistic file in:
-
-	[root@doc23 home]# cd parquet.stats.drill
-	[root@doc23 parquet.stats.drill]# ls
-	0_0.json
-
-You can query the statistics file, as shown in the following two examples:
-
-	SELECT * FROM dfs.samples.`parquet.stats.drill`;
+ 
+	select * from `table_stats/Tpch0.01/parquet/customer/.stats.drill`;
 	+--------------------+----------------------------------------------------------------------------------+
 	| statistics_version |                                   directories                                    |
 	+--------------------+----------------------------------------------------------------------------------+
-	| v1                 | [{"computed":"2019-04-23","columns":[{"column":"`R_REGIONKEY`","majortype":{"type":"BIGINT","mode":"REQUIRED"},"schema":1.0,"rowcount":5.0,"nonnullrowcount":5.0,"ndv":5,"avgwidth":8.0,"histogram":{"category":"numeric-equi-depth","numRowsPerBucket":1,"buckets":[1.0,0.0,0.0,2.9999999999999996,2.0,4.0]}},{"column":"`R_NAME`","majortype":{"type":"VARCHAR","mode":"REQUIRED"},"schema":1.0,"rowcount":5.0,"nonnullrowcount":5.0,"ndv":5,"avgwidth":6.8,"histogram":{"buckets" [...]
-	+--------------------+----------------------------------------------------------------------------------+
+	| v1                 | [{"computed":"2019-04-30","columns":[{"column":"`c_custkey`","majortype":{"type":"INT","mode":"REQUIRED"},"schema":1.0,"rowcount":1500.0,"nonnullrowcount":1500.0,"ndv":1500,"avgwidth":4.0,"histogram":{"category":"numeric-equi-depth","numRowsPerBucket":150,"buckets":[2.0,149.0,299.0,450.99999999999994,599.0,749.0,900.9999999999999,1049.0,1199.0,1349.0,1500.0]}},{"column":"`c_name`","majortype":{"type":"VARCHAR","mode":"REQUIRED"},"schema":1.0,"rowcount":1500.0,"non [...]
+	+--------------------+----------------------------------------------------------------------------------+
 
+	SELECT t.directories.columns[0].ndv as ndv, t.directories.columns[0].rowcount as rc, t.directories.columns[0].nonnullrowcount AS nnrc, t.directories.columns[0].histogram as histogram FROM `table_stats/Tpch0.01/parquet/customer/.stats.drill` t;
+	+------+--------+--------+----------------------------------------------------------------------------------+
+	| ndv  |   rc   |  nnrc  |                                    histogram                                     |
+	+------+--------+--------+----------------------------------------------------------------------------------+
+	| 1500 | 1500.0 | 1500.0 | {"category":"numeric-equi-depth","numRowsPerBucket":150,"buckets":[2.0,149.0,299.0,450.99999999999994,599.0,749.0,900.9999999999999,1049.0,1199.0,1349.0,1500.0]} |
+	+------+--------+--------+----------------------------------------------------------------------------------+
 
 
-	SELECT t.directories.columns[0].ndv as ndv, t.directories.columns[0].rowcount as rc, t.directories.columns[0].nonnullrowcount AS nnrc, t.directories.columns[0].histogram as histogram FROM dfs.samples.`parquet.stats.drill` t;
-	+-----+-----+------+----------------------------------------------------------------------------------+
-	| ndv | rc  | nnrc |                                    histogram                                     |
-	+-----+-----+------+----------------------------------------------------------------------------------+
-	| 5   | 5.0 | 5.0  | {"category":"numeric-equi-depth","numRowsPerBucket":1,"buckets":[1.0,0.0,0.0,2.9999999999999996,2.0,4.0]} |
-	+-----+-----+------+----------------------------------------------------------------------------------+  
 
-### Dropping Statistics 
 
+### Dropping Statistics
 If you want to compute statistics on a table or directory that you have already run the ANALYZE TABLE statement against, you must first drop the statistics before you can run the ANALYZE TABLE statement on the table again.
-
+ 
 The following example demonstrates how to drop statistics on a table:
+ 
+	DROP TABLE `table_stats/Tpch0.01/parquet/customer/.stats.drill`;
+	+------+--------------------------------------------------------------------+
+	|  ok  |                              summary                               |
+	+------+--------------------------------------------------------------------+
+	| true | Table [table_stats/Tpch0.01/parquet/customer/.stats.drill] dropped |
+	+------+--------------------------------------------------------------------+
 
-	DROP TABLE dfs.samples.`parquet/.stats.drill`;
-	+-------+-------------------------------------+
-	|  ok   |               summary               |
-	+-------+-------------------------------------+
-	| true  | Table [parquet/.stats.drill] dropped  |
-	+-------+-------------------------------------+
-
-The following example demonstrates how to drop statistics on a directory:
 
-	DROP TABLE dfs.samples.`/parquet.stats.drill`;
+The following example demonstrates how to drop statistics on a directory, assuming that “customer” is a directory that contains Parquet files:
+ 
+	DROP TABLE `table_stats/Tpch0.01/parquet/customer.stats.drill`;
 	+-------+------------------------------------+
-	|  ok   |              summary               |
+	|  ok   |              summary               |
 	+-------+------------------------------------+
-	| true  | Table [parquet.stats.drill] dropped  |
+	| true  | Table [customer.stats.drill] dropped |
 	+-------+------------------------------------+
+ 
+When you drop statistics, the statistics directory no longer exists for the table:
 
-When you drop statistics, the statistics directory no longer exists for the table: 
+	select * from `table_stats/Tpch0.01/parquet/customer/.stats.drill`;
+
+	Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 66: Object 'table_stats/Tpch0.01/parquet/customer/.stats.drill' not found  
+	[Error Id: 886003ca-c64f-4e7d-b4c5-26ee1ca617b8 ] (state=,code=0)
 
-	[root@doc23 home]# cd parquet/.stats.drill
-	-bash: cd: parquet/.stats.drill: No such file or directory
-	
-	SELECT * FROM dfs.samples.`parquet/.stats.drill`;
-	Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 17: Object 'parquet/.stats.drill' not found within 'dfs.samples'
-	[Error Id: 0b9a0c35-f058-4e0a-91d5-034d095393d7 on doc23.lab:31010] (state=,code=0)  
 
 ## Troubleshooting  
 
diff --git a/_docs/sql-reference/sql-commands/011-refresh-table-metadata.md b/_docs/sql-reference/sql-commands/011-refresh-table-metadata.md
index 2af0d0b..3e71ebc 100644
--- a/_docs/sql-reference/sql-commands/011-refresh-table-metadata.md
+++ b/_docs/sql-reference/sql-commands/011-refresh-table-metadata.md
@@ -1,6 +1,6 @@
 ---
 title: "REFRESH TABLE METADATA"
-date: 2019-04-29
+date: 2019-04-30
 parent: "SQL Commands"
 ---
 Run the REFRESH TABLE METADATA command on Parquet tables and directories to generate a metadata cache file. REFRESH TABLE METADATA collects metadata from the footers of Parquet files and writes the metadata to a metadata file (`.drill.parquet_file_metadata.v4`) and a summary file (`.drill.parquet_summary_metadata.v4`). The planner uses the metadata cache file to prune extraneous data during the query planning phase. Run the REFRESH TABLE METADATA command if planning time is a significant [...]
@@ -34,8 +34,9 @@ Run the [EXPLAIN]({{site.baseurl}}/docs/explain/) command to determine the query
 ## Usage Notes  
 
 ### Metadata Storage  
-- Drill traverses directories for Parquet files and gathers the metadata from the footer of the files. Drill stores the collected metadata in a metadata cache file, `.drill.parquet_file_metadata.v4`, a summary file, `.drill.parquet_summary_metadata.v4`, and a directories file, `.drill.parquet_metadata_directories` file at each directory level.     
-- The metadata cache file stores metadata for files in that directory, as well as the metadata for the files in the subdirectories.  
+- Drill traverses directories for Parquet files and gathers the metadata from the footer of the files. Drill stores the collected metadata in a metadata cache file, `.drill.parquet_file_metadata.v4`, a summary file, `.drill.parquet_summary_metadata.v4`, and a directories file, `.drill.parquet_metadata_directories`, at each directory level.  
+- Introduced in Drill 1.16, the summary file, `.drill.parquet_summary_metadata.v4`, optimizes planning for certain queries, like COUNT(*) queries, such that the planner can use the summary file instead of the larger metadata cache file.  
+- The metadata cache file stores metadata for files in the current directory, as well as the metadata for the files in subdirectories.  
 - For each row group in a Parquet file, the metadata cache file stores the column names in the row group and the column statistics, such as the min/max values and null count.  
 - If the Parquet data is updated, for example data is added to a file, Drill automatically refreshes the Parquet metadata when you issue the next query against the Parquet data. You can also refresh the metadata manually, as shown in the example below.
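+
+For example, you can regenerate the metadata cache manually by running the command against a table path. A minimal example, using the customer table path from the ANALYZE TABLE examples:
+
+	-- rewrites the metadata cache and summary files for this directory
+	REFRESH TABLE METADATA `table_stats/Tpch0.01/parquet/customer`;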