Posted to commits@drill.apache.org by br...@apache.org on 2018/12/14 22:01:59 UTC

[drill-site] branch asf-site updated: team update-DRILL-6744 edits

This is an automated email from the ASF dual-hosted git repository.

bridgetb pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/drill-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new d70c850  team update-DRILL-6744 edits
d70c850 is described below

commit d70c850599f5cc6401d8a98ae84d75e0c6635ed6
Author: Bridget Bevens <bb...@maprtech.com>
AuthorDate: Fri Dec 14 14:01:44 2018 -0800

    team update-DRILL-6744 edits
---
 docs/parquet-filter-pushdown/index.html |  85 ++++++++++++---
 feed.xml                                |   4 +-
 team/index.html                         | 188 ++++++++++++++++++++------------
 3 files changed, 190 insertions(+), 87 deletions(-)

diff --git a/docs/parquet-filter-pushdown/index.html b/docs/parquet-filter-pushdown/index.html
index 9465889..85ec38e 100644
--- a/docs/parquet-filter-pushdown/index.html
+++ b/docs/parquet-filter-pushdown/index.html
@@ -1268,7 +1268,7 @@
 
     </div>
 
-     Sep 28, 2018
+     Dec 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1276,9 +1276,10 @@
       
         <p>Drill 1.9 introduces the Parquet filter pushdown option. Parquet filter pushdown is a performance optimization that prunes extraneous data from a Parquet file to reduce the amount of data that Drill scans and reads when a query on a Parquet file contains a filter expression. Pruning data reduces the I/O, CPU, and network overhead to optimize Drill’s performance.</p>
 
-<p>Parquet filter pushdown is enabled by default. When a query contains a filter expression, you can run the <a href="/docs/explain/">EXPLAIN PLAN command</a> to see if Drill applies Parquet filter pushdown to the query. You can enable and disable this feature using the <a href="/docs/alter-system/">ALTER SYSTEM|SESSION SET</a> command with the <code>planner.store.parquet.rowgroup.filter.pushdown</code> option.  </p>
-
-<p>As of Drill 1.13, the query planner in Drill can apply project push down, filter push down, and partition pruning to star queries in common table expressions (CTEs), views, and subqueries, for example:  </p>
+<p>Parquet filter pushdown is enabled by default. When a query contains a filter expression, you can run the <a href="/docs/explain/">EXPLAIN PLAN command</a> to see if Drill applies Parquet filter pushdown to the query. You can enable and disable this feature through the <code>planner.store.parquet.rowgroup.filter.pushdown</code> option, as shown:   </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SET `planner.store.parquet.rowgroup.filter.pushdown`=&#39;false&#39;   
+</code></pre></div>
+<p>Starting in Drill 1.13, the query planner in Drill can apply project push down, filter push down, and partition pruning to star queries in common table expressions (CTEs), views, and subqueries, for example:  </p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">   select col1 from (select * from t)  
 </code></pre></div>
 <p>When a CTE, view, or subquery contains a star filter condition, the query planner in Drill can apply the filter and prune extraneous data, further reducing the amount of data that the scanner reads and improving performance. </p>
@@ -1293,15 +1294,64 @@
 
 <p>The query planner looks at the minimum and maximum values in each row group for an intersection. If no intersection exists, the planner can prune the row group in the table. If the minimum and maximum value range is too large, Drill does not apply Parquet filter pushdown. The query planner can typically prune more data when the tables in the Parquet file are sorted by row groups.  </p>
 
+<h2 id="parquet-filter-pushdown-for-varchar-and-decimal-data-types">Parquet Filter Pushdown for VARCHAR and DECIMAL Data Types</h2>
+
+<p>Starting in Drill 1.15, Drill supports Parquet filter pushdown for the VARCHAR and DECIMAL data types. Drill uses binary statistics in the Parquet file or Drill metadata file to push filters on VARCHAR and DECIMAL data types down to the data source.  </p>
+
+<h3 id="parquet-generated-files">Parquet Generated Files</h3>
+
+<p>By default, Parquet filter pushdown works for VARCHAR and DECIMAL data types if the Parquet files were created with Parquet version 1.10.0 or later. Drill 1.13 and later use Parquet 1.10.0 to write and read back Parquet files. </p>
+
+<p>If Parquet files were created with a pre-1.10.0 version of Parquet, and the data in the binary columns is in ASCII format (not UTF-8), enable the <code>store.parquet.reader.strings_signed_min_max</code> option, which allows Drill to use binary statistics in older Parquet files.  </p>
+
+<p><strong>Note:</strong> DECIMAL filter pushdown only works for Parquet files created by Parquet 1.10.0 or later due to issue <a href="https://issues.apache.org/jira/browse/PARQUET-1322">PARQUET-1322</a>.  </p>
+
+<h3 id="parquet-files-created-by-hive">Parquet Files Created by Hive</h3>
+
+<p>In Hive 2.3, Parquet files are created by a pre-1.10.0 version of Parquet. If the data in the binary columns is in ASCII format, you can enable the <code>store.parquet.reader.strings_signed_min_max</code> option to enable pushdown support for VARCHAR data types. DECIMAL filter pushdown is not supported.  </p>
+
+<h3 id="drill-generated-metadata-files">Drill Generated Metadata Files</h3>
+
+<p>Parquet filter pushdown for DECIMAL and VARCHAR data types may not work correctly on Drill metadata files that were generated prior to Drill 1.15. Regenerate all Drill metadata files using Drill 1.15 or later to ensure that Parquet filter pushdown on VARCHAR and DECIMAL data types works correctly on Drill generated metadata files.</p>
+
+<p>If the <code>store.parquet.reader.strings_signed_min_max</code> option is not enabled during regeneration, the minimum and maximum values for the binary data will not be written. When the binary data is in ASCII format, enabling the <code>store.parquet.reader.strings_signed_min_max</code> option during regeneration ensures that the minimum and maximum values are written and thus read back and used during filter pushdown.  </p>
+
+<h3 id="enabling-statistics-use-for-pre-1-10-0-parquet-files">Enabling Statistics Use for Pre-1.10.0 Parquet Files</h3>
+
+<p>If Parquet files were created with a pre-1.10.0 version of Parquet, and the data in binary columns is in ASCII format (not UTF-8), you can enable Drill to use the statistics for Parquet filter pushdown on VARCHAR and DECIMAL data types.</p>
+
+<p>You can use either of the following methods to enable this functionality in Drill:  </p>
+
+<ul>
+<li><p>In the <code>parquet</code> format plugin configuration, add the <code>enableStringsSignedMinMax</code> option, and set the option to <code>true</code>, as shown:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">&quot;parquet&quot; : {
+       type: &quot;parquet&quot;,
+       enableStringsSignedMinMax: true
+    }  
+</code></pre></div>
+<p>This configuration applies to all Parquet files read through the storage plugin that contains this <code>parquet</code> format plugin configuration, including the configured workspaces.</p></li>
+<li><p>From the command line, enable the <code>store.parquet.reader.strings_signed_min_max</code> option at the session or system level, as shown:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">SET `store.parquet.reader.strings_signed_min_max`=&#39;true&#39;;
+ALTER SYSTEM SET `store.parquet.reader.strings_signed_min_max`=&#39;true&#39;;  
+</code></pre></div>
+<p><strong>Note:</strong>  </p>
+
+<ul>
+<li>The <code>store.parquet.reader.strings_signed_min_max</code> option allows three values: <code>&#39;true&#39;</code>, <code>&#39;false&#39;</code>, and <code>&#39;&#39;</code> (empty string). By default, the value is an empty string.<br></li>
+<li>Setting this option at the system level applies to all Parquet files in the system. Alternatively, you can set this option in the Drill Web UI. Options in the Drill Web UI are set at the system level.<br></li>
+<li>When set at the session level, the setting takes precedence over the setting in the parquet format plugin and overrides the system level setting.<br></li>
+</ul></li>
+</ul>
+
 <h2 id="using-parquet-filter-pushdown">Using Parquet Filter Pushdown</h2>
 
 <p>Currently, Parquet filter pushdown only supports filters that reference columns from a single table (local filters). Parquet filter pushdown requires the minimum and maximum values in the Parquet file metadata. All Parquet files created in Drill using the CTAS statement contain the necessary metadata. If your Parquet files were created using another tool, you may need to use Drill to read and rewrite the files using the <a href="/docs/create-table-as-ctas/">CTAS command</a>.</p>
 
-<p>Parquet filter pushdown works best if you presort the data. You do not have to sort the entire data set at once. You can sort a subset of the data set, sort another subset, and so on. </p>
+<p>Parquet filter pushdown works best if you presort the data. You do not have to sort the entire data set at once. You can sort a subset of the data set, sort another subset, and so on.   </p>
 
 <h3 id="configuring-parquet-filter-pushdown">Configuring Parquet Filter Pushdown</h3>
 
-<p>Use the <a href="/docs/alter-system/">ALTER SYSTEM|SESSION SET</a> command with the Parquet filter pushdown options to enable or disable the feature, and set the number of row groups for a table.  </p>
+<p>Use the <a href="/docs/alter-system/">ALTER SYSTEM</a> or <a href="/docs/set/">SET</a> command with the Parquet filter pushdown options to enable or disable the related features.  </p>
 
 <p>The following table lists the Parquet filter pushdown options with their descriptions and default values:  </p>
 
@@ -1313,21 +1363,22 @@
 </tr>
 </thead><tbody>
 <tr>
-<td>&quot;planner.store.parquet.rowgroup.filter.pushdown&quot;</td>
-<td>Turns the Parquet filter pushdown feature on or   off.</td>
+<td>planner.store.parquet.rowgroup.filter.pushdown</td>
+<td>Turns   the Parquet filter pushdown feature on or off.</td>
 <td>TRUE</td>
 </tr>
 <tr>
-<td>&quot;planner.store.parquet.rowgroup.filter.pushdown.threshold&quot;</td>
-<td>Sets the number of row groups that a table can   have. You can increase the threshold if the filter can prune many row groups.   However, if this setting is too high, the filter evaluation overhead   increases. Base this setting on the data set. Reduce this setting if the   planning time is significant, or you do not see any benefit at runtime.</td>
+<td>planner.store.parquet.rowgroup.filter.pushdown.threshold</td>
+<td>Sets   the number of row groups that a table can have. You can increase the   threshold if the filter can prune many row groups. However, if this setting   is too high, the filter evaluation overhead increases. Base this setting on   the data set. Reduce this setting if the planning time is significant, or you   do not see any benefit at runtime.</td>
 <td>10,000</td>
 </tr>
+<tr>
+<td>store.parquet.reader.strings_signed_min_max</td>
+<td>Allows binary statistics usage   for Parquet files created with a pre-1.10.0 version of Parquet. Files created   pre-1.10.0 have incorrectly calculated statistics for UTF-8 data. If you know   that data in the binary columns is in ASCII (not UTF-8), setting this option   to &#39;true&#39; enables statistics usage for VARCHAR and DECIMAL data types.   Default is unset; empty string. Allowed values are &#39;true&#39;, &#39;false&#39;, &#39;&#39; (empty   string).</td>
+<td>&#39;&#39;(empty string)</td>
+</tr>
 </tbody></table>
 
-<h3 id="viewing-the-query-plan">Viewing the Query Plan</h3>
-
-<p>Because Drill applies Parquet filter pushdown during the query planning phase, you can view the query execution plan to see if Drill pushes down the filter when a query on a Parquet file contains a filter expression. You can run the <a href="/docs/explain/">EXPLAIN PLAN command</a> to see the execution plan for the query, as shown in the following example.</p>
-
 <p><strong>Example</strong>  </p>
 
 <p>Starting in Drill 1.14, Drill supports the planner rule, JoinPushTransitivePredicatesRule, which enables Drill to infer filter conditions for join queries and push the filter conditions down to the data source. </p>
@@ -1349,7 +1400,7 @@
 
 <p>The following table lists the supported and unsupported clauses, operators, data types, function, and scenarios for Parquet filter pushdown:  </p>
 
-<p><strong>Note:</strong> <sup>1</sup> indicates support as of Drill 1.13. <sup>2</sup> indicates support as of Drill 1.14.  </p>
+<p><strong>Note:</strong> <sup>1</sup> indicates support as of Drill 1.13. <sup>2</sup> indicates support as of Drill 1.14. <sup>3</sup> indicates support as of Drill 1.15.  </p>
 
 <table><thead>
 <tr>
@@ -1375,8 +1426,8 @@
 </tr>
 <tr>
 <td>Data Types</td>
-<td>INT,   BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, TIME, <sup>1</sup>BOOLEAN (true, false)</td>
-<td>CHAR,   VARCHAR columns, Hive TIMESTAMP</td>
+<td>INT,   BIGINT, FLOAT, DOUBLE, DATE, TIMESTAMP, TIME, <sup>1</sup>BOOLEAN (true, false), <sup>3</sup>VARCHAR and DECIMAL columns</td>
+<td>CHAR,   Hive TIMESTAMP</td>
 </tr>
 <tr>
 <td>Function</td>
diff --git a/feed.xml b/feed.xml
index 95b9f29..24914ae 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Tue, 11 Dec 2018 13:49:25 -0800</pubDate>
-    <lastBuildDate>Tue, 11 Dec 2018 13:49:25 -0800</lastBuildDate>
+    <pubDate>Fri, 14 Dec 2018 13:58:54 -0800</pubDate>
+    <lastBuildDate>Fri, 14 Dec 2018 13:58:54 -0800</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>
diff --git a/team/index.html b/team/index.html
index c7456fd..59846ab 100644
--- a/team/index.html
+++ b/team/index.html
@@ -128,158 +128,210 @@
 
 <table><thead>
 <tr>
-<th>Name</th>
-<th>Alias (email is &lt;alias&gt;@apache.org)</th>
+<th><strong>Name</strong></th>
+<th><strong>Alias (email is &lt;alias&gt;@apache.org)</strong></th>
 </tr>
 </thead><tbody>
 <tr>
-<td>Jacques Nadeau</td>
-<td>jacques</td>
+<td>Abdel Hakim Deneche</td>
+<td>adeneche</td>
 </tr>
 <tr>
-<td>Tomer Shiran</td>
-<td>tshiran</td>
+<td>Aditya Kishore</td>
+<td>adi</td>
 </tr>
 <tr>
-<td>Ted Dunning</td>
-<td>tdunning</td>
+<td>Abhishek Girish</td>
+<td>agirish</td>
 </tr>
 <tr>
-<td>Jason Frantz</td>
-<td>jason</td>
+<td>AnilKumar B</td>
+<td>akumarb2010</td>
 </tr>
 <tr>
-<td>MC Srivas</td>
-<td>srivas</td>
+<td>Aman Sinha</td>
+<td>amansinha</td>
 </tr>
 <tr>
-<td>Julian Hyde</td>
-<td>jhyde</td>
+<td>Arina Ielchiieva</td>
+<td>arina</td>
 </tr>
 <tr>
-<td>Tim Chen</td>
-<td>tnachen</td>
+<td>Boaz Ben-Zvi</td>
+<td>boaz</td>
 </tr>
 <tr>
-<td>Mehant Baid</td>
-<td>mehant</td>
+<td>Bridget Bevens</td>
+<td>bridgetb</td>
 </tr>
 <tr>
-<td>Jinfeng Ni</td>
-<td>jni</td>
+<td>Kamesh Bhallamudi</td>
+<td>bvskamesh</td>
 </tr>
 <tr>
-<td>Venki Korukanti</td>
-<td>venki</td>
+<td>Charles Givre</td>
+<td>cgivre</td>
 </tr>
 <tr>
-<td>Jason Altekruse</td>
-<td>json</td>
+<td>Chunhui Shi</td>
+<td>cshi</td>
 </tr>
 <tr>
-<td>Aditya Kishore</td>
-<td>adi</td>
+<td>Chris Wensel</td>
+<td>cwensel</td>
 </tr>
 <tr>
-<td>Parth Chandra</td>
-<td>parthc</td>
+<td>Chris Westin</td>
+<td>cwestin</td>
 </tr>
 <tr>
-<td>Aman Sinha</td>
-<td>amansinha</td>
+<td>Ellen Friedman</td>
+<td>ellenf</td>
 </tr>
 <tr>
-<td>Steven Phillips</td>
-<td>smp</td>
+<td>German Shegalov</td>
+<td>gera</td>
 </tr>
 <tr>
-<td>Bridget Bevens</td>
-<td>bridgetb</td>
+<td>Gautam Parai</td>
+<td>gparai</td>
+</tr>
+<tr>
+<td>Grant Ingersoll</td>
+<td>gsingers</td>
 </tr>
 <tr>
 <td>Hanifi Gunes</td>
 <td>hg</td>
 </tr>
 <tr>
-<td>Abdelhakim Deneche</td>
-<td>adeneche</td>
+<td>Hanumath Rao Maduri</td>
+<td>hmaduri</td>
 </tr>
 <tr>
-<td>Sudheesh Katkam</td>
-<td>sudheesh</td>
+<td>Hsuan-Yi Chu</td>
+<td>hsuanyichu</td>
 </tr>
 <tr>
-<td>Ellen Friedman</td>
-<td>ellenf</td>
+<td>Isabel Drost-Fromm</td>
+<td>isabel</td>
 </tr>
 <tr>
-<td>Kris Hahn</td>
-<td>krishahn</td>
+<td>Jacques Nadeau</td>
+<td>jacques</td>
 </tr>
 <tr>
-<td>Neeraja Rentachintala</td>
-<td>neerajar</td>
+<td>Jason Frantz</td>
+<td>jason</td>
 </tr>
 <tr>
-<td>Chris Westin</td>
-<td>cwestin</td>
+<td>Julian Hyde</td>
+<td>jhyde</td>
 </tr>
 <tr>
-<td>Abhishek Girish</td>
-<td>agirish</td>
+<td>Jinfeng Ni</td>
+<td>jni</td>
 </tr>
 <tr>
-<td>Rahul Challapalli</td>
-<td>rkins</td>
+<td>Jason Altekruse</td>
+<td>json</td>
 </tr>
 <tr>
-<td>Arina Ielchiieva</td>
-<td>arina</td>
+<td>Karthikeyan Manivannan</td>
+<td>karthikm</td>
 </tr>
 <tr>
-<td>Paul Rogers</td>
-<td>progers</td>
+<td>Keys Botzum</td>
+<td>kbotzum</td>
+</tr>
+<tr>
+<td>Kris Hahn</td>
+<td>krishahn</td>
+</tr>
+<tr>
+<td>Kunal Khatua</td>
+<td>kunal</td>
 </tr>
 <tr>
 <td>Laurent Goujon</td>
 <td>laurent</td>
 </tr>
 <tr>
-<td>Charles Givre</td>
-<td>cgivre</td>
+<td>Mehant Baid</td>
+<td>mehant</td>
 </tr>
 <tr>
-<td>Boaz Ben-Zvi</td>
-<td>boaz</td>
+<td>Neeraja Rentachintala</td>
+<td>neerajar</td>
 </tr>
 <tr>
-<td>Anil Kumar Batchu</td>
-<td>akumarb2010</td>
+<td>Parth Chandra</td>
+<td>parthc</td>
 </tr>
 <tr>
-<td>Vitalii Diravka</td>
-<td>vitalii</td>
+<td>Padma Penumarthy</td>
+<td>ppadma</td>
 </tr>
 <tr>
-<td>Kamesh Bhallamudi</td>
-<td>kameshb</td>
+<td>Paul Rogers</td>
+<td>progers</td>
 </tr>
 <tr>
-<td>Kunal Khatua</td>
-<td>kunal</td>
+<td>Ryan Rawson</td>
+<td>rawson</td>
 </tr>
 <tr>
-<td>Volodymyr Vysotskyi</td>
-<td>volodymyr</td>
+<td>Rahul Kumar Challapalli</td>
+<td>rkins</td>
+</tr>
+<tr>
+<td>Steven Phillips</td>
+<td>smp</td>
 </tr>
 <tr>
 <td>Sorabh Hamirwasia</td>
 <td>sorabh</td>
 </tr>
 <tr>
+<td>Srivas</td>
+<td>srivas</td>
+</tr>
+<tr>
+<td>Sudheesh Katkam</td>
+<td>sudheesh</td>
+</tr>
+<tr>
+<td>Ted Dunning</td>
+<td>tdunning</td>
+</tr>
+<tr>
 <td>Timothy Farkas</td>
 <td>timothyfarkas</td>
 </tr>
+<tr>
+<td>Timothy Chen</td>
+<td>tnachen</td>
+</tr>
+<tr>
+<td>Tomer Shiran</td>
+<td>tshiran</td>
+</tr>
+<tr>
+<td>Venki Korukanti</td>
+<td>venki</td>
+</tr>
+<tr>
+<td>Vitalii Diravka</td>
+<td>vitalii</td>
+</tr>
+<tr>
+<td>Vova Vysotskyi</td>
+<td>volodymyr</td>
+</tr>
+<tr>
+<td>Weijie Tong</td>
+<td>weijie</td>
+</tr>
 </tbody></table>
 </div>
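
Reviewer note on the parquet-filter-pushdown hunk: the page says the planner compares a filter against the minimum and maximum values of each row group and prunes row groups with no intersection, and that sorted data prunes better. The sketch below illustrates that idea only. It is not Drill's implementation, and `RowGroupStats`, `can_prune`, and `surviving_row_groups` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical stand-in for one column's min/max statistics in one row group."""
    min_value: int
    max_value: int


def can_prune(stats: RowGroupStats, low: int, high: int) -> bool:
    """Return True when the filter range [low, high] cannot overlap the row group."""
    return stats.max_value < low or stats.min_value > high


def surviving_row_groups(groups, low, high):
    """Return the indices of row groups that still must be scanned for low <= col <= high."""
    return [i for i, g in enumerate(groups) if not can_prune(g, low, high)]


# Sorted data: each row group covers a narrow, non-overlapping range.
sorted_groups = [RowGroupStats(0, 99), RowGroupStats(100, 199), RowGroupStats(200, 299)]

# Unsorted data: every row group spans almost the whole value range.
unsorted_groups = [RowGroupStats(0, 290), RowGroupStats(5, 299), RowGroupStats(1, 295)]

if __name__ == "__main__":
    print(surviving_row_groups(sorted_groups, 120, 150))
    print(surviving_row_groups(unsorted_groups, 120, 150))
```

With the sorted layout only the middle row group overlaps the filter range, while with the unsorted layout every row group overlaps it and nothing can be pruned. This matches the page's advice to presort the data.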