Posted to commits@hawq.apache.org by yo...@apache.org on 2016/11/15 23:59:56 UTC

incubator-hawq-docs git commit: This closes #59 - Revisions to HAWQ Best Practices topics.

Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop 740b6ee69 -> 9f4293ba4


This closes #59 - Revisions to HAWQ Best Practices topics.


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/9f4293ba
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/9f4293ba
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/9f4293ba

Branch: refs/heads/develop
Commit: 9f4293ba40edad95b1eca1d9dfe04f22d3208afa
Parents: 740b6ee
Author: David Yozie <yo...@apache.org>
Authored: Tue Nov 15 15:59:09 2016 -0800
Committer: David Yozie <yo...@apache.org>
Committed: Tue Nov 15 15:59:09 2016 -0800

----------------------------------------------------------------------
 .../HAWQBestPracticesOverview.html.md.erb       |  3 ---
 .../operating_hawq_bestpractices.html.md.erb    | 13 ++++++++--
 .../querying_data_bestpractices.html.md.erb     | 24 +++++++++++++++---
 query/query-performance.html.md.erb             | 26 ++++++++++++++------
 4 files changed, 50 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/HAWQBestPracticesOverview.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/HAWQBestPracticesOverview.html.md.erb b/bestpractices/HAWQBestPracticesOverview.html.md.erb
index 6277727..13b4dca 100644
--- a/bestpractices/HAWQBestPracticesOverview.html.md.erb
+++ b/bestpractices/HAWQBestPracticesOverview.html.md.erb
@@ -4,9 +4,6 @@ title: Best Practices
 
 This chapter provides best practices on using the components and features that are part of a HAWQ system.
 
--   **[HAWQ Best Practices](../bestpractices/general_bestpractices.html)**
-
-    This topic addresses general best practices for using HAWQ.
 
 -   **[Best Practices for Operating HAWQ](../bestpractices/operating_hawq_bestpractices.html)**
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/operating_hawq_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/operating_hawq_bestpractices.html.md.erb b/bestpractices/operating_hawq_bestpractices.html.md.erb
index d48cf82..9dc56e9 100644
--- a/bestpractices/operating_hawq_bestpractices.html.md.erb
+++ b/bestpractices/operating_hawq_bestpractices.html.md.erb
@@ -4,6 +4,16 @@ title: Best Practices for Operating HAWQ
 
 This topic provides best practices for operating HAWQ, including recommendations for stopping, starting and monitoring HAWQ.
 
+## <a id="best_practice_config"></a>Best Practices for Configuring HAWQ Parameters
+
+HAWQ configuration parameters (GUCs) are located in `$GPHOME/etc/hawq-site.xml`. This configuration file resides on all HAWQ instances and can be modified either through the Ambari interface or from the command line.
+
+If you install and manage HAWQ using Ambari, use the Ambari interface for all configuration changes. Do not use command line utilities such as `hawq config` to set or change HAWQ configuration properties for Ambari-managed clusters. Configuration changes to `hawq-site.xml` made outside the Ambari interface will be overwritten when you restart or reconfigure HAWQ using Ambari.
+
+If you manage your cluster using command line tools instead of Ambari, use a consistent `hawq-site.xml` file to configure your entire cluster. 
+
+**Note:** While `postgresql.conf` still exists in HAWQ, any parameters defined in `hawq-site.xml` will overwrite configurations in `postgresql.conf`. For this reason, we recommend that you only use `hawq-site.xml` to configure your HAWQ cluster. For Ambari clusters, always use Ambari for configuring `hawq-site.xml` parameters.
+
 ## <a id="task_qgk_bz3_1v"></a>Best Practices to Start/Stop HAWQ Cluster Members
 
 For best results in using `hawq start` and `hawq stop` to manage your HAWQ system, the following best practices are recommended.
@@ -85,7 +95,6 @@ WHERE status &lt;&gt; &#39;u&#39;;</code></pre></td>
 <ol>
 <li>Verify that the hosts with down segments are responsive.</li>
 <li>If hosts are OK, check the <span class="ph filepath">pg_log</span> files for the down segments to discover the root cause of the segments going down.</li>
-<li>If no unexpected errors are found, run the <code class="ph codeph">gprecoverseg</code> utility to bring the segments back online.</li>
 </ol></td>
 </tr>
 </tbody>
@@ -116,7 +125,7 @@ WHERE status &lt;&gt; &#39;u&#39;;</code></pre></td>
 <p>Recommended frequency: real-time, if possible, or every 15 minutes</p>
 <p>Severity: CRITICAL</p></td>
 <td>Set up system check for hardware and OS errors.</td>
-<td>If required, remove a machine from the HAWQ cluster to resolve hardware and OS issues, then, after add it back to the cluster and run <code class="ph codeph">gprecoverseg</code>.</td>
+<td>If required, remove a machine from the HAWQ cluster to resolve hardware and OS issues, then add it back to the cluster after the issues are resolved.</td>
 </tr>
 <tr class="even">
 <td>Check disk space usage on volumes used for HAWQ data storage and the OS.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/querying_data_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/querying_data_bestpractices.html.md.erb b/bestpractices/querying_data_bestpractices.html.md.erb
index e2fb983..3efe569 100644
--- a/bestpractices/querying_data_bestpractices.html.md.erb
+++ b/bestpractices/querying_data_bestpractices.html.md.erb
@@ -4,6 +4,25 @@ title: Best Practices for Querying Data
 
 To obtain the best results when querying data in HAWQ, review the best practices described in this topic.
 
+## <a id="virtual_seg_performance"></a>Factors Impacting Query Performance
+
+The number of virtual segments used for a query directly impacts the query's performance. The following factors can impact the degree of parallelism of a query:
+
+-   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Some techniques used in defining resource queues can influence the number of both virtual segments and general resources allocated to queries. For more information, see [Best Practices for Using Resource Queues](managing_resources_bestpractices.html#topic_hvd_pls_wv).
+-   **Available resources at query time**. If more resources are available in the resource queue, those resources will be used.
+-   **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions: 
+ 
+   - The bucket number (bucketnum) configured is the same for all the hash tables.
+   - The table size for random tables is no more than 1.5 times the size allotted for the hash tables. 
+
+  Otherwise, the number of virtual segments depends on the query's cost: hash-distributed table queries behave like queries on randomly-distributed tables.
+  
+-   **Query Type**: It can be difficult to calculate resource costs for queries with some user-defined functions or for queries to external tables. With these queries, the number of virtual segments is controlled by the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+
+  ***Note:*** PXF external tables use the `default_hash_table_bucket_number` parameter, not the `hawq_rm_nvseg_perquery_perseg_limit` parameter, to control the number of virtual segments.
+
+See [Query Performance](../query/query-performance.html#topic38) for more details.
+
 ## <a id="id_xtk_jmq_1v"></a>Examining Query Plans to Solve Problems
 
 If a query performs poorly, examine its query plan and ask the following questions:
@@ -20,8 +39,5 @@ If a query performs poorly, examine its query plan and ask the following questio
 
     `Work_mem used: 23430K bytes avg, 23430K bytes max (seg0). Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2               workers.`
 
-The "bytes wanted" (Work_mem) message from `EXPLAIN ANALYZE` is based on the amount of data written to work files and is not exact.
-
-**Note**
-The *work\_mem* property is not configurable. Use resource queues to manage memory use. For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html).
+  **Note:** The "bytes wanted" (*work\_mem* property) is based on the amount of data written to work files and is not exact. This property is not configurable. Use resource queues to manage memory use. For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html).
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/query/query-performance.html.md.erb
----------------------------------------------------------------------
diff --git a/query/query-performance.html.md.erb b/query/query-performance.html.md.erb
index e3aa8f7..981d77b 100644
--- a/query/query-performance.html.md.erb
+++ b/query/query-performance.html.md.erb
@@ -118,18 +118,30 @@ The following table describes the metrics related to data locality. Use these me
 
 ## <a id="topic_wv3_gzc_d5"></a>Number of Virtual Segments
 
-The number of virtual segment used has impacts on the query performance. HAWQ decides the number of virtual segments of a query (its parallelism) by using the following rules:
+To obtain the best results when querying data in HAWQ, review the best practices described in this topic.
 
--   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Note that there are some techniques you can use when defining resource queues to influence the number of virtual segments and general resources that are allocated to queries. See [Best Practices for Using Resource Queues](../bestpractices/managing_resources_bestpractices.html#topic_hvd_pls_wv).
--   **Available resources**. Resources available at query time. If more resources are available in the resource queue, the resources will be used.
--   **Hash table and bucket number**. If the query involves only hash-distributed tables, and the bucket number (bucketnum) configured for all the hash tables is either the same bucket number for all tables or the table size for random tables is no more than 1.5 times larger than the size of hash tables for the hash tables, then the query's parallelism is fixed (equal to the hash table bucket number). Otherwise, the number of virtual segments depends on the query's cost and hash-distributed table queries will behave like queries on randomly distributed tables.
--   **Query Type**: For queries with some user-defined functions or for external tables where calculating resource costs is difficult , then the number of virtual segments is controlled by `hawq_rm_nvseg_perquery_limit `and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`) then the number of virtual segment number must be equal to the bucket number of the resulting hash table, If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies, which will be explained later in this section.
+### <a id="virtual_seg_performance"></a>Factors Impacting Query Performance
 
-The following are guidelines for numbers of virtual segments to use, provided there are sufficient resources available.
+The number of virtual segments used for a query directly impacts the query's performance. The following factors can impact the degree of parallelism of a query:
+
+-   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Some techniques used in defining resource queues can influence the number of both virtual segments and general resources allocated to queries.
+-   **Available resources at query time**. If more resources are available in the resource queue, those resources will be used.
+-   **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions:
+
+   - The bucket number (bucketnum) configured is the same for all the hash tables.
+   - The table size for random tables is no more than 1.5 times the size allotted for the hash tables.
+
+  Otherwise, the number of virtual segments depends on the query's cost: hash-distributed table queries behave like queries on randomly-distributed tables.
+
+-   **Query Type**: It can be difficult to calculate resource costs for queries with some user-defined functions or for queries to external tables. With these queries, the number of virtual segments is controlled by the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+
+### General Guidelines
+
+The following guidelines expand on the numbers of virtual segments to use, provided there are sufficient resources available.
 
 -   **Random tables exist in the select list:** \#vseg (number of virtual segments) depends on the size of the table.
 -   **Hash tables exist in the select list:** \#vseg depends on the bucket number of the table.
--   **Random and hash tables both exist in the select list:** \#vseg depends on the bucket number of the table, if the table size of random tables is no more than 1.5 times larger than the size of hash tables. Otherwise, \#vseg depends on the size of the random table.
+-   **Random and hash tables both exist in the select list:** \#vseg depends on the bucket number of the table, if the table size of random tables is no more than 1.5 times the size of hash tables. Otherwise, \#vseg depends on the size of the random table.
 -   **User-defined functions exist:** \#vseg depends on the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters.
 -   **PXF external tables exist:** \#vseg depends on the `default_hash_table_bucket_number` parameter.
 -   **gpfdist external tables exist:** \#vseg is at least the number of locations in the location list.