Posted to commits@impala.apache.org by ta...@apache.org on 2018/04/03 02:11:13 UTC
[7/9] impala git commit: [DOCS] Updates to the load balancing algorithms section
[DOCS] Updates to the load balancing algorithms section
Further refined the Load Balancing Algorithm section
with reviews and comments from SME.
Change-Id: Ia697aafc799b2a3414a208aa85e1de4bf0214317
Reviewed-on: http://gerrit.cloudera.org:8080/9869
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/5a78a30d
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/5a78a30d
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/5a78a30d
Branch: refs/heads/master
Commit: 5a78a30d1e49ca6289a666bc62a5af544224af90
Parents: 4f59991
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Fri Mar 30 11:23:16 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Tue Apr 3 00:36:26 2018 +0000
----------------------------------------------------------------------
docs/topics/impala_proxy.xml | 111 +++++++++++++++++++-------------------
1 file changed, 54 insertions(+), 57 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/5a78a30d/docs/topics/impala_proxy.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_proxy.xml b/docs/topics/impala_proxy.xml
index 653f7f4..1f5bb4b 100644
--- a/docs/topics/impala_proxy.xml
+++ b/docs/topics/impala_proxy.xml
@@ -95,9 +95,12 @@ under the License.
<ol>
<li>
- Download the load-balancing proxy software. It should only need to be installed and configured on a
- single host. Pick a host other than the DataNodes where <cmdname>impalad</cmdname> is running,
- because the intention is to protect against the possibility of one or more of these DataNodes becoming unavailable.
+ Select and download the load-balancing proxy software or other
+ load-balancing hardware appliance. It should only need to be installed
+ and configured on a single host, typically on an edge node. Pick a
+ host other than the DataNodes where <cmdname>impalad</cmdname> is
+ running, because the intention is to protect against the possibility
+ of one or more of these DataNodes becoming unavailable.
</li>
<li>
@@ -105,42 +108,27 @@ under the License.
In particular:
<ul>
<li>
- <p>
- Set up a port that the load balancer will listen on to relay Impala requests back and forth.
- </p>
- </li>
+ Set up a port that the load balancer will listen on to relay
+ Impala requests back and forth. </li>
<li>
- <p rev="DOCS-690">
- Consider using the <i>source affinity</i>
- algorithm to ensure the sticky sessions. Where practical, enable
- this setting so that stateless client applications such as
- <cmdname>impalad</cmdname> and Hue are not disconnected from
- long-running queries. Evaluate whether this setting is
- appropriate for your combination of workload and client
- applications. See <xref href="#proxy_balancing" format="dita"/>
- for load balancing algorithm options.
- </p>
+ See <xref href="#proxy_balancing" format="dita"/> for load
+ balancing algorithm options.
</li>
<li>
- <p>
- For Kerberized clusters, follow the instructions in <xref href="impala_proxy.xml#proxy_kerberos"/>.
- </p>
+ For Kerberized clusters, follow the instructions in <xref
+ href="impala_proxy.xml#proxy_kerberos"/>.
</li>
</ul>
</li>
<li>
- Specify the host and port settings for each Impala node. These are the hosts that the load balancer will
- choose from when relaying each Impala query. See <xref href="impala_ports.xml#ports"/> for when to use
- port 21000, 21050, or another value depending on what type of connections you are load balancing.
- <note rev="">
- <p rev="">
- In particular, if you are using Hue or JDBC-based applications,
- you typically set up load balancing for both ports 21000 and 21050, because
- these client applications connect through port 21050 while the <cmdname>impala-shell</cmdname>
- command connects through port 21000.
- </p>
- </note>
+ If you are using Hue or JDBC-based applications, you typically set
+ up load balancing for both ports 21000 and 21050, because these client
+ applications connect through port 21050 while the
+ <cmdname>impala-shell</cmdname> command connects through port
+ 21000. See <xref href="impala_ports.xml#ports"/> for when to use port
+ 21000, 21050, or another value depending on what type of connections
+ you are load balancing.
</li>
<li>
@@ -148,9 +136,11 @@ under the License.
</li>
<li>
- For any scripts, jobs, or configuration settings for applications that formerly connected to a specific
- datanode to run Impala SQL statements, change the connection information (such as the <codeph>-i</codeph>
- option in <cmdname>impala-shell</cmdname>) to point to the load balancer instead.
+ For any scripts, jobs, or configuration settings for applications
+ that formerly connected to a specific DataNode to run Impala SQL
+ statements, change the connection information (such as the
+ <codeph>-i</codeph> option in <cmdname>impala-shell</cmdname>) to
+ point to the load balancer instead.
</li>
</ol>
@@ -174,49 +164,56 @@ under the License.
<dl>
<dlentry>
- <dt>leastconn</dt>
+ <dt>Leastconn</dt>
<dd>
Connects sessions to the coordinator with the fewest connections,
to balance the load evenly. Typically used for workloads consisting
of many independent, short-running queries. In configurations with
only a few client machines, this setting can avoid having all
- requests go to only a small set of coordinators. Recommended for
- Impala with F5.
+ requests go to only a small set of coordinators.
+ </dd>
+ <dd>
+ Recommended for Impala with F5.
</dd>
</dlentry>
<dlentry>
- <dt>source affinity</dt>
+ <dt>Source IP Persistence</dt>
<dd>
- Sessions from the same IP address always go to the same coordinator.
- A good choice for Impala workloads containing a mix of queries and
- DDL statements, such as <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>.
- Because the metadata changes from a DDL statement take time to propagate across the cluster,
- prefer to use source affinity in this case. If necessary, run the DDL and subsequent
- queries that depend on the results of the DDL through the same session, for example
- by running <codeph>impala-shell -f <varname>script_file</varname></codeph> to submit
- several statements through a single session.
- An alternative is to set the query option <codeph>SYNC_DDL=1</codeph>
- to hold back subsequent queries until the results of a DDL operation have propagated
- throughout the cluster, but that is a relatively expensive setting.
+ <p>
+ Sessions from the same IP address always go to the same
+ coordinator. A good choice for Impala workloads containing a mix
+ of queries and DDL statements, such as <codeph>CREATE TABLE</codeph>
+ and <codeph>ALTER TABLE</codeph>. Because the metadata changes from
+ a DDL statement take time to propagate across the cluster, prefer
+ to use the Source IP Persistence in this case. If you are unable
+ to choose Source IP Persistence, run the DDL and subsequent queries
+ that depend on the results of the DDL through the same session,
+ for example by running <codeph>impala-shell -f <varname>script_file</varname></codeph>
+ to submit several statements through a single session.
+ </p>
</dd>
<dd>
- Recommended for use with Hue.
+ <p>
+ Required for setting up high availability with Hue.
+ </p>
</dd>
</dlentry>
<dlentry>
- <dt>round-robin</dt>
+ <dt>Round-robin</dt>
<dd>
- Distributes connections to all coordinator nodes.
- Typically not recommended for Impala.
+ <p>
+ Distributes connections to all coordinator nodes.
+ Typically not recommended for Impala.
+ </p>
</dd>
</dlentry>
</dl>
<p>
- You might need to perform benchmarks and load testing to determine which setting is optimal for your
- use case. If some client applications have special characteristics, such as long-running Hue queries
- working best with source affinity, you might configure multiple virtual IP addresses with a
- different load-balancing algorithm for each.
+ You might need to perform benchmarks and load testing to determine
+ which setting is optimal for your use case. Always set up using two
+ load-balancing algorithms: Source IP Persistence for Hue and Leastconn
+ for others.
</p>
</conbody>