You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by ar...@apache.org on 2018/08/06 16:18:48 UTC
impala git commit: [DOCS] Dedicated coordinators and executors
explained
Repository: impala
Updated Branches:
refs/heads/master 34fd9732e -> 312a8d60d
[DOCS] Dedicated coordinators and executors explained
Change-Id: Ic0038eb40951ddf619ea6470a7363122635391d8
Reviewed-on: http://gerrit.cloudera.org:8080/11080
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/312a8d60
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/312a8d60
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/312a8d60
Branch: refs/heads/master
Commit: 312a8d60dd0f8f9efa4072a3f12579640857abc5
Parents: 34fd973
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Mon Jul 30 11:49:33 2018 -0700
Committer: Alex Rodoni <ar...@cloudera.com>
Committed: Mon Aug 6 15:59:48 2018 +0000
----------------------------------------------------------------------
docs/impala.ditamap | 1 +
docs/impala_keydefs.ditamap | 2 +-
docs/topics/impala_dedicated_coordinator.xml | 629 ++++++++++++++++++++++
docs/topics/impala_scalability.xml | 114 ----
4 files changed, 631 insertions(+), 115 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/312a8d60/docs/impala.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index 19c72a6..68558fd 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -283,6 +283,7 @@ under the License.
<topicref audience="hidden" href="topics/impala_perf_ddl.xml"/>
</topicref>
<topicref href="topics/impala_scalability.xml"/>
+ <topicref href="topics/impala_dedicated_coordinator.xml"/>
<topicref href="topics/impala_partitioning.xml"/>
<topicref href="topics/impala_file_formats.xml">
<topicref href="topics/impala_txtfile.xml"/>
http://git-wip-us.apache.org/repos/asf/impala/blob/312a8d60/docs/impala_keydefs.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index ac97531..ff83cd2 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10957,7 +10957,7 @@ under the License.
<keydef href="topics/impala_scalability.xml" keys="scalability"/>
<keydef audience="hidden" href="topics/impala_scalability.xml#scalability_memory" keys="scalability_memory"/>
<keydef href="topics/impala_scalability.xml#scalability_catalog" keys="scalability_catalog"/>
- <keydef href="topics/impala_scalability.xml#scalability_coordinator" keys="scalability_coordinator"/>
+ <keydef href="topics/impala_dedicated_coordinator.xml" keys="scalability_coordinator"/>
<keydef href="topics/impala_scalability.xml#statestore_scalability" keys="statestore_scalability"/>
<keydef href="topics/impala_scalability.xml#scalability_buffer_pool" keys="scalability_buffer_pool"/>
<keydef audience="hidden" href="topics/impala_scalability.xml#scalability_cluster_size" keys="scalability_cluster_size"/>
http://git-wip-us.apache.org/repos/asf/impala/blob/312a8d60/docs/topics/impala_dedicated_coordinator.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_dedicated_coordinator.xml b/docs/topics/impala_dedicated_coordinator.xml
new file mode 100644
index 0000000..1b43772
--- /dev/null
+++ b/docs/topics/impala_dedicated_coordinator.xml
@@ -0,0 +1,629 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<concept id="scalability">
+
+ <title>How to Configure Impala with Dedicated Coordinators</title>
+
+ <titlealts audience="PDF">
+
+ <navtitle>Dedicated Coordinators Optimization</navtitle>
+
+ </titlealts>
+
+ <prolog>
+ <metadata>
+ <data name="Category" value="Performance"/>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="Planning"/>
+ <data name="Category" value="Querying"/>
+ <data name="Category" value="Memory"/>
+ <data name="Category" value="Scalability"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ Each host that runs the Impala Daemon acts as both a coordinator and as an executor, by
+ default, managing metadata caching, query compilation, and query execution. In this
+ configuration, Impala clients can connect to any Impala daemon and send query requests.
+ </p>
+
+ <p>
+ During highly concurrent workloads for large-scale queries, the dual roles can cause
+ scalability issues because:
+ </p>
+
+ <ul>
+ <li>
+ <p>
+ The extra work required for a host to act as the coordinator could interfere with its
+ capacity to perform other work for the later phases of the query. For example,
+ coordinators can experience significant network and CPU overhead with queries
+ containing a large number of query fragments. Each coordinator caches metadata for all
+ table partitions and data files, which requires coordinators to be configured with a
+ large JVM heap. Executor-only Impala daemons should be configured with the default JVM
+ heaps, which leaves more memory available to process joins, aggregations, and other
+ operations performed by query executors.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Having a large number of hosts act as coordinators can cause unnecessary network
+ overhead, or even timeout errors, as each of those hosts communicates with the
+ Statestored daemon for metadata updates.
+ </p>
+ </li>
+
+ <li>
+ <p >
+ The "soft limits" imposed by the admission control feature are more likely to be
+ exceeded when there are a large number of heavily loaded hosts acting as coordinators.
+ Check
+ <xref
+ href="https://issues.apache.org/jira/browse/IMPALA-3649"
+ format="html" scope="external">
+ <u>IMPALA-3649</u>
+ </xref> and
+ <xref
+ href="https://issues.apache.org/jira/browse/IMPALA-6437"
+ format="html" scope="external">
+ <u>IMPALA-6437</u>
+ </xref> to see the status of the enhancements to mitigate this issue.
+ </p>
+ </li>
+ </ul>
+
+ <p >
+ The following factors can further exacerbate the above issues:
+ </p>
+
+ <ul>
+ <li >
+ <p >
+ High number of concurrent query fragments due to query concurrency and/or query
+ complexity
+ </p>
+ </li>
+
+ <li >
+ <p >
+ Large metadata topic size related to the number of partitions/files/blocks
+ </p>
+ </li>
+
+ <li >
+ <p >
+ High number of coordinator nodes
+ </p>
+ </li>
+
+ <li >
+ <p >
+ High number of coordinators used in the same resource pool
+ </p>
+ </li>
+ </ul>
+
+ <p>
+ If such scalability bottlenecks occur, in CDH 5.12 / Impala 2.9 and higher, you can assign
+ one dedicated role to each Impala daemon host, either as a coordinator or as an executor,
+ to address the issues.
+ </p>
+
+ <ul>
+ <li>
+ <p>
+ All explicit or load-balanced client connections must go to the coordinator hosts.
+ These hosts perform the network communication to keep metadata up-to-date and route
+ query results to the appropriate clients. The dedicated coordinator hosts do not
+ participate in I/O-intensive operations such as scans, and CPU-intensive operations
+ such as aggregations.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ The executor hosts perform the intensive I/O, CPU, and memory operations that make up
+ the bulk of the work for each query. The executors do communicate with the Statestored
+ daemon for membership status, but the dedicated executor hosts do not process the
+ final result sets for queries.
+ </p>
+ </li>
+ </ul>
+
+ <p >
+ Using dedicated coordinators offers the following benefits:
+ </p>
+
+ <ul>
+ <li >
+ <p>
+ Reduces memory usage by limiting the number of Impala nodes that need to cache
+ metadata.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Provides a better concurrency by avoiding coordinator bottleneck.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ Eliminates the query over admission by using one dedicated coordinator.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Reduces resource, especially network, utilization on the Statestored daemon by
+ limiting metadata broadcast to a subset of nodes.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Improves reliability and performance for highly concurrent workloads by reducing
+ workload stress on coordinators. Dedicated coordinators require 50% or less
+ connections and threads.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Reduces the number of explicit metadata refreshes required.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Improves diagnosability if a bottleneck or other performance issue arises on a
+ specific host, you can narrow down the cause more easily because each host is
+ dedicated to specific operations within the overall Impala workload.
+ </p>
+ </li>
+ </ul>
+
+ <p>
+ In this configuration with dedicated coordinators / executors, you cannot connect to the
+ dedicated executor hosts through clients such as impala-shell or business intelligence
+ tools as only the coordinator nodes support client connections.
+ </p>
+
+ </conbody>
+
+ <concept id="concept_vhv_4b1_n2b">
+
+ <title>Determining the Optimal Number of Dedicated Coordinators</title>
+
+ <conbody>
+
+ <p >
+ You should have the smallest number of coordinators that will still satisfy your
+ workload requirements in a cluster. A rough estimation is 1 coordinator for every 50
+ executors.
+ </p>
+
+ <p>
+ To maintain a healthy state and optimal performance, it is recommended that you keep the
+ peak utilization of all resources used by Impala, including CPU, the number of threads,
+ the number of connections, RPCs, under 80%.
+ </p>
+
+ <p >
+ Consider the following factors to determine the right number of coordinators in your
+ cluster:
+ </p>
+
+ <ul>
+ <li >
+ <p >
+ What is the number of concurrent queries?
+ </p>
+ </li>
+
+ <li >
+ <p >
+ What percentage of the workload is DDL?
+ </p>
+ </li>
+
+ <li >
+ <p>
+ What is the average query resource usage at the various stages (merge, runtime
+ filter, result set size, etc.)?
+ </p>
+ </li>
+
+ <li >
+ <p>
+ How many Impala Daemons (impalad) is in the cluster?
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Is there a high availability requirement?
+ </p>
+ </li>
+
+ <li >
+ <p>
+ Compute/storage capacity reduction factor
+ </p>
+ </li>
+ </ul>
+
+ <p>
+ Start with the below set of steps to determine the initial number of coordinators:
+ </p>
+
+ <ol>
+ <li>
+ If your cluster has less than 10 nodes, we recommend that you configure one dedicated
+ coordinator. Deploy the dedicated coordinator on a DataNode to avoid losing storage
+ capacity. In most of cases, one dedicated coordinator is enough to support all
+ workloads on a cluster.
+ </li>
+
+ <li>
+ Add more coordinators if the dedicated coordinator CPU or network peak utilization is
+ 80% or higher. You might need 1 coordinator for every 50 executors.
+ </li>
+
+ <li>
+ If the Impala service is shared by multiple workgroups with a dynamic resource pool
+ assigned, use one coordinator per pool to avoid admission control over admission.
+ </li>
+
+ <li>
+ If high availability is required, double the number of coordinators. One set as an
+ active set and the other as a backup set.
+ </li>
+ </ol>
+
+ </conbody>
+
+ <concept id="concept_y4k_rc1_n2b">
+
+ <title>Advanced Tuning</title>
+
+ <conbody>
+
+ <p >
+ Use the following guidelines to further tune the throughput and stability.
+ <ol>
+ <li >
+ The concurrency of DML statements does not typically depend on the number of
+ coordinators or size of the cluster. Queries that return large result sets
+ (10,000+ rows) consume more CPU and memory resources on the coordinator. Add one
+ or two coordinators if the workload has many such queries.
+ </li>
+
+ <li>
+ DDL queries, excluding <codeph>COMPUTE STATS</codeph> and <codeph>CREATE TABLE AS
+ SELECT</codeph>, are executed only on coordinators. If your workload contains many
+ DDL queries running concurrently, you could add one coordinator.
+ </li>
+
+ <li>
+ The CPU contention on coordinators can slow down query executions when concurrency
+ is high, especially for very short queries (<10s). Add more coordinators to
+ avoid CPU contention.
+ </li>
+
+ <li>
+ On a large cluster with 50+ nodes, the number of network connections from a
+ coordinator to executors can grow quickly as query complexity increases. The
+ growth is much greater on coordinators than executors. Add a few more coordinators
+ if workload are complex, i.e. (an average number of fragments * number of Impalad)
+ > 500, but with the low memory/CPU usage to share the load. Watch IMPALA-4603 and
+ IMPALA-7213 to track the progress on fixing this issue.
+ </li>
+
+ <li >
+ When using multiple coordinators for DML statements, divide queries to different
+ groups (number of groups = number of coordinators). Configure a separate dynamic
+ resource pool for each group and direct each group of query requests to a specific
+ coordinator. This is to avoid query over admission.
+ </li>
+
+ <li>
+ The front-end connection requirement is not a factor in determining the number of
+ dedicated coordinators. Consider setting up a connection pool at the client side
+ instead of adding coordinators. For a short term solution, you could increase the
+ value of <codeph>fe_service_threads</codeph> on coordinators to allow more client
+ connections.
+ </li>
+
+ <li>
+ In general, you should have a very small number of coordinators so storage
+ capacity reduction is not a concern. On a very small cluster (less than 10 nodes),
+ deploy a dedicated coordinator on a DataNode to avoid storage capacity reduction.
+ </li>
+ </ol>
+ </p>
+
+ </conbody>
+
+ </concept>
+
+ <concept id="concept_w51_cd1_n2b">
+
+ <title>Estimating Coordinator Resource Usage</title>
+
+ <conbody>
+
+ <p>
+ <table id="table_kh3_d31_n2b" colsep="1" rowsep="1" frame="all">
+ <tgroup cols="3">
+ <colspec colnum="1" colname="col1" colsep="1" rowsep="1"/>
+ <colspec colnum="2" colname="col2" colsep="1" rowsep="1"/>
+ <colspec colnum="3" colname="col3" colsep="1" rowsep="1"/>
+ <tbody>
+ <row>
+ <entry >
+ <b>Resource</b>
+ </entry>
+ <entry >
+ <b>Safe range</b>
+ </entry>
+ <entry >
+ <b>Notes / CM tsquery to monitor</b>
+ </entry>
+ </row>
+ <row>
+ <entry >
+ Memory
+ </entry>
+ <entry>
+ <p >
+ (Max JVM heap setting +
+ </p>
+
+
+
+ <p >
+ query concurrency *
+ </p>
+
+
+
+ <p >
+ query mem_limit)
+ </p>
+
+
+
+ <p >
+ <=
+ </p>
+
+
+
+ <p >
+ 80% of Impala process memory allocation
+ </p>
+ </entry>
+ <entry>
+ <p >
+ <i>Memory usage</i>:
+ </p>
+
+
+
+ <p >
+ SELECT mem_rss WHERE entityName = "Coordinator Instance ID" AND category =
+ ROLE
+ </p>
+
+
+
+ <p >
+ <i>JVM heap usage (metadata cache)</i>:
+ </p>
+
+
+
+ <p >
+ SELECT impala_jvm_heap_current_usage_bytes WHERE entityName = "Coordinator
+ Instance ID" AND category = ROLE (only in release 5.15 and above)
+ </p>
+ </entry>
+ </row>
+ <row>
+ <entry >
+ TCP Connection
+ </entry>
+ <entry >
+ Incoming + outgoing < 16K
+ </entry>
+ <entry>
+ <p >
+ <i>Incoming connection usage</i>:
+ </p>
+
+
+
+ <p >
+ SELECT thrift_server_backend_connections_in_use WHERE entityName =
+ "Coordinator Instance ID" AND category = ROLE
+ </p>
+
+
+
+ <p >
+ <i>Outgoing connection usage</i>:
+ </p>
+
+
+
+ <p >
+ SELECT backends_client_cache_clients_in_use WHERE entityName =
+ "Coordinator Instance ID" AND category = ROLE
+ </p>
+ </entry>
+ </row>
+ <row>
+ <entry >
+ Threads
+ </entry>
+ <entry >
+ < 32K
+ </entry>
+ <entry >
+ SELECT thread_manager_running_threads WHERE entityName = "Coordinator
+ Instance ID" AND category = ROLE
+ </entry>
+ </row>
+ <row>
+ <entry >
+ CPU
+ </entry>
+ <entry>
+ <p >
+ Concurrency =
+ </p>
+
+
+
+ <p >
+ non-DDL query concurrency <=
+ </p>
+
+
+
+ <p>
+ number of virtual cores allocated to Impala per node
+ </p>
+ </entry>
+ <entry>
+ <p >
+ CPU usage estimation should be based on how many cores are allocated to
+ Impala per node, not a sum of all cores of the cluster.
+ </p>
+
+
+
+ <p>
+ It is recommended that concurrency should not be more than the number of
+ virtual cores allocated to Impala per node.
+ </p>
+
+
+
+ <p ></p>
+
+
+
+ <p >
+ <i>Query concurrency:</i>
+ </p>
+
+
+
+ <p>
+ SELECT total_impala_num_queries_registered_across_impalads WHERE
+ entityName = "IMPALA-1" AND category = SERVICE
+ </p>
+
+
+
+ <p ></p>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </p>
+
+ <p>
+ If usage of any of the above resources exceeds the safe range, add one more
+ coordinator.
+ </p>
+
+ </conbody>
+
+ </concept>
+
+ </concept>
+
+ <concept id="concept_omm_gf1_n2b">
+
+ <title>Deploying Dedicated Coordinators and Executors</title>
+
+ <conbody>
+
+ <p >
+ This section describes the process to configure a dedicated coordinator and a dedicated
+ executor roles for Impala.
+ </p>
+
+ <ul>
+ <li >
+ <p>
+ <b>Dedicated coordinator</b>: They should be on edge node with no other services
+ running on it. They don’t need large local disks but still need some that can be
+ used for Spilling. They require at least same or even larger memory allocation.
+ </p>
+ </li>
+
+ <li >
+ <p>
+ <b>(Dedicated) Executors: </b>They should be collocated with DataNodes as usual.
+ The number of hosts with this setting typically increases as the cluster grows
+ larger and handles more table partitions, data files, and concurrent queries.
+ </p>
+ </li>
+ </ul>
+
+ <p> To configuring dedicated coordinators/executors, you specify one of
+ the following startup flags for the <cmdname>impalad</cmdname> daemon on
+ each host: <ul>
+ <li>
+ <p>
+ <codeph>is_executor=false</codeph> for each host that does not act
+ as an executor for Impala queries. These hosts act exclusively as
+ query coordinators. This setting typically applies to a relatively
+ small number of hosts, because the most common topology is to have
+ nearly all DataNodes doing work for query execution. </p>
+ </li>
+ <li>
+ <p>
+ <codeph>is_coordinator=false</codeph> for each host that does not
+ act as a coordinator for Impala queries. These hosts act
+ exclusively as executors. The number of hosts with this setting
+ typically increases as the cluster grows larger and handles more
+ table partitions, data files, and concurrent queries. As the
+ overhead for query coordination increases, it becomes more
+ important to centralize that work on dedicated hosts. </p>
+ </li>
+ </ul>
+ </p>
+
+ </conbody>
+
+ </concept>
+
+</concept>
http://git-wip-us.apache.org/repos/asf/impala/blob/312a8d60/docs/topics/impala_scalability.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_scalability.xml b/docs/topics/impala_scalability.xml
index 340d131..94c2fdb 100644
--- a/docs/topics/impala_scalability.xml
+++ b/docs/topics/impala_scalability.xml
@@ -305,120 +305,6 @@ Memory Usage: Additional Notes
</conbody>
</concept>
- <concept id="scalability_coordinator" rev="2.9.0 IMPALA-3807 IMPALA-5147 IMPALA-5503">
-
- <title>Controlling which Hosts are Coordinators and Executors</title>
-
- <conbody>
-
- <p>
- By default, each host in the cluster that runs the <cmdname>impalad</cmdname>
- daemon can act as the coordinator for an Impala query, execute the fragments
- of the execution plan for the query, or both. During highly concurrent
- workloads for large-scale queries, especially on large clusters, the dual
- roles can cause scalability issues:
- </p>
-
- <ul>
- <li>
- <p>
- The extra work required for a host to act as the coordinator could interfere
- with its capacity to perform other work for the earlier phases of the query.
- For example, the coordinator can experience significant network and CPU overhead
- during queries containing a large number of query fragments. Each coordinator
- caches metadata for all table partitions and data files, which can be substantial
- and contend with memory needed to process joins, aggregations, and other operations
- performed by query executors.
- </p>
- </li>
- <li>
- <p>
- Having a large number of hosts act as coordinators can cause unnecessary network
- overhead, or even timeout errors, as each of those hosts communicates with the
- <cmdname>statestored</cmdname> daemon for metadata updates.
- </p>
- </li>
- <li>
- <p>
- The <q>soft limits</q> imposed by the admission control feature are more likely
- to be exceeded when there are a large number of heavily loaded hosts acting as
- coordinators.
- </p>
- </li>
- </ul>
-
- <p>
- If such scalability bottlenecks occur, you can explicitly specify that certain
- hosts act as query coordinators, but not executors for query fragments.
- These hosts do not participate in I/O-intensive operations such as scans,
- and CPU-intensive operations such as aggregations.
- </p>
-
- <p>
- Then, you specify that the other hosts act as executors but not
- coordinators. These hosts do communicate with the
- <cmdname>statestored</cmdname> daemon or process the final result sets
- from queries, but the executor hosts do not get the metadata topic from
- <cmdname>statestored</cmdname>. You cannot connect to these hosts
- through clients such as <cmdname>impala-shell</cmdname> or business
- intelligence tools.
- </p>
-
- <p>
- This feature is available in <keyword keyref="impala29_full"/> and higher.
- </p>
-
- <p>
- To use this feature, you specify one of the following startup flags for the
- <cmdname>impalad</cmdname> daemon on each host:
- </p>
-
- <ul>
- <li>
- <p>
- <codeph>is_executor=false</codeph> for each host that
- does not act as an executor for Impala queries.
- These hosts act exclusively as query coordinators.
- This setting typically applies to a relatively small number of
- hosts, because the most common topology is to have nearly all
- DataNodes doing work for query execution.
- </p>
- </li>
- <li>
- <p>
- <codeph>is_coordinator=false</codeph> for each host that
- does not act as a coordinator for Impala queries.
- These hosts act exclusively as executors.
- The number of hosts with this setting typically increases
- as the cluster grows larger and handles more table partitions,
- data files, and concurrent queries. As the overhead for query
- coordination increases, it becomes more important to centralize
- that work on dedicated hosts.
- </p>
- </li>
- </ul>
-
- <p>
- By default, both of these settings are enabled for each <codeph>impalad</codeph>
- instance, allowing all such hosts to act as both executors and coordinators.
- </p>
-
- <p>
- For example, on a 100-node cluster, you might specify <codeph>is_executor=false</codeph>
- for 10 hosts, to dedicate those hosts as query coordinators. Then specify
- <codeph>is_coordinator=false</codeph> for the remaining 90 hosts. All explicit or
- load-balanced connections must go to the 10 hosts acting as coordinators. These hosts
- perform the network communication to keep metadata up-to-date and route query results
- to the appropriate clients. The remaining 90 hosts perform the intensive I/O, CPU, and
- memory operations that make up the bulk of the work for each query. If a bottleneck or
- other performance issue arises on a specific host, you can narrow down the cause more
- easily because each host is dedicated to specific operations within the overall
- Impala workload.
- </p>
-
- </conbody>
- </concept>
-
<concept id="scalability_buffer_pool" rev="2.10.0 IMPALA-3200">
<title>Effect of Buffer Pool on Memory Usage (<keyword keyref="impala210"/> and higher)</title>
<conbody>