Posted to commits@impala.apache.org by ar...@apache.org on 2019/02/09 01:49:09 UTC
[impala] 05/05: IMPALA-8105: [DOCS] Document cache_remote_file_handles flag
This is an automated email from the ASF dual-hosted git repository.
arodoni pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 022ba2bbdb6feec0ba944847e92d5a051cedc9a1
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Mon Feb 4 17:41:23 2019 -0800
IMPALA-8105: [DOCS] Document cache_remote_file_handles flag
Change-Id: Id649e733324f55a80a0199302dfa3b627ad183cf
Reviewed-on: http://gerrit.cloudera.org:8080/12362
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
---
docs/topics/impala_scalability.xml | 880 +++++++++++++++++++++----------------
1 file changed, 498 insertions(+), 382 deletions(-)
diff --git a/docs/topics/impala_scalability.xml b/docs/topics/impala_scalability.xml
index 94c2fdb..71b8425 100644
--- a/docs/topics/impala_scalability.xml
+++ b/docs/topics/impala_scalability.xml
@@ -1,4 +1,5 @@
-<?xml version="1.0" encoding="UTF-8"?><!--
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
@@ -20,7 +21,13 @@ under the License.
<concept id="scalability">
<title>Scalability Considerations for Impala</title>
- <titlealts audience="PDF"><navtitle>Scalability Considerations</navtitle></titlealts>
+
+ <titlealts audience="PDF">
+
+ <navtitle>Scalability Considerations</navtitle>
+
+ </titlealts>
+
<prolog>
<metadata>
<data name="Category" value="Performance"/>
@@ -30,7 +37,7 @@ under the License.
<data name="Category" value="Developers"/>
<data name="Category" value="Memory"/>
<data name="Category" value="Scalability"/>
- <!-- Using domain knowledge about Impala, sizing, etc. to decide what to mark as 'Proof of Concept'. -->
+<!-- Using domain knowledge about Impala, sizing, etc. to decide what to mark as 'Proof of Concept'. -->
<data name="Category" value="Proof of Concept"/>
</metadata>
</prolog>
@@ -38,10 +45,11 @@ under the License.
<conbody>
<p>
- This section explains how the size of your cluster and the volume of data influences SQL performance and
- schema design for Impala tables. Typically, adding more cluster capacity reduces problems due to memory
- limits or disk throughput. On the other hand, larger clusters are more likely to have other kinds of
- scalability issues, such as a single slow node that causes performance problems for queries.
+ This section explains how the size of your cluster and the volume of data influences SQL
+ performance and schema design for Impala tables. Typically, adding more cluster capacity
+ reduces problems due to memory limits or disk throughput. On the other hand, larger
+ clusters are more likely to have other kinds of scalability issues, such as a single slow
+ node that causes performance problems for queries.
</p>
<p outputclass="toc inpage"/>
@@ -53,14 +61,15 @@ under the License.
<concept audience="hidden" id="scalability_memory">
<title>Overview and Guidelines for Impala Memory Usage</title>
- <prolog>
- <metadata>
- <data name="Category" value="Memory"/>
- <data name="Category" value="Concepts"/>
- <data name="Category" value="Best Practices"/>
- <data name="Category" value="Guidelines"/>
- </metadata>
- </prolog>
+
+ <prolog>
+ <metadata>
+ <data name="Category" value="Memory"/>
+ <data name="Category" value="Concepts"/>
+ <data name="Category" value="Best Practices"/>
+ <data name="Category" value="Guidelines"/>
+ </metadata>
+ </prolog>
<conbody>
@@ -158,7 +167,9 @@ Memory Usage: Additional Notes
* Review previous common issues on out-of-memory
* Note: Even with disk-based joins, you'll want to review these steps to speed up queries and use memory more efficiently
</codeblock>
+
</conbody>
+
</concept>
<concept id="scalability_catalog">
@@ -173,24 +184,25 @@ Memory Usage: Additional Notes
</p>
<p>
- Because Hadoop I/O is optimized for reading and writing large files, Impala is optimized for tables
- containing relatively few, large data files. Schemas containing thousands of tables, or tables containing
- thousands of partitions, can encounter performance issues during startup or during DDL operations such as
- <codeph>ALTER TABLE</codeph> statements.
+ Because Hadoop I/O is optimized for reading and writing large files, Impala is optimized
+ for tables containing relatively few, large data files. Schemas containing thousands of
+ tables, or tables containing thousands of partitions, can encounter performance issues
+ during startup or during DDL operations such as <codeph>ALTER TABLE</codeph> statements.
</p>
<note type="important" rev="TSB-168">
- <p>
- Because of a change in the default heap size for the <cmdname>catalogd</cmdname> daemon in
- <keyword keyref="impala25_full"/> and higher, the following procedure to increase the <cmdname>catalogd</cmdname>
- memory limit might be required following an upgrade to <keyword keyref="impala25_full"/> even if not
- needed previously.
- </p>
+ <p>
+ Because of a change in the default heap size for the <cmdname>catalogd</cmdname>
+ daemon in <keyword keyref="impala25_full"/> and higher, the following procedure to
+ increase the <cmdname>catalogd</cmdname> memory limit might be required following an
+ upgrade to <keyword keyref="impala25_full"/> even if not needed previously.
+ </p>
</note>
<p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/>
</conbody>
+
</concept>
<concept rev="2.1.0" id="statestore_scalability">
@@ -200,30 +212,33 @@ Memory Usage: Additional Notes
<conbody>
<p>
- Before <keyword keyref="impala21_full"/>, the statestore sent only one kind of message to its subscribers. This message contained all
- updates for any topics that a subscriber had subscribed to. It also served to let subscribers know that the
- statestore had not failed, and conversely the statestore used the success of sending a heartbeat to a
+ Before <keyword keyref="impala21_full"/>, the statestore sent only one kind of message
+ to its subscribers. This message contained all updates for any topics that a subscriber
+ had subscribed to. It also served to let subscribers know that the statestore had not
+ failed, and conversely the statestore used the success of sending a heartbeat to a
subscriber to decide whether or not the subscriber had failed.
</p>
<p>
- Combining topic updates and failure detection in a single message led to bottlenecks in clusters with large
- numbers of tables, partitions, and HDFS data blocks. When the statestore was overloaded with metadata
- updates to transmit, heartbeat messages were sent less frequently, sometimes causing subscribers to time
- out their connection with the statestore. Increasing the subscriber timeout and decreasing the frequency of
- statestore heartbeats worked around the problem, but reduced responsiveness when the statestore failed or
- restarted.
+ Combining topic updates and failure detection in a single message led to bottlenecks in
+ clusters with large numbers of tables, partitions, and HDFS data blocks. When the
+ statestore was overloaded with metadata updates to transmit, heartbeat messages were
+ sent less frequently, sometimes causing subscribers to time out their connection with
+ the statestore. Increasing the subscriber timeout and decreasing the frequency of
+ statestore heartbeats worked around the problem, but reduced responsiveness when the
+ statestore failed or restarted.
</p>
<p>
- As of <keyword keyref="impala21_full"/>, the statestore now sends topic updates and heartbeats in separate messages. This allows the
- statestore to send and receive a steady stream of lightweight heartbeats, and removes the requirement to
- send topic updates according to a fixed schedule, reducing statestore network overhead.
+ As of <keyword keyref="impala21_full"/>, the statestore now sends topic updates and
+ heartbeats in separate messages. This allows the statestore to send and receive a steady
+ stream of lightweight heartbeats, and removes the requirement to send topic updates
+ according to a fixed schedule, reducing statestore network overhead.
</p>
<p>
- The statestore now has the following relevant configuration flags for the <cmdname>statestored</cmdname>
- daemon:
+ The statestore now has the following relevant configuration flags for the
+ <cmdname>statestored</cmdname> daemon:
</p>
<dl>
@@ -234,8 +249,8 @@ Memory Usage: Additional Notes
</dt>
<dd>
- The number of threads inside the statestore dedicated to sending topic updates. You should not
- typically need to change this value.
+ The number of threads inside the statestore dedicated to sending topic updates. You
+ should not typically need to change this value.
<p>
<b>Default:</b> 10
</p>
@@ -250,9 +265,10 @@ Memory Usage: Additional Notes
</dt>
<dd>
- The frequency, in milliseconds, with which the statestore tries to send topic updates to each
- subscriber. This is a best-effort value; if the statestore is unable to meet this frequency, it sends
- topic updates as fast as it can. You should not typically need to change this value.
+ The frequency, in milliseconds, with which the statestore tries to send topic
+ updates to each subscriber. This is a best-effort value; if the statestore is unable
+ to meet this frequency, it sends topic updates as fast as it can. You should not
+ typically need to change this value.
<p>
<b>Default:</b> 2000
</p>
@@ -267,8 +283,8 @@ Memory Usage: Additional Notes
</dt>
<dd>
- The number of threads inside the statestore dedicated to sending heartbeats. You should not typically
- need to change this value.
+ The number of threads inside the statestore dedicated to sending heartbeats. You
+ should not typically need to change this value.
<p>
<b>Default:</b> 10
</p>
@@ -283,9 +299,10 @@ Memory Usage: Additional Notes
</dt>
<dd>
- The frequency, in milliseconds, with which the statestore tries to send heartbeats to each subscriber.
- This value should be good for large catalogs and clusters up to approximately 150 nodes. Beyond that,
- you might need to increase this value to make the interval longer between heartbeat messages.
+ The frequency, in milliseconds, with which the statestore tries to send heartbeats
+ to each subscriber. This value should be good for large catalogs and clusters up to
+ approximately 150 nodes. Beyond that, you might need to increase this value to make
+ the interval longer between heartbeat messages.
<p>
<b>Default:</b> 1000 (one heartbeat message every second)
</p>
@@ -295,46 +312,54 @@ Memory Usage: Additional Notes
</dl>
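The four statestored flags documented in the list above can be collected into a single startup command line. A minimal sketch, using the defaults quoted in this patch (the flag names come from the documentation above; the helper and its override mechanism are illustrative, not part of Impala):

```python
# Illustrative sketch: render a statestored command line from the four
# tuning flags documented above, using their documented defaults.
STATESTORE_FLAGS = {
    "statestore_num_update_threads": 10,        # topic-update sender threads
    "statestore_update_frequency_ms": 2000,     # topic updates, best effort
    "statestore_num_heartbeat_threads": 10,     # heartbeat sender threads
    "statestore_heartbeat_frequency_ms": 1000,  # one heartbeat per second
}

def statestored_cmdline(overrides=None):
    """Build a statestored invocation; overrides lets you stretch the
    heartbeat interval for clusters beyond roughly 150 nodes."""
    flags = dict(STATESTORE_FLAGS)
    flags.update(overrides or {})
    args = ["--%s=%s" % (name, value) for name, value in sorted(flags.items())]
    return "statestored " + " ".join(args)

# Example: a larger cluster where heartbeats go out every 2 seconds.
print(statestored_cmdline({"statestore_heartbeat_frequency_ms": 2000}))
```

The override in the example mirrors the guidance above: past ~150 nodes, lengthen the heartbeat interval rather than touching the thread counts.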
<p>
- If it takes a very long time for a cluster to start up, and <cmdname>impala-shell</cmdname> consistently
- displays <codeph>This Impala daemon is not ready to accept user requests</codeph>, the statestore might be
- taking too long to send the entire catalog topic to the cluster. In this case, consider adding
- <codeph>--load_catalog_in_background=false</codeph> to your catalog service configuration. This setting
- stops the statestore from loading the entire catalog into memory at cluster startup. Instead, metadata for
- each table is loaded when the table is accessed for the first time.
+ If it takes a very long time for a cluster to start up, and
+ <cmdname>impala-shell</cmdname> consistently displays <codeph>This Impala daemon is not
+ ready to accept user requests</codeph>, the statestore might be taking too long to send
+ the entire catalog topic to the cluster. In this case, consider adding
+ <codeph>--load_catalog_in_background=false</codeph> to your catalog service
+ configuration. This setting stops the statestore from loading the entire catalog into
+ memory at cluster startup. Instead, metadata for each table is loaded when the table is
+ accessed for the first time.
</p>
+
</conbody>
+
</concept>
<concept id="scalability_buffer_pool" rev="2.10.0 IMPALA-3200">
+
<title>Effect of Buffer Pool on Memory Usage (<keyword keyref="impala210"/> and higher)</title>
+
<conbody>
+
<p>
- The buffer pool feature, available in <keyword keyref="impala210"/> and higher, changes the
- way Impala allocates memory during a query. Most of the memory needed is reserved at the
- beginning of the query, avoiding cases where a query might run for a long time before failing
- with an out-of-memory error. The actual memory estimates and memory buffers are typically
- smaller than before, so that more queries can run concurrently or process larger volumes
- of data than previously.
+ The buffer pool feature, available in <keyword keyref="impala210"/> and higher, changes
+ the way Impala allocates memory during a query. Most of the memory needed is reserved at
+ the beginning of the query, avoiding cases where a query might run for a long time
+ before failing with an out-of-memory error. The actual memory estimates and memory
+ buffers are typically smaller than before, so that more queries can run concurrently or
+ process larger volumes of data than previously.
</p>
+
<p>
The buffer pool feature includes some query options that you can fine-tune:
- <xref keyref="buffer_pool_limit"/>,
- <xref keyref="default_spillable_buffer_size"/>,
- <xref keyref="max_row_size"/>, and
- <xref keyref="min_spillable_buffer_size"/>.
- </p>
- <p>
- Most of the effects of the buffer pool are transparent to you as an Impala user.
- Memory use during spilling is now steadier and more predictable, instead of
- increasing rapidly as more data is spilled to disk. The main change from a user
- perspective is the need to increase the <codeph>MAX_ROW_SIZE</codeph> query option
- setting when querying tables with columns containing long strings, many columns,
- or other combinations of factors that produce very large rows. If Impala encounters
- rows that are too large to process with the default query option settings, the query
- fails with an error message suggesting to increase the <codeph>MAX_ROW_SIZE</codeph>
- setting.
+ <xref keyref="buffer_pool_limit"/>, <xref keyref="default_spillable_buffer_size"/>,
+ <xref keyref="max_row_size"/>, and <xref keyref="min_spillable_buffer_size"/>.
</p>
+
+ <p>
+ Most of the effects of the buffer pool are transparent to you as an Impala user. Memory
+ use during spilling is now steadier and more predictable, instead of increasing rapidly
+ as more data is spilled to disk. The main change from a user perspective is the need to
+ increase the <codeph>MAX_ROW_SIZE</codeph> query option setting when querying tables
+ with columns containing long strings, many columns, or other combinations of factors
+ that produce very large rows. If Impala encounters rows that are too large to process
+ with the default query option settings, the query fails with an error message suggesting
+ to increase the <codeph>MAX_ROW_SIZE</codeph> setting.
+ </p>
+
</conbody>
+
</concept>
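The MAX_ROW_SIZE adjustment described above is in practice a react-to-error loop: run the query, hit the row-too-large error, raise the option, retry. A schematic sketch of that loop (the `run_query` callable and the `ValueError` stand-in are placeholders for illustration, not Impala APIs):

```python
def run_with_growing_row_size(run_query, start_bytes=512 * 1024, attempts=3):
    """Retry a query, doubling MAX_ROW_SIZE each time the (simulated)
    engine rejects a row as too large. run_query is a caller-supplied
    callable taking the current MAX_ROW_SIZE in bytes."""
    size = start_bytes
    for _ in range(attempts):
        try:
            return run_query(size)
        except ValueError:  # stand-in for Impala's row-too-large error
            size *= 2
    raise RuntimeError("still failing at MAX_ROW_SIZE=%d" % size)

def fake_run(size):
    # Simulated engine: 1.5 MB rows only fit once the option reaches 2 MB.
    if size < 1_500_000:
        raise ValueError("row too large")
    return "ok"

print(run_with_growing_row_size(fake_run))  # ok
```

In a real session the retry body would be a `SET MAX_ROW_SIZE=<bytes>;` followed by re-running the statement in impala-shell.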
<concept audience="hidden" id="scalability_cluster_size">
@@ -343,9 +368,10 @@ Memory Usage: Additional Notes
<conbody>
- <p>
- </p>
+ <p></p>
+
</conbody>
+
</concept>
<concept audience="hidden" id="concurrent_connections">
@@ -355,7 +381,9 @@ Memory Usage: Additional Notes
<conbody>
<p></p>
+
</conbody>
+
</concept>
<concept rev="2.0.0" id="spill_to_disk">
@@ -365,22 +393,26 @@ Memory Usage: Additional Notes
<conbody>
<p>
- Certain memory-intensive operations write temporary data to disk (known as <term>spilling</term> to disk)
- when Impala is close to exceeding its memory limit on a particular host.
+ Certain memory-intensive operations write temporary data to disk (known as
+ <term>spilling</term> to disk) when Impala is close to exceeding its memory limit on a
+ particular host.
</p>
<p>
- The result is a query that completes successfully, rather than failing with an out-of-memory error. The
- tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back
- in. The slowdown could be potentially be significant. Thus, while this feature improves reliability,
- you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence.
+ The result is a query that completes successfully, rather than failing with an
+ out-of-memory error. The tradeoff is decreased performance due to the extra disk I/O to
+ write the temporary data and read it back in. The slowdown could potentially be
+ significant. Thus, while this feature improves reliability, you should optimize your
+ queries, system parameters, and hardware configuration to make this spilling a rare
+ occurrence.
</p>
<note rev="2.10.0 IMPALA-3200">
<p>
- In <keyword keyref="impala210"/> and higher, also see <xref keyref="scalability_buffer_pool"/> for
- changes to Impala memory allocation that might change the details of which queries spill to disk,
- and how much memory and disk space is involved in the spilling operation.
+ In <keyword keyref="impala210"/> and higher, also see
+ <xref keyref="scalability_buffer_pool"/> for changes to Impala memory allocation that
+ might change the details of which queries spill to disk, and how much memory and disk
+ space is involved in the spilling operation.
</p>
</note>
@@ -389,38 +421,42 @@ Memory Usage: Additional Notes
</p>
<p>
- Several SQL clauses and constructs require memory allocations that could activat the spilling mechanism:
+ Several SQL clauses and constructs require memory allocations that could activate the
+ spilling mechanism:
</p>
+
<ul>
<li>
<p>
- when a query uses a <codeph>GROUP BY</codeph> clause for columns
- with millions or billions of distinct values, Impala keeps a
- similar number of temporary results in memory, to accumulate the
- aggregate results for each value in the group.
+ When a query uses a <codeph>GROUP BY</codeph> clause for columns with millions or
+ billions of distinct values, Impala keeps a similar number of temporary results in
+ memory, to accumulate the aggregate results for each value in the group.
</p>
</li>
+
<li>
<p>
- When large tables are joined together, Impala keeps the values of
- the join columns from one table in memory, to compare them to
- incoming values from the other table.
+ When large tables are joined together, Impala keeps the values of the join columns
+ from one table in memory, to compare them to incoming values from the other table.
</p>
</li>
+
<li>
<p>
- When a large result set is sorted by the <codeph>ORDER BY</codeph>
- clause, each node sorts its portion of the result set in memory.
+ When a large result set is sorted by the <codeph>ORDER BY</codeph> clause, each node
+ sorts its portion of the result set in memory.
</p>
</li>
+
<li>
<p>
- The <codeph>DISTINCT</codeph> and <codeph>UNION</codeph> operators
- build in-memory data structures to represent all values found so
- far, to eliminate duplicates as the query progresses.
+ The <codeph>DISTINCT</codeph> and <codeph>UNION</codeph> operators build in-memory
+ data structures to represent all values found so far, to eliminate duplicates as the
+ query progresses.
</p>
</li>
- <!-- JIRA still in open state as of 5.8 / 2.6, commenting out.
+
+<!-- JIRA still in open state as of 5.8 / 2.6, commenting out.
<li>
<p rev="IMPALA-3471">
In <keyword keyref="impala26_full"/> and higher, <term>top-N</term> queries (those with
@@ -445,30 +481,32 @@ Memory Usage: Additional Notes
</p>
<p rev="2.10.0 IMPALA-3200">
- In <keyword keyref="impala210_full"/> and higher, the way SQL operators such as <codeph>GROUP BY</codeph>,
- <codeph>DISTINCT</codeph>, and joins, transition between using additional memory or activating the
- spill-to-disk feature is changed. The memory required to spill to disk is reserved up front, and you can
- examine it in the <codeph>EXPLAIN</codeph> plan when the <codeph>EXPLAIN_LEVEL</codeph> query option is
+ In <keyword keyref="impala210_full"/> and higher, the way SQL operators such as
+ <codeph>GROUP BY</codeph>, <codeph>DISTINCT</codeph>, and joins transition between
+ using additional memory or activating the spill-to-disk feature is changed. The memory
+ required to spill to disk is reserved up front, and you can examine it in the
+ <codeph>EXPLAIN</codeph> plan when the <codeph>EXPLAIN_LEVEL</codeph> query option is
set to 2 or higher.
</p>
- <p>
- The infrastructure of the spilling feature affects the way the affected SQL operators, such as
- <codeph>GROUP BY</codeph>, <codeph>DISTINCT</codeph>, and joins, use memory.
- On each host that participates in the query, each such operator in a query requires memory
- to store rows of data and other data structures. Impala reserves a certain amount of memory
- up front for each operator that supports spill-to-disk that is sufficient to execute the
- operator. If an operator accumulates more data than can fit in the reserved memory, it
- can either reserve more memory to continue processing data in memory or start spilling
- data to temporary scratch files on disk. Thus, operators with spill-to-disk support
- can adapt to different memory constraints by using however much memory is available
- to speed up execution, yet tolerate low memory conditions by spilling data to disk.
- </p>
-
- <p>
- The amount data depends on the portion of the data being handled by that host, and thus
- the operator may end up consuming different amounts of memory on different hosts.
- </p>
+ <p>
+ The infrastructure of the spilling feature affects the way the affected SQL operators,
+ such as <codeph>GROUP BY</codeph>, <codeph>DISTINCT</codeph>, and joins, use memory. On
+ each host that participates in the query, each such operator in a query requires memory
+ to store rows of data and other data structures. Impala reserves a certain amount of
+ memory up front for each operator that supports spill-to-disk that is sufficient to
+ execute the operator. If an operator accumulates more data than can fit in the reserved
+ memory, it can either reserve more memory to continue processing data in memory or start
+ spilling data to temporary scratch files on disk. Thus, operators with spill-to-disk
+ support can adapt to different memory constraints by using however much memory is
+ available to speed up execution, yet tolerate low memory conditions by spilling data to
+ disk.
+ </p>
+
+ <p>
+ The amount of data depends on the portion of the data being handled by that host, and thus
+ the operator may end up consuming different amounts of memory on different hosts.
+ </p>
<!--
<p>
@@ -505,12 +543,13 @@ Memory Usage: Additional Notes
-->
<p>
- <b>Added in:</b> This feature was added to the <codeph>ORDER BY</codeph> clause in Impala 1.4.
- This feature was extended to cover join queries, aggregation functions, and analytic
- functions in Impala 2.0. The size of the memory work area required by
- each operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2.
- <ph rev="2.10.0 IMPALA-3200">The spilling mechanism was reworked to take advantage of the
- Impala buffer pool feature and be more predictable and stable in <keyword keyref="impala210_full"/>.</ph>
+ <b>Added in:</b> This feature was added to the <codeph>ORDER BY</codeph> clause in
+ Impala 1.4. This feature was extended to cover join queries, aggregation functions, and
+ analytic functions in Impala 2.0. The size of the memory work area required by each
+ operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2.
+ <ph rev="2.10.0 IMPALA-3200">The spilling mechanism was reworked to take advantage of
+ the Impala buffer pool feature and be more predictable and stable in
+ <keyword keyref="impala210_full"/>.</ph>
</p>
<p>
@@ -518,28 +557,30 @@ Memory Usage: Additional Notes
</p>
<p>
- Because the extra I/O can impose significant performance overhead on these types of queries, try to avoid
- this situation by using the following steps:
+ Because the extra I/O can impose significant performance overhead on these types of
+ queries, try to avoid this situation by using the following steps:
</p>
<ol>
<li>
- Detect how often queries spill to disk, and how much temporary data is written. Refer to the following
- sources:
+ Detect how often queries spill to disk, and how much temporary data is written. Refer
+ to the following sources:
<ul>
<li>
- The output of the <codeph>PROFILE</codeph> command in the <cmdname>impala-shell</cmdname>
- interpreter. This data shows the memory usage for each host and in total across the cluster. The
- <codeph>WriteIoBytes</codeph> counter reports how much data was written to disk for each operator
- during the query. (In <keyword keyref="impala29_full"/>, the counter was named
- <codeph>ScratchBytesWritten</codeph>; in <keyword keyref="impala28_full"/> and earlier, it was named
- <codeph>BytesWritten</codeph>.)
+ The output of the <codeph>PROFILE</codeph> command in the
+ <cmdname>impala-shell</cmdname> interpreter. This data shows the memory usage for
+ each host and in total across the cluster. The <codeph>WriteIoBytes</codeph>
+ counter reports how much data was written to disk for each operator during the
+ query. (In <keyword keyref="impala29_full"/>, the counter was named
+ <codeph>ScratchBytesWritten</codeph>; in <keyword keyref="impala28_full"/> and
+ earlier, it was named <codeph>BytesWritten</codeph>.)
</li>
<li>
- The <uicontrol>Queries</uicontrol> tab in the Impala debug web user interface. Select the query to
- examine and click the corresponding <uicontrol>Profile</uicontrol> link. This data breaks down the
- memory usage for a single host within the cluster, the host whose web interface you are connected to.
+ The <uicontrol>Queries</uicontrol> tab in the Impala debug web user interface.
+ Select the query to examine and click the corresponding
+ <uicontrol>Profile</uicontrol> link. This data breaks down the memory usage for a
+ single host within the cluster, the host whose web interface you are connected to.
</li>
</ul>
</li>
@@ -548,15 +589,16 @@ Memory Usage: Additional Notes
Use one or more techniques to reduce the possibility of the queries spilling to disk:
<ul>
<li>
- Increase the Impala memory limit if practical, for example, if you can increase the available memory
- by more than the amount of temporary data written to disk on a particular node. Remember that in
- Impala 2.0 and later, you can issue <codeph>SET MEM_LIMIT</codeph> as a SQL statement, which lets you
- fine-tune the memory usage for queries from JDBC and ODBC applications.
+ Increase the Impala memory limit if practical, for example, if you can increase
+ the available memory by more than the amount of temporary data written to disk on
+ a particular node. Remember that in Impala 2.0 and later, you can issue
+ <codeph>SET MEM_LIMIT</codeph> as a SQL statement, which lets you fine-tune the
+ memory usage for queries from JDBC and ODBC applications.
</li>
<li>
- Increase the number of nodes in the cluster, to increase the aggregate memory available to Impala and
- reduce the amount of memory required on each node.
+ Increase the number of nodes in the cluster, to increase the aggregate memory
+ available to Impala and reduce the amount of memory required on each node.
</li>
<li>
@@ -564,54 +606,57 @@ Memory Usage: Additional Notes
</li>
<li>
- On a cluster with resources shared between Impala and other Hadoop components, use resource
- management features to allocate more memory for Impala. See
+ On a cluster with resources shared between Impala and other Hadoop components, use
+ resource management features to allocate more memory for Impala. See
<xref href="impala_resource_management.xml#resource_management"/> for details.
</li>
<li>
- If the memory pressure is due to running many concurrent queries rather than a few memory-intensive
- ones, consider using the Impala admission control feature to lower the limit on the number of
- concurrent queries. By spacing out the most resource-intensive queries, you can avoid spikes in
- memory usage and improve overall response times. See
- <xref href="impala_admission.xml#admission_control"/> for details.
+ If the memory pressure is due to running many concurrent queries rather than a few
+ memory-intensive ones, consider using the Impala admission control feature to
+ lower the limit on the number of concurrent queries. By spacing out the most
+ resource-intensive queries, you can avoid spikes in memory usage and improve
+ overall response times. See <xref href="impala_admission.xml#admission_control"/>
+ for details.
</li>
<li>
- Tune the queries with the highest memory requirements, using one or more of the following techniques:
+ Tune the queries with the highest memory requirements, using one or more of the
+ following techniques:
<ul>
<li>
- Run the <codeph>COMPUTE STATS</codeph> statement for all tables involved in large-scale joins and
- aggregation queries.
+ Run the <codeph>COMPUTE STATS</codeph> statement for all tables involved in
+ large-scale joins and aggregation queries.
</li>
<li>
- Minimize your use of <codeph>STRING</codeph> columns in join columns. Prefer numeric values
- instead.
+ Minimize your use of <codeph>STRING</codeph> columns in join columns. Prefer
+ numeric values instead.
</li>
<li>
- Examine the <codeph>EXPLAIN</codeph> plan to understand the execution strategy being used for the
- most resource-intensive queries. See <xref href="impala_explain_plan.xml#perf_explain"/> for
- details.
+ Examine the <codeph>EXPLAIN</codeph> plan to understand the execution strategy
+ being used for the most resource-intensive queries. See
+ <xref href="impala_explain_plan.xml#perf_explain"/> for details.
</li>
<li>
- If Impala still chooses a suboptimal execution strategy even with statistics available, or if it
- is impractical to keep the statistics up to date for huge or rapidly changing tables, add hints
- to the most resource-intensive queries to select the right execution strategy. See
+ If Impala still chooses a suboptimal execution strategy even with statistics
+ available, or if it is impractical to keep the statistics up to date for huge
+ or rapidly changing tables, add hints to the most resource-intensive queries
+ to select the right execution strategy. See
<xref href="impala_hints.xml#hints"/> for details.
</li>
</ul>
</li>
<li>
- If your queries experience substantial performance overhead due to spilling, enable the
- <codeph>DISABLE_UNSAFE_SPILLS</codeph> query option. This option prevents queries whose memory usage
- is likely to be exorbitant from spilling to disk. See
- <xref href="impala_disable_unsafe_spills.xml#disable_unsafe_spills"/> for details. As you tune
- problematic queries using the preceding steps, fewer and fewer will be cancelled by this option
- setting.
+ If your queries experience substantial performance overhead due to spilling,
+ enable the <codeph>DISABLE_UNSAFE_SPILLS</codeph> query option. This option
+ prevents queries whose memory usage is likely to be exorbitant from spilling to
+ disk. See <xref href="impala_disable_unsafe_spills.xml#disable_unsafe_spills"/>
+ for details. As you tune problematic queries using the preceding steps, fewer and
+ fewer will be cancelled by this option setting.
</li>
</ul>
</li>
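The detection step above (step 1) can be automated by scanning profile text for the spill counter under whichever name the running Impala release uses. A hedged sketch: the three counter names come from the documentation above, but the `name: value` line layout in the sample is an assumption, not the exact format the PROFILE command prints.

```python
import re

# The spill counter was renamed across releases; accept all three names.
SPILL_COUNTERS = ("WriteIoBytes", "ScratchBytesWritten", "BytesWritten")

def total_spilled_bytes(profile_text):
    """Sum every spill-counter occurrence in a query profile dump.
    Assumes a simple 'CounterName: 12345' layout for illustration."""
    pattern = re.compile(r"(%s):\s*(\d+)" % "|".join(SPILL_COUNTERS))
    return sum(int(m.group(2)) for m in pattern.finditer(profile_text))

sample = """
  HASH_JOIN_NODE (id=2): WriteIoBytes: 1048576
  AGGREGATION_NODE (id=1): WriteIoBytes: 524288
"""
print(total_spilled_bytes(sample))  # 1572864
```

A nonzero total flags a query worth tuning with the techniques in step 2.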
@@ -622,22 +667,24 @@ Memory Usage: Additional Notes
</p>
<p>
- To artificially provoke spilling, to test this feature and understand the performance implications, use a
- test environment with a memory limit of at least 2 GB. Issue the <codeph>SET</codeph> command with no
- arguments to check the current setting for the <codeph>MEM_LIMIT</codeph> query option. Set the query
- option <codeph>DISABLE_UNSAFE_SPILLS=true</codeph>. This option limits the spill-to-disk feature to prevent
- runaway disk usage from queries that are known in advance to be suboptimal. Within
- <cmdname>impala-shell</cmdname>, run a query that you expect to be memory-intensive, based on the criteria
- explained earlier. A self-join of a large table is a good candidate:
+ To artificially provoke spilling so that you can test this feature and understand
+ the performance implications, use a test environment with a memory limit of at
+ least 2 GB. Issue the
+ <codeph>SET</codeph> command with no arguments to check the current setting for the
+ <codeph>MEM_LIMIT</codeph> query option. Set the query option
+ <codeph>DISABLE_UNSAFE_SPILLS=true</codeph>. This option limits the spill-to-disk
+ feature to prevent runaway disk usage from queries that are known in advance to be
+ suboptimal. Within <cmdname>impala-shell</cmdname>, run a query that you expect to be
+ memory-intensive, based on the criteria explained earlier. A self-join of a large table
+ is a good candidate:
</p>
<codeblock>select count(*) from big_table a join big_table b using (column_with_many_values);
</codeblock>
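+
+ <p>
+ For example, the setup steps described above can be performed within
+ <cmdname>impala-shell</cmdname> as follows:
+ </p>
+
+<codeblock>set;
+set disable_unsafe_spills=true;
+</codeblock>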
<p>
- Issue the <codeph>PROFILE</codeph> command to get a detailed breakdown of the memory usage on each node
- during the query.
- <!--
+ Issue the <codeph>PROFILE</codeph> command to get a detailed breakdown of the memory
+ usage on each node during the query.
+<!--
The crucial part of the profile output concerning memory is the <codeph>BlockMgr</codeph>
portion. For example, this profile shows that the query did not quite exceed the memory limit.
-->
@@ -668,8 +715,8 @@ Memory Usage: Additional Notes
-->
<p>
- Set the <codeph>MEM_LIMIT</codeph> query option to a value that is smaller than the peak memory usage
- reported in the profile output. Now try the memory-intensive query again.
+ Set the <codeph>MEM_LIMIT</codeph> query option to a value that is smaller than the peak
+ memory usage reported in the profile output. Now try the memory-intensive query again.
</p>
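+
+ <p>
+ For example, if the profile reported a peak memory usage of a little over 1 GB, you
+ might set an artificially low limit such as the following (the exact value shown is
+ illustrative):
+ </p>
+
+<codeblock>set mem_limit=1g;
+</codeblock>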
<p>
@@ -682,46 +729,50 @@ these tables, hint the plan or disable this behavior via query options to enable
</codeblock>
<p>
- If so, the query could have consumed substantial temporary disk space, slowing down so much that it would
- not complete in any reasonable time. Rather than rely on the spill-to-disk feature in this case, issue the
- <codeph>COMPUTE STATS</codeph> statement for the table or tables in your sample query. Then run the query
- again, check the peak memory usage again in the <codeph>PROFILE</codeph> output, and adjust the memory
- limit again if necessary to be lower than the peak memory usage.
+ If so, the query could have consumed substantial temporary disk space, slowing down so
+ much that it would not complete in any reasonable time. Rather than rely on the
+ spill-to-disk feature in this case, issue the <codeph>COMPUTE STATS</codeph> statement
+ for the table or tables in your sample query. Then run the query again, check the peak
+ memory usage again in the <codeph>PROFILE</codeph> output, and adjust the memory limit
+ again if necessary to be lower than the peak memory usage.
</p>
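+
+ <p>
+ For example, for the self-join query shown earlier:
+ </p>
+
+<codeblock>compute stats big_table;
+</codeblock>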
<p>
- At this point, you have a query that is memory-intensive, but Impala can optimize it efficiently so that
- the memory usage is not exorbitant. You have set an artificial constraint through the
- <codeph>MEM_LIMIT</codeph> option so that the query would normally fail with an out-of-memory error. But
- the automatic spill-to-disk feature means that the query should actually succeed, at the expense of some
- extra disk I/O to read and write temporary work data.
+ At this point, you have a query that is memory-intensive, but Impala can optimize it
+ efficiently so that the memory usage is not exorbitant. You have set an artificial
+ constraint through the <codeph>MEM_LIMIT</codeph> option so that the query would
+ normally fail with an out-of-memory error. But the automatic spill-to-disk feature means
+ that the query should actually succeed, at the expense of some extra disk I/O to read
+ and write temporary work data.
</p>
<p>
- Try the query again, and confirm that it succeeds. Examine the <codeph>PROFILE</codeph> output again. This
- time, look for lines of this form:
+ Try the query again, and confirm that it succeeds. Examine the <codeph>PROFILE</codeph>
+ output again. This time, look for lines of this form:
</p>
<codeblock>- SpilledPartitions: <varname>N</varname>
</codeblock>
<p>
- If you see any such lines with <varname>N</varname> greater than 0, that indicates the query would have
- failed in Impala releases prior to 2.0, but now it succeeded because of the spill-to-disk feature. Examine
- the total time taken by the <codeph>AGGREGATION_NODE</codeph> or other query fragments containing non-zero
- <codeph>SpilledPartitions</codeph> values. Compare the times to similar fragments that did not spill, for
- example in the <codeph>PROFILE</codeph> output when the same query is run with a higher memory limit. This
- gives you an idea of the performance penalty of the spill operation for a particular query with a
- particular memory limit. If you make the memory limit just a little lower than the peak memory usage, the
- query only needs to write a small amount of temporary data to disk. The lower you set the memory limit, the
+ If you see any such lines with <varname>N</varname> greater than 0, the query would
+ have failed in Impala releases prior to 2.0, but now succeeds because of the
+ spill-to-disk feature. Examine the total time taken by the
+ <codeph>AGGREGATION_NODE</codeph> or other query fragments containing non-zero
+ <codeph>SpilledPartitions</codeph> values. Compare the times to similar fragments that
+ did not spill, for example in the <codeph>PROFILE</codeph> output when the same query is
+ run with a higher memory limit. This gives you an idea of the performance penalty of the
+ spill operation for a particular query with a particular memory limit. If you make the
+ memory limit just a little lower than the peak memory usage, the query only needs to
+ write a small amount of temporary data to disk. The lower you set the memory limit, the
more temporary data is written and the slower the query becomes.
</p>
<p>
Now repeat this procedure for actual queries used in your environment. Use the
- <codeph>DISABLE_UNSAFE_SPILLS</codeph> setting to identify cases where queries used more memory than
- necessary due to lack of statistics on the relevant tables and columns, and issue <codeph>COMPUTE
- STATS</codeph> where necessary.
+ <codeph>DISABLE_UNSAFE_SPILLS</codeph> setting to identify cases where queries used more
+ memory than necessary due to lack of statistics on the relevant tables and columns, and
+ issue <codeph>COMPUTE STATS</codeph> where necessary.
</p>
<p>
@@ -729,242 +780,307 @@ these tables, hint the plan or disable this behavior via query options to enable
</p>
<p>
- You might wonder, why not leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned on all the time. Whether and
- how frequently to use this option depends on your system environment and workload.
+ You might wonder why you should not leave <codeph>DISABLE_UNSAFE_SPILLS</codeph>
+ turned on all the time. Whether and how frequently to use this option depends on your
+ system environment and workload.
</p>
<p>
- <codeph>DISABLE_UNSAFE_SPILLS</codeph> is suitable for an environment with ad hoc queries whose performance
- characteristics and memory usage are not known in advance. It prevents <q>worst-case scenario</q> queries
- that use large amounts of memory unnecessarily. Thus, you might turn this option on within a session while
- developing new SQL code, even though it is turned off for existing applications.
+ <codeph>DISABLE_UNSAFE_SPILLS</codeph> is suitable for an environment with ad hoc
+ queries whose performance characteristics and memory usage are not known in advance. It
+ prevents <q>worst-case scenario</q> queries that use large amounts of memory
+ unnecessarily. Thus, you might turn this option on within a session while developing new
+ SQL code, even though it is turned off for existing applications.
</p>
<p>
- Organizations where table and column statistics are generally up-to-date might leave this option turned on
- all the time, again to avoid worst-case scenarios for untested queries or if a problem in the ETL pipeline
- results in a table with no statistics. Turning on <codeph>DISABLE_UNSAFE_SPILLS</codeph> lets you <q>fail
- fast</q> in this case and immediately gather statistics or tune the problematic queries.
+ Organizations where table and column statistics are generally up-to-date might leave
+ this option turned on all the time, again to avoid worst-case scenarios for untested
+ queries or if a problem in the ETL pipeline results in a table with no statistics.
+ Turning on <codeph>DISABLE_UNSAFE_SPILLS</codeph> lets you <q>fail fast</q> in this case
+ and immediately gather statistics or tune the problematic queries.
</p>
<p>
- Some organizations might leave this option turned off. For example, you might have tables large enough that
- the <codeph>COMPUTE STATS</codeph> takes substantial time to run, making it impractical to re-run after
- loading new data. If you have examined the <codeph>EXPLAIN</codeph> plans of your queries and know that
- they are operating efficiently, you might leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned off. In that
- case, you know that any queries that spill will not go overboard with their memory consumption.
+ Some organizations might leave this option turned off. For example, you might have
+ tables large enough that the <codeph>COMPUTE STATS</codeph> statement takes substantial
+ time to run, making it impractical to re-run after loading new data. If you have examined the
+ <codeph>EXPLAIN</codeph> plans of your queries and know that they are operating
+ efficiently, you might leave <codeph>DISABLE_UNSAFE_SPILLS</codeph> turned off. In that
+ case, you know that any queries that spill will not go overboard with their memory
+ consumption.
</p>
</conbody>
+
</concept>
-<concept id="complex_query">
-<title>Limits on Query Size and Complexity</title>
-<conbody>
-<p>
-There are hardcoded limits on the maximum size and complexity of queries.
-Currently, the maximum number of expressions in a query is 2000.
-You might exceed the limits with large or deeply nested queries
-produced by business intelligence tools or other query generators.
-</p>
-<p>
-If you have the ability to customize such queries or the query generation
-logic that produces them, replace sequences of repetitive expressions
-with single operators such as <codeph>IN</codeph> or <codeph>BETWEEN</codeph>
-that can represent multiple values or ranges.
-For example, instead of a large number of <codeph>OR</codeph> clauses:
-</p>
+ <concept id="complex_query">
+
+ <title>Limits on Query Size and Complexity</title>
+
+ <conbody>
+
+ <p>
+ There are hardcoded limits on the maximum size and complexity of queries. Currently, the
+ maximum number of expressions in a query is 2000. You might exceed the limits with large
+ or deeply nested queries produced by business intelligence tools or other query
+ generators.
+ </p>
+
+ <p>
+ If you have the ability to customize such queries or the query generation logic that
+ produces them, replace sequences of repetitive expressions with single operators such as
+ <codeph>IN</codeph> or <codeph>BETWEEN</codeph> that can represent multiple values or
+ ranges. For example, instead of a large number of <codeph>OR</codeph> clauses:
+ </p>
+
<codeblock>WHERE val = 1 OR val = 2 OR val = 6 OR val = 100 ...
</codeblock>
-<p>
-use a single <codeph>IN</codeph> clause:
-</p>
+
+ <p>
+ use a single <codeph>IN</codeph> clause:
+ </p>
+
<codeblock>WHERE val IN (1,2,6,100,...)</codeblock>
-</conbody>
-</concept>
-<concept id="scalability_io">
-<title>Scalability Considerations for Impala I/O</title>
-<conbody>
-<p>
-Impala parallelizes its I/O operations aggressively,
-therefore the more disks you can attach to each host, the better.
-Impala retrieves data from disk so quickly using
-bulk read operations on large blocks, that most queries
-are CPU-bound rather than I/O-bound.
-</p>
-<p>
-Because the kind of sequential scanning typically done by
-Impala queries does not benefit much from the random-access
-capabilities of SSDs, spinning disks typically provide
-the most cost-effective kind of storage for Impala data,
-with little or no performance penalty as compared to SSDs.
-</p>
-<p>
-Resource management features such as YARN, Llama, and admission control
-typically constrain the amount of memory, CPU, or overall number of
-queries in a high-concurrency environment.
-Currently, there is no throttling mechanism for Impala I/O.
-</p>
-</conbody>
-</concept>
+ </conbody>
+
+ </concept>
+
+ <concept id="scalability_io">
+
+ <title>Scalability Considerations for Impala I/O</title>
+
+ <conbody>
+
+ <p>
+ Impala parallelizes its I/O operations aggressively; therefore, the more disks you can
+ attach to each host, the better. Impala retrieves data from disk so quickly, using bulk
+ read operations on large blocks, that most queries are CPU-bound rather than I/O-bound.
+ </p>
+
+ <p>
+ Because the kind of sequential scanning typically done by Impala queries does not
+ benefit much from the random-access capabilities of SSDs, spinning disks typically
+ provide the most cost-effective kind of storage for Impala data, with little or no
+ performance penalty as compared to SSDs.
+ </p>
+
+ <p>
+ Resource management features such as YARN, Llama, and admission control typically
+ constrain the amount of memory, CPU, or overall number of queries in a high-concurrency
+ environment. Currently, there is no throttling mechanism for Impala I/O.
+ </p>
+
+ </conbody>
+
+ </concept>
<concept id="big_tables">
+
<title>Scalability Considerations for Table Layout</title>
+
<conbody>
+
<p>
- Due to the overhead of retrieving and updating table metadata
- in the metastore database, try to limit the number of columns
- in a table to a maximum of approximately 2000.
- Although Impala can handle wider tables than this, the metastore overhead
- can become significant, leading to query performance that is slower
- than expected based on the actual data volume.
+ Due to the overhead of retrieving and updating table metadata in the metastore database,
+ try to limit the number of columns in a table to a maximum of approximately 2000.
+ Although Impala can handle wider tables than this, the metastore overhead can become
+ significant, leading to query performance that is slower than expected based on the
+ actual data volume.
</p>
+
<p>
- To minimize overhead related to the metastore database and Impala query planning,
- try to limit the number of partitions for any partitioned table to a few tens of thousands.
+ To minimize overhead related to the metastore database and Impala query planning, try to
+ limit the number of partitions for any partitioned table to a few tens of thousands.
</p>
+
<p rev="IMPALA-5309">
- If the volume of data within a table makes it impractical to run exploratory
- queries, consider using the <codeph>TABLESAMPLE</codeph> clause to limit query processing
- to only a percentage of data within the table. This technique reduces the overhead
- for query startup, I/O to read the data, and the amount of network, CPU, and memory
- needed to process intermediate results during the query. See <xref keyref="tablesample"/>
- for details.
+ If the volume of data within a table makes it impractical to run exploratory queries,
+ consider using the <codeph>TABLESAMPLE</codeph> clause to limit query processing to only
+ a percentage of data within the table. This technique reduces the overhead for query
+ startup, I/O to read the data, and the amount of network, CPU, and memory needed to
+ process intermediate results during the query. See <xref keyref="tablesample"/> for
+ details.
</p>
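+
+ <p>
+ For example, this query processes only approximately 10 percent of the data in the
+ table (the table name here is a placeholder):
+ </p>
+
+<codeblock>select count(*) from huge_table tablesample system(10);
+</codeblock>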
+
</conbody>
+
</concept>
-<concept rev="" id="kerberos_overhead_cluster_size">
-<title>Kerberos-Related Network Overhead for Large Clusters</title>
-<conbody>
-<p>
-When Impala starts up, or after each <codeph>kinit</codeph> refresh, Impala sends a number of
-simultaneous requests to the KDC. For a cluster with 100 hosts, the KDC might be able to process
-all the requests within roughly 5 seconds. For a cluster with 1000 hosts, the time to process
-the requests would be roughly 500 seconds. Impala also makes a number of DNS requests at the same
-time as these Kerberos-related requests.
-</p>
-<p>
-While these authentication requests are being processed, any submitted Impala queries will fail.
-During this period, the KDC and DNS may be slow to respond to requests from components other than Impala,
-so other secure services might be affected temporarily.
-</p>
- <p>
- In <keyword keyref="impala212_full"/> or earlier, to reduce the
- frequency of the <codeph>kinit</codeph> renewal that initiates a new set
- of authentication requests, increase the <codeph>kerberos_reinit_interval</codeph>
- configuration setting for the <codeph>impalad</codeph> daemons. Currently,
- the default is 60 minutes. Consider using a higher value such as 360 (6 hours).
- </p>
- <p>
- The <codeph>kerberos_reinit_interval</codeph> configuration setting is removed
- in <keyword keyref="impala30_full"/>, and the above step is no longer needed.
- </p>
-
-</conbody>
-</concept>
+ <concept rev="" id="kerberos_overhead_cluster_size">
+
+ <title>Kerberos-Related Network Overhead for Large Clusters</title>
+
+ <conbody>
+
+ <p>
+ When Impala starts up, or after each <codeph>kinit</codeph> refresh, Impala sends a
+ number of simultaneous requests to the KDC. For a cluster with 100 hosts, the KDC might
+ be able to process all the requests within roughly 5 seconds. For a cluster with 1000
+ hosts, the time to process the requests would be roughly 500 seconds. Impala also makes
+ a number of DNS requests at the same time as these Kerberos-related requests.
+ </p>
+
+ <p>
+ While these authentication requests are being processed, any submitted Impala queries
+ will fail. During this period, the KDC and DNS may be slow to respond to requests from
+ components other than Impala, so other secure services might be affected temporarily.
+ </p>
+
+ <p>
+ In <keyword keyref="impala212_full"/> or earlier, to reduce the frequency of the
+ <codeph>kinit</codeph> renewal that initiates a new set of authentication requests,
+ increase the <codeph>kerberos_reinit_interval</codeph> configuration setting for the
+ <codeph>impalad</codeph> daemons. Currently, the default is 60 minutes. Consider using a
+ higher value such as 360 (6 hours).
+ </p>
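+
+ <p>
+ For example, to use the illustrative value of 360 minutes, start each
+ <codeph>impalad</codeph> daemon with the flag:
+ </p>
+
+<codeblock>--kerberos_reinit_interval=360
+</codeblock>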
+
+ <p>
+ The <codeph>kerberos_reinit_interval</codeph> configuration setting is removed in
+ <keyword keyref="impala30_full"/>, and the above step is no longer needed.
+ </p>
+
+ </conbody>
+
+ </concept>
<concept id="scalability_hotspots" rev="2.5.0 IMPALA-2696">
+
<title>Avoiding CPU Hotspots for HDFS Cached Data</title>
+
<conbody>
+
<p>
- You can use the HDFS caching feature, described in <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>,
- with Impala to reduce I/O and memory-to-memory copying for frequently accessed tables or partitions.
+ You can use the HDFS caching feature, described in
+ <xref href="impala_perf_hdfs_caching.xml#hdfs_caching"/>, with Impala to reduce I/O and
+ memory-to-memory copying for frequently accessed tables or partitions.
</p>
+
<p>
In the early days of this feature, you might have found that enabling HDFS caching
resulted in little or no performance improvement, because it could result in
- <q>hotspots</q>: instead of the I/O to read the table data being parallelized across
- the cluster, the I/O was reduced but the CPU load to process the data blocks
- might be concentrated on a single host.
+ <q>hotspots</q>: instead of the I/O to read the table data being parallelized across the
+ cluster, the I/O was reduced but the CPU load to process the data blocks might be
+ concentrated on a single host.
</p>
+
<p>
To avoid hotspots, include the <codeph>WITH REPLICATION</codeph> clause with the
- <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements for tables that use HDFS caching.
- This clause allows more than one host to cache the relevant data blocks, so the CPU load
- can be shared, reducing the load on any one host.
- See <xref href="impala_create_table.xml#create_table"/> and <xref href="impala_alter_table.xml#alter_table"/>
- for details.
+ <codeph>CREATE TABLE</codeph> or <codeph>ALTER TABLE</codeph> statements for tables that
+ use HDFS caching. This clause allows more than one host to cache the relevant data
+ blocks, so the CPU load can be shared, reducing the load on any one host. See
+ <xref href="impala_create_table.xml#create_table"/> and
+ <xref href="impala_alter_table.xml#alter_table"/> for details.
</p>
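+
+ <p>
+ For example (the table definition, cache pool name, and replication factor shown
+ are illustrative):
+ </p>
+
+<codeblock>create table census_data (name string, zip string)
+ cached in 'four_gig_pool' with replication = 3;
+</codeblock>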
+
<p>
Hotspots with high CPU load for HDFS cached data could still arise in some cases, due to
- the way that Impala schedules the work of processing data blocks on different hosts.
- In <keyword keyref="impala25_full"/> and higher, scheduling improvements mean that the work for
- HDFS cached data is divided better among all the hosts that have cached replicas
- for a particular data block. When more than one host has a cached replica for a data block,
- Impala assigns the work of processing that block to whichever host has done the least work
- (in terms of number of bytes read) for the current query. If hotspots persist even with this
- load-based scheduling algorithm, you can enable the query option <codeph>SCHEDULE_RANDOM_REPLICA=TRUE</codeph>
- to further distribute the CPU load. This setting causes Impala to randomly pick a host to process a cached
- data block if the scheduling algorithm encounters a tie when deciding which host has done the
- least work.
+ the way that Impala schedules the work of processing data blocks on different hosts. In
+ <keyword keyref="impala25_full"/> and higher, scheduling improvements mean that the work
+ for HDFS cached data is divided better among all the hosts that have cached replicas for
+ a particular data block. When more than one host has a cached replica for a data block,
+ Impala assigns the work of processing that block to whichever host has done the least
+ work (in terms of number of bytes read) for the current query. If hotspots persist even
+ with this load-based scheduling algorithm, you can enable the query option
+ <codeph>SCHEDULE_RANDOM_REPLICA=TRUE</codeph> to further distribute the CPU load. This
+ setting causes Impala to randomly pick a host to process a cached data block if the
+ scheduling algorithm encounters a tie when deciding which host has done the least work.
</p>
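+
+ <p>
+ For example, within <cmdname>impala-shell</cmdname>:
+ </p>
+
+<codeblock>set schedule_random_replica=true;
+</codeblock>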
+
</conbody>
+
</concept>
<concept id="scalability_file_handle_cache" rev="2.10.0 IMPALA-4623">
+
<title>Scalability Considerations for NameNode Traffic with File Handle Caching</title>
+
<conbody>
+
<p>
One scalability aspect that affects heavily loaded clusters is the load on the HDFS
- NameNode, from looking up the details as each HDFS file is opened. Impala queries
- often access many different HDFS files, for example if a query does a full table scan
- on a table with thousands of partitions, each partition containing multiple data files.
- Accessing each column of a Parquet file also involves a separate <q>open</q> call,
- further increasing the load on the NameNode. High NameNode overhead can add startup time
- (that is, increase latency) to Impala queries, and reduce overall throughput for non-Impala
- workloads that also require accessing HDFS files.
- </p>
- <p> In <keyword keyref="impala210_full"/> and higher, you can reduce
- NameNode overhead by enabling a caching feature for HDFS file handles.
- Data files that are accessed by different queries, or even multiple
- times within the same query, can be accessed without a new <q>open</q>
- call and without fetching the file details again from the NameNode. </p>
- <p>
- Because this feature only involves HDFS data files, it does not apply to non-HDFS tables,
- such as Kudu or HBase tables, or tables that store their data on cloud services such as
- S3 or ADLS. Any read operations that perform remote reads also skip the cached file handles.
- </p>
- <p> The feature is enabled by default with 20,000 file handles to be
- cached. To change the value, set the configuration option
- <codeph>max_cached_file_handles</codeph> to a non-zero value for each
- <cmdname>impalad</cmdname> daemon. From the initial default value of
- 20000, adjust upward if NameNode request load is still significant, or
- downward if it is more important to reduce the extra memory usage on
- each host. Each cache entry consumes 6 KB, meaning that caching 20,000
- file handles requires up to 120 MB on each Impala executor. The exact
- memory usage varies depending on how many file handles have actually
- been cached; memory is freed as file handles are evicted from the cache. </p>
- <p>
- If a manual HDFS operation moves a file to the HDFS Trashcan while the file handle is cached,
- Impala still accesses the contents of that file. This is a change from prior behavior. Previously,
- accessing a file that was in the trashcan would cause an error. This behavior only applies to
- non-Impala methods of removing HDFS files, not the Impala mechanisms such as <codeph>TRUNCATE TABLE</codeph>
- or <codeph>DROP TABLE</codeph>.
- </p>
- <p>
- If files are removed, replaced, or appended by HDFS operations outside of Impala, the way to bring the
- file information up to date is to run the <codeph>REFRESH</codeph> statement on the table.
- </p>
- <p>
- File handle cache entries are evicted as the cache fills up, or based on a timeout period
- when they have not been accessed for some time.
- </p>
- <p>
- To evaluate the effectiveness of file handle caching for a particular workload, issue the
- <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
- profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
- (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
- Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
- because the first time each data file is accessed is always recorded as a cache miss.
+ NameNode from looking up the details as each HDFS file is opened. Impala queries often
+ access many different HDFS files. For example, a query that does a full table scan on a
+ partitioned table may need to read thousands of partitions, each partition containing
+ multiple data files. Accessing each column of a Parquet file also involves a separate
+ <q>open</q> call, further increasing the load on the NameNode. High NameNode overhead
+ can add startup time (that is, increase latency) to Impala queries, and reduce overall
+ throughput for non-Impala workloads that also require accessing HDFS files.
+ </p>
+
+ <p>
+ In <keyword keyref="impala210_full"/> and higher, you can reduce NameNode overhead by
+ enabling a caching feature for HDFS file handles. Data files that are accessed by
+ different queries, or even multiple times within the same query, can be accessed without
+ a new <q>open</q> call and without fetching the file details again from the NameNode.
+ </p>
+
+ <p>
+ In Impala 3.2 and higher, file handle caching also applies to remote HDFS file handles.
+ This is controlled by the <codeph>cache_remote_file_handles</codeph> flag for an
+ <codeph>impalad</codeph> daemon. It is recommended that you use the default value of
+ <codeph>true</codeph>, as this caching prevents your NameNode from being overloaded when
+ your cluster performs many remote HDFS reads.
+ </p>
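+
+ <p>
+ For example, if you did need to turn the remote file handle cache off, you could
+ start each <codeph>impalad</codeph> daemon with the flag:
+ </p>
+
+<codeblock>--cache_remote_file_handles=false
+</codeblock>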
+
+ <p>
+ Because this feature only involves HDFS data files, it does not apply to non-HDFS
+ tables, such as Kudu or HBase tables, or tables that store their data on cloud services
+ such as S3 or ADLS.
+ </p>
+
+ <p>
+ The feature is enabled by default with 20,000 file handles to be cached. To change the
+ value, set the configuration option <codeph>max_cached_file_handles</codeph> to a
+ non-zero value for each <cmdname>impalad</cmdname> daemon. From the initial default
+ value of 20000, adjust upward if NameNode request load is still significant, or downward
+ if it is more important to reduce the extra memory usage on each host. Each cache entry
+ consumes 6 KB, meaning that caching 20,000 file handles requires up to 120 MB on each
+ Impala executor. The exact memory usage varies depending on how many file handles have
+ actually been cached; memory is freed as file handles are evicted from the cache.
+ </p>
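+
+ <p>
+ For example, to double the cache size from the default (an illustrative value),
+ start each <codeph>impalad</codeph> daemon with the flag:
+ </p>
+
+<codeblock>--max_cached_file_handles=40000
+</codeblock>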
+
+ <p>
+ If a manual HDFS operation moves a file to the HDFS Trashcan while the file handle is
+ cached, Impala still accesses the contents of that file. This is a change from prior
+ behavior. Previously, accessing a file that was in the trashcan would cause an error.
+ This behavior only applies to non-Impala methods of removing HDFS files, not the Impala
+ mechanisms such as <codeph>TRUNCATE TABLE</codeph> or <codeph>DROP TABLE</codeph>.
+ </p>
+
+ <p>
+ If files are removed, replaced, or appended by HDFS operations outside of Impala, the
+ way to bring the file information up to date is to run the <codeph>REFRESH</codeph>
+ statement on the table.
+ </p>
+
+ <p>
+ File handle cache entries are evicted as the cache fills up, or based on a timeout
+ period when they have not been accessed for some time.
+ </p>
+
+ <p>
+ To evaluate the effectiveness of file handle caching for a particular workload, issue
+ the <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine
+ query profiles in the Impala Web UI. Look for the ratio of
+ <codeph>CachedFileHandlesHitCount</codeph> (ideally, should be high) to
+ <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low). Before starting
+ any evaluation, run several representative queries to <q>warm up</q> the cache because
+ the first time each data file is accessed is always recorded as a cache miss.
+ </p>
+
+ <p>
To see metrics about file handle caching for each <cmdname>impalad</cmdname> instance,
- examine the <uicontrol>/metrics</uicontrol> page in the Impala web UI, in particular the fields
- <uicontrol>impala-server.io.mgr.cached-file-handles-miss-count</uicontrol>,
+ examine the <uicontrol>/metrics</uicontrol> page in the Impala Web UI, in particular the
+ fields <uicontrol>impala-server.io.mgr.cached-file-handles-miss-count</uicontrol>,
<uicontrol>impala-server.io.mgr.cached-file-handles-hit-count</uicontrol>, and
<uicontrol>impala-server.io.mgr.num-cached-file-handles</uicontrol>.
</p>
+
</conbody>
+
</concept>
</concept>