Posted to commits@impala.apache.org by jr...@apache.org on 2017/07/10 23:09:07 UTC
[1/3] incubator-impala git commit: IMPALA-5605: [DOCS] New known issue for upping thread resource limits
Repository: incubator-impala
Updated Branches:
refs/heads/master 717dd73d7 -> 801c32dec
IMPALA-5605: [DOCS] New known issue for upping thread resource limits
Change-Id: I4300fcb30c1bc0b1f3cd4eeeb25ad05ec4173fa6
Reviewed-on: http://gerrit.cloudera.org:8080/7348
Reviewed-by: Mostafa Mokhtar <mm...@cloudera.com>
Tested-by: Impala Public Jenkins
Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/cd443817
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/cd443817
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/cd443817
Branch: refs/heads/master
Commit: cd44381796aa9851e074991290120d0873a010ac
Parents: 717dd73
Author: John Russell <jr...@cloudera.com>
Authored: Fri Jun 30 15:23:36 2017 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Mon Jul 10 23:06:18 2017 +0000
----------------------------------------------------------------------
docs/topics/impala_known_issues.xml | 33 ++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/cd443817/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index c3f1e51..30cc245 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -564,6 +564,39 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
</conbody>
+ <concept id="IMPALA-5605">
+ <title>Configuration to prevent crashes caused by thread resource limits</title>
+ <conbody>
+ <p>
+ Impala could encounter a serious error due to resource usage under very high concurrency.
+ The error message is similar to:
+ </p>
+<codeblock><![CDATA[
+F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
+terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
+]]>
+</codeblock>
+ <p><b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Workaround:</b>
+ To prevent such errors, configure each host running an <cmdname>impalad</cmdname>
+ daemon with the following settings:
+ </p>
+<codeblock>
+echo 2000000 > /proc/sys/kernel/threads-max
+echo 2000000 > /proc/sys/kernel/pid_max
+echo 8000000 > /proc/sys/vm/max_map_count
+</codeblock>
+ <p>
+ Add the following lines to <filepath>/etc/security/limits.conf</filepath>:
+ </p>
+<codeblock>
+impala soft nproc 262144
+impala hard nproc 262144
+</codeblock>
+ </conbody>
+ </concept>
+
<concept id="flatbuffers_mem_usage">
<title>Memory usage when compact_catalog_topic flag enabled</title>
<conbody>
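The IMPALA-5605 workaround writes new values directly into /proc/sys. That takes effect immediately, but writes to /proc/sys are not kept across reboots. A host can be checked before or after applying the workaround with a small read-only script like the following. This is a sketch, not part of the Impala documentation. The function names (check_min, check_5605_sysctls, limits_value) are hypothetical. It also assumes that the documented numbers are minimums rather than exact targets.

```bash
#!/usr/bin/env bash
# Hypothetical, read-only helpers for checking an impalad host against the
# IMPALA-5605 workaround values. Treating those values as lower bounds is an
# assumption of this sketch; nothing here modifies the host.

# check_min NAME CURRENT REQUIRED
# Prints "OK ..." or "LOW ..." and returns 1 when CURRENT < REQUIRED.
check_min() {
  local name=$1 current=$2 required=$3
  if (( current >= required )); then
    echo "OK $name=$current"
  else
    echo "LOW $name=$current (suggested: $required)"
    return 1
  fi
}

# check_5605_sysctls: compare the live kernel settings on this host.
# Returns non-zero if any setting is below the suggested value.
check_5605_sysctls() {
  local status=0
  check_min kernel.threads-max "$(< /proc/sys/kernel/threads-max)" 2000000 || status=1
  check_min kernel.pid_max     "$(< /proc/sys/kernel/pid_max)"     2000000 || status=1
  check_min vm.max_map_count   "$(< /proc/sys/vm/max_map_count)"   8000000 || status=1
  return "$status"
}

# limits_value FILE DOMAIN TYPE ITEM
# Prints the VALUE of the last uncommented "DOMAIN TYPE ITEM VALUE" line in a
# limits.conf-style FILE, or nothing if there is no such line.
limits_value() {
  awk -v d="$2" -v t="$3" -v i="$4" \
    '$1 == d && $2 == t && $3 == i { v = $4 } END { if (v != "") print v }' "$1"
}
```

On a Linux host, source the script and run check_5605_sysctls. It prints one OK or LOW line per setting and returns non-zero if any value is below the suggested number. Once the limits.conf lines from the workaround are in place, limits_value /etc/security/limits.conf impala hard nproc should print 262144.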
[2/3] incubator-impala git commit: [DOCS] Advise setting vm.overcommit_memory=1 in various places
Posted by jr...@apache.org.
[DOCS] Advise setting vm.overcommit_memory=1 in various places
Reusing the same advice under "Known Issues", scalability
considerations, and in the Impala + Kerberos section.
Change-Id: Icbfa755e2c9769a8458fd93362769856cf32e301
Reviewed-on: http://gerrit.cloudera.org:8080/7349
Reviewed-by: Mostafa Mokhtar <mm...@cloudera.com>
Tested-by: Impala Public Jenkins
Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/db3f3235
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/db3f3235
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/db3f3235
Branch: refs/heads/master
Commit: db3f3235d7c28ff9963e0332b830ecc86f54a505
Parents: cd44381
Author: John Russell <jr...@cloudera.com>
Authored: Fri Jun 30 16:00:30 2017 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Mon Jul 10 23:06:45 2017 +0000
----------------------------------------------------------------------
docs/shared/impala_common.xml | 32 ++++++++++++++++++++++++++++++++
docs/topics/impala_kerberos.xml | 8 ++++++++
docs/topics/impala_known_issues.xml | 11 +++++++++++
docs/topics/impala_scalability.xml | 8 ++++++++
4 files changed, 59 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/db3f3235/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 6e65c40..2ec2537 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1365,6 +1365,38 @@ select c_first_name, c_last_name from customer where lower(trim(c_last_name)) rl
<codeph>user@example.com</codeph>, in a Kerberized environment.
</p>
+ <p id="vm_overcommit_memory_intro">
+ On a Kerberized cluster with high memory utilization, the <cmdname>kinit</cmdname> commands that Impala runs
+ once every <codeph>kerberos_reinit_interval</codeph> may cause out-of-memory errors, because running
+ each command forks the Impala process. The error looks similar to the following:
+<codeblock><![CDATA[
+Failed to obtain Kerberos ticket for principal: principal_details
+Failed to execute shell cmd: 'kinit -k -t keytab_details',
+error was: Error(12): Cannot allocate memory
+]]>
+</codeblock>
+ </p>
+
+ <p id="vm_overcommit_memory_start">
+ The following command changes the <codeph>vm.overcommit_memory</codeph>
+ setting immediately on a running host. However, this setting is reset
+ when the host is restarted.
+<codeblock><![CDATA[
+echo 1 > /proc/sys/vm/overcommit_memory
+]]>
+</codeblock>
+ </p>
+ <p>
+ To change the setting in a persistent way, add the following line to the
+ <filepath>/etc/sysctl.conf</filepath> file:
+<codeblock><![CDATA[
+vm.overcommit_memory=1
+]]>
+</codeblock>
+ </p>
+ <p id="vm_overcommit_memory_end">
+ Then run <codeph>sysctl -p</codeph>. No reboot is needed.
+ </p>
<ul>
<li id="grant_revoke_single">
Currently, each Impala <codeph>GRANT</codeph> or <codeph>REVOKE</codeph> statement can only grant or
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/db3f3235/docs/topics/impala_kerberos.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_kerberos.xml b/docs/topics/impala_kerberos.xml
index 5032fba..5d97aeb 100644
--- a/docs/topics/impala_kerberos.xml
+++ b/docs/topics/impala_kerberos.xml
@@ -351,4 +351,12 @@ $ chown impala:impala impala-http.keytab</codeblock>
</conbody>
</concept>
+ <concept rev="IMPALA-2294" id="kerberos_overhead_memory_usage">
+ <title>Kerberos-Related Memory Overhead for Large Clusters</title>
+ <conbody>
+ <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
+ <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="vm_overcommit_memory_end"/>
+ </conbody>
+ </concept>
+
</concept>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/db3f3235/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index 30cc245..51920c9 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -623,6 +623,17 @@ impala hard nproc 262144
</conbody>
</concept>
+ <concept id="IMPALA-2294">
+ <title>Kerberos initialization errors due to high memory usage</title>
+ <conbody>
+ <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
+ <p><b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Workaround:</b></p>
+ <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="vm_overcommit_memory_end"/>
+ </conbody>
+ </concept>
+
<concept id="drop_table_purge_s3a">
<title>DROP TABLE PURGE on S3A table may not delete externally written files</title>
<conbody>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/db3f3235/docs/topics/impala_scalability.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_scalability.xml b/docs/topics/impala_scalability.xml
index af61e7a..2ea7603 100644
--- a/docs/topics/impala_scalability.xml
+++ b/docs/topics/impala_scalability.xml
@@ -887,6 +887,14 @@ so other secure services might be affected temporarily.
</conbody>
</concept>
+ <concept rev="IMPALA-2294" id="kerberos_overhead_memory_usage">
+ <title>Kerberos-Related Memory Overhead for Large Clusters</title>
+ <conbody>
+ <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
+ <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="vm_overcommit_memory_end"/>
+ </conbody>
+ </concept>
+
<concept id="scalability_hotspots" rev="2.5.0 IMPALA-2696">
<title>Avoiding CPU Hotspots for HDFS Cached Data</title>
<conbody>
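The shared vm_overcommit_memory paragraphs describe two steps. The first writes 1 to /proc/sys/vm/overcommit_memory for the running host. The second adds vm.overcommit_memory=1 to /etc/sysctl.conf and then runs sysctl -p. The following is a minimal sketch of the persistent step. It assumes GNU sed (for in-place editing) and root access. set_conf_line and describe_overcommit_mode are hypothetical helper names. The mode descriptions (0 heuristic, 1 always overcommit, 2 strict) follow the Linux kernel's overcommit accounting documentation.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the vm.overcommit_memory workaround.
# Assumes GNU sed ("sed -i" without a suffix argument).

# describe_overcommit_mode VALUE: name the kernel overcommit policy.
describe_overcommit_mode() {
  case $1 in
    0) echo "heuristic overcommit (kernel default)" ;;
    1) echo "always overcommit" ;;
    2) echo "strict accounting (no overcommit)" ;;
    *) echo "unknown mode: $1"; return 1 ;;
  esac
}

# set_conf_line FILE KEY VALUE
# Rewrites every uncommented "KEY = ..." line in FILE as "KEY=VALUE", or
# appends "KEY=VALUE" if FILE has no such line. Safe to run repeatedly.
set_conf_line() {
  local file=$1 key=$2 value=$3 re
  # Escape the dots in the key so they match literally in the regexes.
  re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^[[:space:]]*${re}[[:space:]]*=" "$file"; then
    sed -i -E "s|^[[:space:]]*${re}[[:space:]]*=.*|${key}=${value}|" "$file"
  else
    # Make sure the appended line starts on a new line.
    if [[ -s $file && -n $(tail -c1 "$file") ]]; then
      echo >> "$file"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, as root, run set_conf_line /etc/sysctl.conf vm.overcommit_memory 1 && sysctl -p. After that, describe_overcommit_mode "$(cat /proc/sys/vm/overcommit_memory)" should print "always overcommit". The helper rewrites an existing assignment instead of appending. This keeps the file free of duplicate or conflicting vm.overcommit_memory lines when it is run more than once.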
[3/3] incubator-impala git commit: IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
Posted by jr...@apache.org.
IMPALA-5583: [DOCS] Document default_join_distribution_mode query option
New page for the query option.
Change-Id: I4ec6213efc46bce0fe07c590841d51c009fb5c84
Reviewed-on: http://gerrit.cloudera.org:8080/7300
Reviewed-by: Mostafa Mokhtar <mm...@cloudera.com>
Tested-by: Impala Public Jenkins
Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/801c32de
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/801c32de
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/801c32de
Branch: refs/heads/master
Commit: 801c32dec3914939c95c2cab07f8628dd627aef5
Parents: db3f323
Author: John Russell <jr...@cloudera.com>
Authored: Mon Jun 26 15:49:27 2017 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Mon Jul 10 23:08:12 2017 +0000
----------------------------------------------------------------------
docs/impala.ditamap | 1 +
docs/impala_keydefs.ditamap | 1 +
.../impala_default_join_distribution_mode.xml | 134 +++++++++++++++++++
3 files changed, 136 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/801c32de/docs/impala.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index 574602a..b10ddbf 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -176,6 +176,7 @@ under the License.
<topicref href="topics/impala_batch_size.xml"/>
<topicref href="topics/impala_compression_codec.xml"/>
<topicref href="topics/impala_debug_action.xml"/>
+ <topicref rev="2.9.0 IMPALA-5381" href="topics/impala_default_join_distribution_mode.xml"/>
<topicref href="topics/impala_default_order_by_limit.xml"/>
<topicref audience="hidden" href="topics/impala_disable_cached_reads.xml"/>
<topicref href="topics/impala_disable_codegen.xml"/>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/801c32de/docs/impala_keydefs.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index 7c9bb60..378a5bb 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10749,6 +10749,7 @@ under the License.
<keydef href="topics/impala_batch_size.xml" keys="batch_size"/>
<keydef href="topics/impala_compression_codec.xml" keys="compression_codec"/>
<keydef href="topics/impala_debug_action.xml" keys="debug_action"/>
+ <keydef href="topics/impala_default_join_distribution_mode.xml" keys="default_join_distribution_mode"/>
<keydef href="topics/impala_default_order_by_limit.xml" keys="default_order_by_limit"/>
<keydef href="topics/impala_disable_cached_reads.xml" keys="disable_cached_reads"/>
<keydef href="topics/impala_disable_codegen.xml" keys="disable_codegen"/>
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/801c32de/docs/topics/impala_default_join_distribution_mode.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_default_join_distribution_mode.xml b/docs/topics/impala_default_join_distribution_mode.xml
new file mode 100644
index 0000000..1b17d50
--- /dev/null
+++ b/docs/topics/impala_default_join_distribution_mode.xml
@@ -0,0 +1,134 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="default_join_distribution_mode" rev="2.9.0 IMPALA-5381 IMPALA-5583">
+
+ <title>DEFAULT_JOIN_DISTRIBUTION_MODE Query Option</title>
+ <titlealts audience="PDF"><navtitle>DEFAULT_JOIN_DISTRIBUTION_MODE</navtitle></titlealts>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Impala"/>
+ <data name="Category" value="Impala Query Options"/>
+ <data name="Category" value="Performance"/>
+ <data name="Category" value="Querying"/>
+ <data name="Category" value="Developers"/>
+ <data name="Category" value="Data Analysts"/>
+ </metadata>
+ </prolog>
+
+ <conbody>
+
+ <p>
+ <indexterm audience="hidden">DEFAULT_JOIN_DISTRIBUTION_MODE query option</indexterm>
+ This option determines the join distribution that Impala uses when any of the tables
+ involved in a join query is missing statistics.
+ </p>
+
+ <p>
+ Impala optimizes join queries based on the presence of table statistics,
+ which are produced by the Impala <codeph>COMPUTE STATS</codeph> statement.
+ By default, when a table involved in the join query does not have statistics,
+ Impala uses the <q>broadcast</q> technique that transmits the entire contents
+ of the table to all executor nodes participating in the query. If one table
+ involved in a join has statistics and the other does not, the table without
+ statistics is broadcast. If both tables are missing statistics, the table
+ that is referenced second in the join order is broadcast. This behavior
+ is appropriate when the table involved is relatively small, but can lead to
+ excessive network, memory, and CPU overhead if the table being broadcast is
+ large.
+ </p>
+
+ <p>
+ Because Impala queries frequently involve very large tables, and suboptimal
+ joins for such tables could result in spilling or out-of-memory errors,
+ the setting <codeph>DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</codeph> lets you
+ override the default behavior. The shuffle join mechanism partitions the rows
+ of each table involved in a join query by hashing the join key values, and sends
+ each partition to a single node, so that rows with matching keys from both tables are processed together.
+ Typically, this kind of join is more efficient for joins between large tables of similar size.
+ </p>
+
+ <p>
+ The setting <codeph>DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</codeph> is
+ recommended when setting up and deploying new clusters, because it is less likely
+ to result in serious consequences such as spilling or out-of-memory errors if
+ the query plan is based on incomplete information. This setting is not the default,
+ to avoid changing the performance characteristics of join queries for clusters that
+ are already tuned for their existing workloads.
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/type_integer"/>
+ <p>
+ The allowed values are <codeph>BROADCAST</codeph> (equivalent to 0)
+ or <codeph>SHUFFLE</codeph> (equivalent to 1).
+ </p>
+
+ <p conref="../shared/impala_common.xml#common/example_blurb"/>
+ <p>
+ The following examples demonstrate appropriate scenarios for each
+ setting of this query option.
+ </p>
+
+<codeblock>
+-- Create a billion-row table.
+create table big_table stored as parquet
+ as select * from huge_table limit 1000000000;
+
+-- For a big table with no statistics, the
+-- shuffle join mechanism is appropriate.
+set default_join_distribution_mode=shuffle;
+
+-- ...join queries involving the big table...
+</codeblock>
+
+<codeblock>
+-- Create a hundred-row table.
+create table tiny_table stored as parquet
+ as select * from huge_table limit 100;
+
+-- For a tiny table with no statistics, the
+-- broadcast join mechanism is appropriate.
+set default_join_distribution_mode=broadcast;
+
+-- ...join queries involving the tiny table...
+</codeblock>
+
+<codeblock>
+compute stats tiny_table;
+compute stats big_table;
+
+-- Once the stats are computed, the query option has
+-- no effect on join queries involving these tables.
+-- Impala can determine the absolute and relative sizes
+-- of each side of the join query by examining the
+-- row size, cardinality, and so on of each table.
+
+-- ...join queries involving both of these tables...
+</codeblock>
+
+ <p conref="../shared/impala_common.xml#common/related_info"/>
+ <p>
+ <xref keyref="compute_stats"/>,
+ <xref keyref="joins"/>,
+ <xref keyref="perf_joins"/>
+ </p>
+
+ </conbody>
+</concept>
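The DEFAULT_JOIN_DISTRIBUTION_MODE topic states which table is broadcast when statistics are missing and when the option is ignored. Those rules can be summarized as a small decision function. The following bash sketch is a hypothetical, simplified model of the documented rules only. It is not how the Impala planner is implemented, and join_strategy is an illustrative name.

```bash
#!/usr/bin/env bash
# join_strategy MODE FIRST_HAS_STATS SECOND_HAS_STATS
# Hypothetical, simplified model of the rules described in the
# DEFAULT_JOIN_DISTRIBUTION_MODE topic; it is not Impala's planner.
# MODE: BROADCAST/SHUFFLE (or 0/1), case-insensitive.
# FIRST/SECOND: the tables in join order; *_HAS_STATS is "yes" or "no".
join_strategy() {
  local mode first=$2 second=$3
  mode=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  # With statistics on both sides, the query option is not consulted.
  if [[ $first == yes && $second == yes ]]; then
    echo "stats-based plan (option has no effect)"
    return 0
  fi
  case $mode in
    SHUFFLE|1)
      echo "shuffle both tables" ;;
    BROADCAST|0)
      # Broadcast the table without stats; if neither table has stats,
      # broadcast the second table in join order.
      if [[ $first == no && $second == yes ]]; then
        echo "broadcast first table"
      else
        echo "broadcast second table"
      fi ;;
    *)
      echo "invalid mode: $1" >&2
      return 1 ;;
  esac
}
```

For example, join_strategy broadcast no no prints "broadcast second table". This matches the documented behavior when both tables lack statistics. join_strategy shuffle no no prints "shuffle both tables".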