Posted to commits@impala.apache.org by ar...@apache.org on 2018/04/23 23:53:29 UTC
[1/4] impala git commit: Revert "IMPALA-6869: [DOCS] Update Known Issues doc for 2.12"
Repository: impala
Updated Branches:
refs/heads/master da363a99a -> c84764d57
http://git-wip-us.apache.org/repos/asf/impala/blob/62eed0d5/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index 47e0c5c..a09188e 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -38,26 +38,22 @@ under the License.
<conbody>
<p>
- The following sections describe known issues and workarounds in Impala, as of the current
- production release. This page summarizes the most serious or frequently encountered issues
- in the current release, to help you make planning decisions about installing and
- upgrading. Any workarounds are listed here. The bug links take you to the Impala issues
- site, where you can see the diagnosis and whether a fix is in the pipeline.
+ The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the
+ most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and
+ upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and
+ whether a fix is in the pipeline.
</p>
<note>
- The online issue tracking system for Impala contains comprehensive information and is
- updated in real time. To verify whether an issue you are experiencing has already been
- reported, or which release an issue is fixed in, search on the
- <xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org
- JIRA tracker</xref>.
+ The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue
+ you are experiencing has already been reported, or which release an issue is fixed in, search on the
+ <xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org JIRA tracker</xref>.
</note>
<p outputclass="toc inpage"/>
<p>
- For issues fixed in various Impala releases, see
- <xref href="impala_fixed_issues.xml#fixed_issues"/>.
+ For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>.
</p>
<!-- Use as a template for new issues.
@@ -77,6 +73,62 @@ under the License.
</conbody>
+<!-- New known issues for Impala 2.3.
+
+Title: Server-to-server SSL and Kerberos do not work together
+Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start.
+Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2598
+Severity: Medium. Server-to-server SSL is practically unusable but this is a new feature.
+Workaround: No known workaround.
+
+Title: Queries may hang on server-to-server exchange errors
+Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang.
+Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2592
+Severity: Low. This does not occur frequently.
+Workaround: No known workaround.
+
+Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats
+Description: Incremental stats use up about 400 bytes per partition X column. So for a table with 20K partitions and 100 columns this is about 800 MB. When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash.
+Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649.
+Severity: Low. This does not occur frequently.
+Workaround: Reduce the number of partitions.
+
+More from the JIRA report of blocker/critical issues:
+
+IMPALA-2093
+Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate
+IMPALA-1652
+Incorrect results with basic predicate on CHAR typed column.
+IMPALA-1459
+Incorrect assignment of predicates through an outer join in an inline view.
+IMPALA-2665
+Incorrect assignment of On-clause predicate inside inline view with an outer join.
+IMPALA-2603
+Crash: impala::Coordinator::ValidateCollectionSlots
+IMPALA-2375
+Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false
+IMPALA-1862
+Invalid bool value not reported as a scanner error
+IMPALA-1792
+ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)
+IMPALA-1578
+Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block
+IMPALA-2643
+Duplicated column in inline view causes dropping null slots during scan
+IMPALA-2005
+A failed CTAS does not drop the table if the insert fails.
+IMPALA-1821
+Casting scenarios with invalid/inconsistent results
+
+Another list from Alex, of correctness problems with predicates; might overlap with ones I already have:
+
+https://issues.apache.org/jira/browse/IMPALA-2665 - Already have
+https://issues.apache.org/jira/browse/IMPALA-2643 - Already have
+https://issues.apache.org/jira/browse/IMPALA-1459 - Already have
+https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
+
+-->
+
<concept id="known_issues_startup">
<title>Impala Known Issues: Startup</title>
@@ -84,60 +136,42 @@ under the License.
<conbody>
<p>
- These issues can prevent one or more Impala-related daemons from starting properly.
+ These issues can prevent one or more Impala-related daemons
+ from starting properly.
</p>
</conbody>
<concept id="IMPALA-4978">
-
<title id="IMPALA-5253">Problem retrieving FQDN causes startup problem on kerberized clusters</title>
-
<conbody>
-
<p>
The method Impala uses to retrieve the host name while constructing the Kerberos
- principal is the <codeph>gethostname()</codeph> system call. This function might not
- always return the fully qualified domain name, depending on the network configuration.
- If the daemons cannot determine the FQDN, Impala does not start on a kerberized
- cluster.
+ principal is the <codeph>gethostname()</codeph> system call. This function might
+ not always return the fully qualified domain name, depending on the network
+ configuration. If the daemons cannot determine the FQDN, Impala does not start
+ on a kerberized cluster.
</p>
-
<p>
This problem might occur immediately after an upgrade of a CDH cluster, due to changes
- in Cloudera Manager that supplies the <codeph>--hostname</codeph> flag automatically
- to the Impala-related daemons. (See the issue <q>hostname parameter is not passed to
- Impala catalog role</q> at
- <xref href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html" scope="external" format="html">the
- Cloudera Manager Known Issues page</xref>.)
- </p>
-
- <p>
- <b>Bugs:</b> <xref keyref="IMPALA-4978">IMPALA-4978</xref>,
- <xref keyref="IMPALA-5253">IMPALA-5253</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
+ in Cloudera Manager that supplies the <codeph>--hostname</codeph> flag automatically to
+ the Impala-related daemons. (See the issue <q>hostname parameter is not passed to Impala catalog role</q>
+ at <xref href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html" scope="external" format="html">the Cloudera Manager Known Issues page</xref>.)
</p>
-
- <p>
- <b>Resolution:</b> The issue is expected to occur less frequently on systems with
- fixes for <xref keyref="IMPALA-4978">IMPALA-4978</xref>,
- <xref keyref="IMPALA-5253">IMPALA-5253</xref>, or both. Even on systems with fixes for
- both of these issues, the workaround might still be required in some cases.
+ <p><b>Bugs:</b> <xref keyref="IMPALA-4978">IMPALA-4978</xref>, <xref keyref="IMPALA-5253">IMPALA-5253</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Resolution:</b> The issue is expected to occur less frequently on systems
+ with fixes for <xref keyref="IMPALA-4978">IMPALA-4978</xref>, <xref keyref="IMPALA-5253">IMPALA-5253</xref>,
+ or both. Even on systems with fixes for both of these issues, the workaround might still
+ be required in some cases.
</p>
-
- <p>
- <b>Workaround:</b> Test if a host is affected by checking whether the output of the
- <cmdname>hostname</cmdname> command includes the FQDN. On hosts where
- <cmdname>hostname</cmdname> only returns the short name, pass the command-line flag
- <codeph>--hostname=<varname>fully_qualified_domain_name</varname></codeph> in the
- startup options of all Impala-related daemons.
+ <p><b>Workaround:</b> Test if a host is affected by checking whether the output of the
+ <cmdname>hostname</cmdname> command includes the FQDN. On hosts where <cmdname>hostname</cmdname>
+ only returns the short name, pass the command-line flag
+ <codeph>--hostname=<varname>fully_qualified_domain_name</varname></codeph>
+ in the startup options of all Impala-related daemons.
</p>
-
</conbody>
-
</concept>
</concept>
@@ -154,100 +188,23 @@ under the License.
</conbody>
- <concept id="impala-6841">
-
- <title>Unable to view large catalog objects in catalogd Web UI</title>
-
- <conbody>
-
- <p>
- In <codeph>catalogd</codeph> Web UI, you can list metadata objects and view their
- details. These details are accessed via a link and printed to a string formatted using
- thrift's <codeph>DebugProtocol</codeph>. Printing large objects (> 1 GB) in Web UI can
- crash <codeph>catalogd</codeph>.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-6841">IMPALA-6841</xref>
- </p>
-
- </conbody>
-
- </concept>
-
- <concept id="impala-6389">
-
- <title><b>Crash when querying tables with "\0" as a row delimiter</b></title>
-
- <conbody>
-
- <p>
- When querying a textfile-based Impala table that uses <codeph>\0</codeph> as a new
- line separator, Impala crashes.
- </p>
-
- <p>
- The following sequence causes <codeph>impalad</codeph> to crash:
- </p>
-
-<pre>create table tab_separated(id bigint, s string, n int, t timestamp, b boolean)
- row format delimited
- fields terminated by '\t' escaped by '\\' lines terminated by '\000'
- stored as textfile;
-select * from tab_separated; -- Done. 0 results.
-insert into tab_separated (id, s) values (100, ''); -- Success.
-select * from tab_separated; -- 20 second delay before getting "Cancelled due to unreachable impalad(s): xxxx:22000"</pre>
-
- <p>
- <b>Bug:</b>
- <xref keyref="IMPALA-6389" scope="external" format="html"
- >IMPALA-6389</xref>
- </p>
-
- <p>
- <b>Workaround:</b> Use an alternative delimiter, e.g. <codeph>\001</codeph>.
- </p>
-
- </conbody>
-
- </concept>
-
<concept id="IMPALA-4828">
-
<title>Altering Kudu table schema outside of Impala may result in crash on read</title>
-
<conbody>
-
- <p>
- Creating a table in Impala, changing the column schema outside of Impala, and then
- reading again in Impala may result in a crash. Neither Impala nor the Kudu client
- validates the schema immediately before reading, so Impala may attempt to dereference
- pointers that aren't there. This happens if a string column is dropped and then a new,
- non-string column is added with the old string column's name.
- </p>
-
- <p>
- <b>Bug:</b>
- <xref keyref="IMPALA-4828" scope="external" format="html">IMPALA-4828</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
<p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala290"/>.
+ Creating a table in Impala, changing the column schema outside of Impala,
+ and then reading again in Impala may result in a crash. Neither Impala nor
+ the Kudu client validates the schema immediately before reading, so Impala may attempt to
+ dereference pointers that aren't there. This happens if a string column is dropped
+ and then a new, non-string column is added with the old string column's name.
</p>
-
- <p>
- <b>Workaround:</b> Run the statement <codeph>REFRESH
- <varname>table_name</varname></codeph> after any occasion when the table structure,
- such as the number, names, and data types of columns, are modified outside of Impala
- using the Kudu API.
+ <p><b>Bug:</b> <xref keyref="IMPALA-4828" scope="external" format="html">IMPALA-4828</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Workaround:</b> Run the statement <codeph>REFRESH <varname>table_name</varname></codeph>
+ after any occasion when the table structure, such as the number, names, and data types
+ of columns, are modified outside of Impala using the Kudu API.
</p>
-
</conbody>
-
</concept>
<concept id="IMPALA-1972" rev="IMPALA-1972">
@@ -257,9 +214,10 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- Trying to get the details of a query through the debug web page while the query is
- planning will block new queries that had not started when the web page was requested.
- The web UI becomes unresponsive until the planning phase is finished.
+ Trying to get the details of a query through the debug web page
+ while the query is planning will block new queries that had not
+ started when the web page was requested. The web UI becomes
+ unresponsive until the planning phase is finished.
</p>
<p>
@@ -270,44 +228,22 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<b>Severity:</b> High
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala290"/>.
- </p>
-
</conbody>
-
</concept>
<concept id="IMPALA-4595">
-
<title>Linking IR UDF module to main module crashes Impala</title>
-
<conbody>
-
<p>
- A UDF compiled as an LLVM module (<codeph>.ll</codeph>) could cause a crash when
- executed.
+ A UDF compiled as an LLVM module (<codeph>.ll</codeph>) could cause a crash
+ when executed.
</p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-4595">IMPALA-4595</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
- </p>
-
- <p>
- <b>Workaround:</b> Compile the external UDFs to a <codeph>.so</codeph> library instead
- of a <codeph>.ll</codeph> IR module.
- </p>
-
+ <p><b>Bug:</b> <xref keyref="IMPALA-4595">IMPALA-4595</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
+ <p><b>Workaround:</b> Compile the external UDFs to a <codeph>.so</codeph> library instead of a
+ <codeph>.ll</codeph> IR module.</p>
</conbody>
-
</concept>
<concept id="IMPALA-3069" rev="IMPALA-3069">
@@ -317,9 +253,8 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option,
- together with wide rows or large string values in columns, could cause a memory
- allocation of more than 2 GB resulting in a crash.
+ Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in
+ columns, could cause a memory allocation of more than 2 GB resulting in a crash.
</p>
<p>
@@ -330,9 +265,7 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<b>Severity:</b> High
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
</conbody>
@@ -345,8 +278,7 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- Malformed Avro data, such as out-of-bounds integers or values in the wrong format,
- could cause a crash when queried.
+ Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried.
</p>
<p>
@@ -357,10 +289,7 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<b>Severity:</b> High
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and
- <keyword keyref="impala262"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and <keyword keyref="impala262"/>.</p>
</conbody>
@@ -373,9 +302,8 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> does not close the
- channel on an error. This causes the node on the other side of the channel to wait
- indefinitely, causing a hang.
+ The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> does not close the channel on an error. This causes the node on
+ the other side of the channel to wait indefinitely, causing a hang.
</p>
<p>
@@ -397,18 +325,15 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala
- <codeph>CREATE FUNCTION</codeph> statement is issued, the <cmdname>impalad</cmdname>
- daemon crashes.
+ If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is
+ issued, the <cmdname>impalad</cmdname> daemon crashes.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2365">IMPALA-2365</xref>
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
@@ -428,94 +353,30 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
</conbody>
- <concept id="impala-6671">
-
- <title>Metadata operations block read-only operations on unrelated tables</title>
-
- <conbody>
-
- <p>
- Metadata operations that change the state of a table, like <codeph>COMPUTE
- STATS</codeph> or <codeph>ALTER RECOVER PARTITIONS</codeph>, may delay metadata
- propagation of unrelated unloaded tables triggered by statements like
- <codeph>DESCRIBE</codeph> or <codeph>SELECT</codeph> queries.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-6671">IMPALA-6671</xref>
- </p>
-
- </conbody>
-
- </concept>
-
- <concept id="impala-5200">
-
- <title>Profile timers not updated during long-running sort</title>
-
- <conbody>
-
- <p>
- If you have a query plan with a long-running sort operation, e.g. minutes, the profile
- timers are not updated to reflect the time spent in the sort until the sort starts
- returning rows.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-5200">IMPALA-5200</xref>
- </p>
-
- <p>
- <b>Workaround:</b> Slow sorts can be identified by looking at "Peak Mem" in the
- summary or "PeakMemoryUsage" in the profile. If a sort is consuming multiple GB of
- memory per host, it will likely spend a significant amount of time sorting the data.
- </p>
-
- </conbody>
-
- </concept>
-
<concept id="IMPALA-3316">
-
<title>Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</title>
-
<conbody>
-
<p>
- The configuration setting
- <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph> uses an underlying
- function that can be a bottleneck on high volume, highly concurrent queries due to the
- use of a global lock while loading time zone information. This bottleneck can cause
- slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of
- slowdown depends on factors such as the number of cores and number of threads involved
- in the query.
+ The configuration setting <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph>
+ uses an underlying function that can be a bottleneck on high volume, highly concurrent
+ queries due to the use of a global lock while loading time zone information. This bottleneck
+ can cause slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount
+ of slowdown depends on factors such as the number of cores and number of threads involved in the query.
</p>
-
<note>
<p>
- The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within
- Parquet files that were generated by Hive, and therefore require the on-the-fly
- timezone conversion processing.
+ The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within Parquet files that
+ were generated by Hive, and therefore require the on-the-fly timezone conversion processing.
</p>
</note>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table
- represent dates only, with no time portion, consider storing them as strings in
- <codeph>yyyy-MM-dd</codeph> format. Impala implicitly converts such string values to
- <codeph>TIMESTAMP</codeph> in calls to date/time functions.
+ <p><b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table represent dates only,
+ with no time portion, consider storing them as strings in <codeph>yyyy-MM-dd</codeph> format.
+ Impala implicitly converts such string values to <codeph>TIMESTAMP</codeph> in calls to date/time
+ functions.
</p>
-
</conbody>
-
</concept>
<concept id="IMPALA-1480" rev="IMPALA-1480">
@@ -538,37 +399,31 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue.
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
</concept>
<concept id="ki_file_handle_cache">
-
<title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
-
<conbody>
-
<p>
- If a data file used by Impala is being continuously appended or overwritten in place
- by an HDFS mechanism, such as <cmdname>hdfs dfs -appendToFile</cmdname>, interaction
- with the file handle caching feature in <keyword keyref="impala210_full"/> and higher
- could cause short-circuit reads to sometimes be disabled on some DataNodes. When a
- mismatch is detected between the cached file handle and a data block that was
- rewritten because of an append, short-circuit reads are turned off on the affected
- host for a 10-minute period.
+ If a data file used by Impala is being continuously appended or
+ overwritten in place by an HDFS mechanism, such as <cmdname>hdfs dfs
+ -appendToFile</cmdname>, interaction with the file handle caching
+ feature in <keyword keyref="impala210_full"/> and higher could cause
+ short-circuit reads to sometimes be disabled on some DataNodes. When a
+ mismatch is detected between the cached file handle and a data block
+ that was rewritten because of an append, short-circuit reads are
+ turned off on the affected host for a 10-minute period.
</p>
-
<p>
- The possibility of encountering such an issue is the reason why the file handle
- caching feature is currently turned off by default. See
- <xref keyref="scalability_file_handle_cache"/> for information about this feature and
- how to enable it.
+ The possibility of encountering such an issue is the reason why the
+ file handle caching feature is currently turned off by default. See
+ <xref keyref="scalability_file_handle_cache"/> for information about
+ this feature and how to enable it.
</p>
-
<p>
<b>Bug:</b>
<xref href="https://issues.apache.org/jira/browse/HDFS-12528"
@@ -579,29 +434,31 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<b>Severity:</b> High
</p>
- <p>
- <b>Workaround:</b> Verify whether your ETL process is susceptible to this issue before
- enabling the file handle caching feature. You can set the <cmdname>impalad</cmdname>
- configuration option <codeph>unused_file_handle_timeout_sec</codeph> to a time period
+ <p><b>Workaround:</b> Verify whether your ETL process is susceptible to
+ this issue before enabling the file handle caching feature. You can
+ set the <cmdname>impalad</cmdname> configuration option
+ <codeph>unused_file_handle_timeout_sec</codeph> to a time period
that is shorter than the HDFS setting
- <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>. (Keep in mind
- that the HDFS setting is in milliseconds while the Impala setting is in seconds.)
+ <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>.
+ (Keep in mind that the HDFS setting is in milliseconds while the
+ Impala setting is in seconds.)
</p>
<p>
- <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS parameter
- <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to specify the amount of
- time that short circuit reads are disabled on encountering an error. The default value
- is 10 minutes (<codeph>600</codeph> seconds). It is recommended that you set
- <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a small value, such as
- <codeph>1</codeph> second, when using the file handle cache. Setting <codeph>
- dfs.domain.socket.disable.interval.seconds</codeph> to <codeph>0</codeph> is not
- recommended as a non-zero interval protects the system if there is a persistent
- problem with short circuit reads.
+ <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS
+ parameter <codeph>dfs.domain.socket.disable.interval.seconds</codeph>
+ to specify the amount of time that short circuit reads are disabled on
+ encountering an error. The default value is 10 minutes
+ (<codeph>600</codeph> seconds). It is recommended that you set
+ <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a
+ small value, such as <codeph>1</codeph> second, when using the file
+ handle cache. Setting <codeph>
+ dfs.domain.socket.disable.interval.seconds</codeph> to
+ <codeph>0</codeph> is not recommended as a non-zero interval
+ protects the system if there is a persistent problem with short
+ circuit reads.
</p>
-
</conbody>
-
</concept>
</concept>
@@ -613,41 +470,24 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- These issues affect the convenience of interacting directly with Impala, typically
- through the Impala shell or Hue.
+ These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
</p>
</conbody>
<concept id="IMPALA-4570">
-
<title>Impala shell tarball is not usable on systems with setuptools versions where '0.7' is a substring of the full version string</title>
-
<conbody>
-
<p>
For example, this issue could occur on a system using setuptools version 20.7.0.
</p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-4570">IMPALA-4570</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
- </p>
-
- <p>
- <b>Workaround:</b> Change to a setuptools version that does not have
- <codeph>0.7</codeph> as a substring.
+ <p><b>Bug:</b> <xref keyref="IMPALA-4570">IMPALA-4570</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
+ <p><b>Workaround:</b> Change to a setuptools version that does not have <codeph>0.7</codeph> as
+ a substring.
</p>
-
</conbody>
-
</concept>
<concept id="IMPALA-3133" rev="IMPALA-3133">
@@ -657,10 +497,9 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- Due to a timing condition in updating cached policy data from Sentry, the
- <codeph>SHOW</codeph> statements for Sentry roles could sometimes display out-of-date
- role settings. Because Impala rechecks authorization for each SQL statement, this
- discrepancy does not represent a security issue for other statements.
+ Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could
+ sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does
+ not represent a security issue for other statements.
</p>
<p>
@@ -672,10 +511,11 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
</p>
<p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala260"/> and
- <keyword keyref="impala251"/>.
+ <b>Resolution:</b> Fixes have been issued for some but not all Impala releases. Check the JIRA for details of fix releases.
</p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/> and <keyword keyref="impala251"/>.</p>
+
</conbody>
</concept>
@@ -687,8 +527,7 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they
- are already completed.
+ Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
</p>
<p>
@@ -708,11 +547,8 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<p conref="../shared/impala_common.xml#common/int_overflow_behavior" />
<p>
- <b>Bug:</b> <xref keyref="IMPALA-3123">IMPALA-3123</xref>
- </p>
-
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.
+ <b>Bug:</b>
+ <xref keyref="IMPALA-3123">IMPALA-3123</xref>
</p>
</conbody>
@@ -728,8 +564,8 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- These issues affect applications that use the JDBC or ODBC APIs, such as business
- intelligence tools or custom-written applications in languages such as Java or C++.
+ These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications
+ in languages such as Java or C++.
</p>
</conbody>
@@ -743,9 +579,8 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function
- calls must follow the same order as the columns. For example, if data is fetched from
- column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
+ If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function calls must follow the same order as the
+ columns. For example, if data is fetched from column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
<codeph>NULL</codeph>.
</p>
@@ -770,78 +605,31 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- These issues relate to security features, such as Kerberos authentication, Sentry
- authorization, encryption, auditing, and redaction.
+ These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and
+ redaction.
</p>
</conbody>
- <concept id="impala-4712">
-
- <title>Transient kerberos authentication error during table loading</title>
-
- <conbody>
-
- <p>
- A transient Kerberos error can cause a table to get into a bad state with an error:
- <codeph>Failed to load metadata for table</codeph>.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-4712">IMPALA-4712</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Workaround:</b> Resolve the Kerberos authentication problem and run
- <codeph>INVALIDATE METADATA</codeph> on the affected table.
- </p>
-
- </conbody>
-
- </concept>
-
<concept id="IMPALA-5638">
-
<title>Malicious user can gain unauthorized access to Kudu table data via Impala</title>
-
<conbody>
-
- <p>
- A malicious user with <codeph>ALTER</codeph> permissions on an Impala table can access
- any other Kudu table data by altering the table properties to make it <q>external</q>
- and then changing the underlying table mapping to point to other Kudu tables. This
- violates and works around the authorization requirement that creating a Kudu external
- table via Impala requires an <codeph>ALL</codeph> privilege at the server scope. This
- privilege requirement for <codeph>CREATE</codeph> commands is enforced to precisely
- avoid this scenario where a malicious user can change the underlying Kudu table
- mapping. The fix is to enforce the same privilege requirement for
- <codeph>ALTER</codeph> commands that would make existing non-external Kudu tables
- external.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-5638">IMPALA-5638</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
<p>
- <b>Workaround:</b> A temporary workaround is to revoke <codeph>ALTER</codeph>
- permissions on Impala tables.
+ A malicious user with <codeph>ALTER</codeph> permissions on an Impala table can access data in any
+ other Kudu table by altering the table properties to make it <q>external</q>
+ and then changing the underlying table mapping to point to other Kudu tables.
+ This circumvents the authorization requirement that creating a
+ Kudu external table via Impala requires an <codeph>ALL</codeph> privilege at the server scope.
+ That privilege requirement for <codeph>CREATE</codeph> commands exists precisely to prevent
+ this scenario, in which a malicious user changes the underlying Kudu table
+ mapping. The fix is to enforce the same privilege requirement for <codeph>ALTER</codeph>
+ commands that would make existing non-external Kudu tables external.
</p>
-
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala2100"/>.
- </p>
-
+ <p><b>Bug:</b> <xref keyref="IMPALA-5638">IMPALA-5638</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Workaround:</b> A temporary workaround is to revoke <codeph>ALTER</codeph> permissions on Impala tables.</p>
+ <p><b>Resolution:</b> Upgrade to an Impala version containing the fix for <xref keyref="IMPALA-5638">IMPALA-5638</xref>.</p>
</conbody>
-
</concept>
<concept id="renewable_kerberos_tickets">
@@ -853,13 +641,12 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if
- Kerberos tickets are not renewable.
+ In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable.
</p>
<p>
- <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure
- <filepath>krb5.conf</filepath> to request renewable tickets.
+ <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request
+ renewable tickets.
</p>
</conbody>
@@ -898,38 +685,22 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
</concept>
- <concept id="impala-6726">
+<!--
+ <concept id="known_issues_supportability">
- <title>Catalog server's kerberos ticket gets deleted after 'ticket_lifetime' on SLES11</title>
+ <title id="ki_supportability">Impala Known Issues: Supportability</title>
<conbody>
<p>
- On SLES11, after 'ticket_lifetime', the kerberos ticket gets deleted by the Java krb5
- library.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-6726"/>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Workaround:</b> On Impala 2.11.0, set <codeph>--use_kudu_kinit=false</codeph> in
- Impala startup flag.
- </p>
-
- <p>
- On Impala 2.12.0, set <codeph>--use_kudu_kinit=false</codeph> and
- <codeph>--use_krpc=false</codeph> in Impala startup flags.
+ These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state
+ shown in monitoring applications.
</p>
</conbody>
</concept>
+-->
<concept id="known_issues_resources">
@@ -938,156 +709,92 @@ select * from tab_separated; -- 20 second delay before getting "Cancelled due to
<conbody>
<p>
- These issues involve memory or disk usage, including out-of-memory conditions, the
- spill-to-disk feature, and resource management features.
+ These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management
+ features.
</p>
</conbody>
<concept id="IMPALA-5605">
-
<title>Configuration to prevent crashes caused by thread resource limits</title>
-
<conbody>
-
<p>
- Impala could encounter a serious error due to resource usage under very high
- concurrency. The error message is similar to:
+ Impala could encounter a serious error due to resource usage under very high concurrency.
+ The error message is similar to:
</p>
-
<codeblock><![CDATA[
F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
]]>
</codeblock>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Workaround:</b> To prevent such errors, configure each host running an
- <cmdname>impalad</cmdname> daemon with the following settings:
+ <p><b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Workaround:</b>
+ To prevent such errors, configure each host running an <cmdname>impalad</cmdname>
+ daemon with the following settings:
</p>
-
<codeblock>
echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count
</codeblock>
-
<p>
- Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
+ Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
</p>
-
<codeblock>
impala soft nproc 262144
impala hard nproc 262144
</codeblock>
-
</conbody>
-
</concept>
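Before restarting impalad, you can check each host against the kernel settings in the IMPALA-5605 workaround. The following is a minimal Python sketch that assumes a Linux host where the three values are readable under /proc/sys. It reports which settings are still below the recommended values. `RECOMMENDED`, `read_proc_int`, and `settings_below_recommended` are hypothetical helper names, not part of Impala.

```python
from pathlib import Path

# Values recommended by the IMPALA-5605 workaround (copied from the text above).
RECOMMENDED = {
    "/proc/sys/kernel/threads-max": 2_000_000,
    "/proc/sys/kernel/pid_max": 2_000_000,
    "/proc/sys/vm/max_map_count": 8_000_000,
}


def read_proc_int(path):
    """Read a single integer kernel setting, e.g. /proc/sys/kernel/pid_max."""
    return int(Path(path).read_text().split()[0])


def settings_below_recommended(current):
    """Return {path: (current, recommended)} for each setting below its recommendation."""
    return {
        path: (current[path], wanted)
        for path, wanted in RECOMMENDED.items()
        if current[path] < wanted
    }
```

On a host, call `settings_below_recommended({p: read_proc_int(p) for p in RECOMMENDED})`. An empty result means the kernel settings already meet the recommendation. The sketch does not check the `nproc` limits in /etc/security/limits.conf.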
<concept id="flatbuffers_mem_usage">
-
<title>Memory usage when compact_catalog_topic flag enabled</title>
-
<conbody>
-
<p>
- The efficiency improvement from <xref keyref="IMPALA-4029">IMPALA-4029</xref> can
- cause an increase in size of the updates to Impala catalog metadata that are broadcast
- to the <cmdname>impalad</cmdname> daemons by the <cmdname>statestored</cmdname>
- daemon. The increase in catalog update topic size results in higher CPU and network
+ The efficiency improvement from <xref keyref="IMPALA-4029">IMPALA-4029</xref>
+ can cause an increase in size of the updates to Impala catalog metadata
+ that are broadcast to the <cmdname>impalad</cmdname> daemons
+ by the <cmdname>statestored</cmdname> daemon.
+ The increase in catalog update topic size results in higher CPU and network
utilization. By default, the increase in topic size is about 5-7%. If the
- <codeph>compact_catalog_topic</codeph> flag is used, the size increase is more
- substantial, with a topic size approximately twice as large as in previous versions.
+ <codeph>compact_catalog_topic</codeph> flag is used, the
+ size increase is more substantial, with a topic size approximately twice as
+ large as in previous versions.
</p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-5500">IMPALA-5500</xref>
- </p>
-
- <p>
- <b>Severity:</b> Medium
- </p>
-
+ <p><b>Bug:</b> <xref keyref="IMPALA-5500">IMPALA-5500</xref></p>
+ <p><b>Severity:</b> Medium</p>
<p>
- <b>Workaround:</b> Consider setting the <codeph>compact_catalog_topic</codeph>
- configuration setting to <codeph>false</codeph> until this issue is resolved.
- </p>
-
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala210"/>.
- </p>
-
+ <b>Workaround:</b> Consider setting the
+ <codeph>compact_catalog_topic</codeph> flag to
+ <codeph>false</codeph> until this issue is resolved.</p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala210"/>.</p>
</conbody>
-
</concept>
<concept id="IMPALA-2294">
-
<title>Kerberos initialization errors due to high memory usage</title>
-
<conbody>
-
<p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala211"/>.
- </p>
-
- <p>
- <b>Workaround:</b>
- </p>
-
+ <p><b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Workaround:</b></p>
<p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="../shared/impala_common.xml#common/vm_overcommit_memory_end"/>
-
</conbody>
-
</concept>
<concept id="drop_table_purge_s3a">
-
<title>DROP TABLE PURGE on S3A table may not delete externally written files</title>
-
<conbody>
-
<p>
- A <codeph>DROP TABLE PURGE</codeph> statement against an S3 table could leave the data
- files behind, if the table directory and the data files were created with a
- combination of <cmdname>hadoop fs</cmdname> and <cmdname>aws s3</cmdname> commands.
+ A <codeph>DROP TABLE PURGE</codeph> statement against an S3 table could leave the data files
+ behind, if the table directory and the data files were created with a combination of
+ <cmdname>hadoop fs</cmdname> and <cmdname>aws s3</cmdname> commands.
</p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3558">IMPALA-3558</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Resolution:</b> The underlying issue with the S3A connector depends on the
- resolution of
- <xref href="https://issues.apache.org/jira/browse/HADOOP-13230" format="html" scope="external">HADOOP-13230</xref>.
- </p>
-
+ <p><b>Bug:</b> <xref keyref="IMPALA-3558">IMPALA-3558</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Resolution:</b> The underlying issue with the S3A connector depends on the resolution of <xref href="https://issues.apache.org/jira/browse/HADOOP-13230" format="html" scope="external">HADOOP-13230</xref>.</p>
</conbody>
-
</concept>
<concept id="catalogd_heap">
@@ -1097,30 +804,27 @@ impala hard nproc 262144
<conbody>
<p>
- The default heap size for Impala <cmdname>catalogd</cmdname> has changed in
- <keyword keyref="impala25_full"/> and higher:
+ The default heap size for Impala <cmdname>catalogd</cmdname> has changed in <keyword keyref="impala25_full"/> and higher:
</p>
<ul>
<li>
<p>
- Previously, by default <cmdname>catalogd</cmdname> was using the JVM's default
- heap size, which is the smaller of 1/4th of the physical memory or 32 GB.
+ Previously, by default <cmdname>catalogd</cmdname> used the JVM's default heap size, which is the smaller of one quarter of the
+ physical memory or 32 GB.
</p>
</li>
<li>
<p>
- Starting with <keyword keyref="impala250"/>, the default
- <cmdname>catalogd</cmdname> heap size is 4 GB.
+ Starting with <keyword keyref="impala250"/>, the default <cmdname>catalogd</cmdname> heap size is 4 GB.
</p>
</li>
</ul>
<p>
- For example, on a host with 128GB physical memory this will result in catalogd heap
- decreasing from 32GB to 4GB. This can result in out-of-memory errors in catalogd and
- leading to query failures.
+ For example, on a host with 128 GB of physical memory, the default catalogd heap decreases from 32 GB to 4 GB. This can
+ cause out-of-memory errors in catalogd, leading to query failures.
</p>
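The heap-size change can be worked through for any host. The following is a minimal Python sketch that takes the pre-2.5 default exactly as stated above (the smaller of one quarter of physical memory or 32 GB) and uses the fixed 4 GB default from 2.5.0 onward. The function and constant names are hypothetical.

```python
GIB = 1024 ** 3

# Default catalogd heap from Impala 2.5.0 onward, as stated above.
NEW_DEFAULT_HEAP = 4 * GIB


def old_default_heap(physical_bytes):
    """Pre-2.5 default: the JVM default heap, min(physical / 4, 32 GB)."""
    return min(physical_bytes // 4, 32 * GIB)


def heap_reduction(physical_bytes):
    """Bytes of default heap lost when upgrading, or 0 if the default does not shrink."""
    return max(old_default_heap(physical_bytes) - NEW_DEFAULT_HEAP, 0)


for mem_gib in (16, 64, 128, 256):
    old = old_default_heap(mem_gib * GIB) // GIB
    print(f"{mem_gib} GB host: {old} GB -> {NEW_DEFAULT_HEAP // GIB} GB")
```

For the 128 GB host in the example, the loop prints `128 GB host: 32 GB -> 4 GB`. Hosts of 16 GB or less see no reduction, because their old default was already 4 GB or smaller.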
<p>
@@ -1129,6 +833,9 @@ impala hard nproc 262144
<p>
<b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows.
+<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. -->
+<!-- Including full details here via conref, for benefit of PDF readers or anyone else
+ who might have trouble seeing or following the link. -->
</p>
<p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/>
@@ -1144,9 +851,8 @@ impala hard nproc 262144
<conbody>
<p>
- The size of the breakpad minidump files grows linearly with the number of threads. By
- default, each thread adds 8 KB to the minidump size. Minidump files could consume
- significant disk space when the daemons have a high number of threads.
+ The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the
+ minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
</p>
<p>
@@ -1158,13 +864,11 @@ impala hard nproc 262144
</p>
<p>
- <b>Workaround:</b> Add
- <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft
- upper limit on the size of each minidump file. If the minidump file would exceed that
- limit, Impala reduces the amount of information for each thread from 8 KB to 2 KB.
- (Full thread information is captured for the first 20 threads, then 2 KB per thread
- after that.) The minidump file can still grow larger than the <q>hinted</q> size. For
- example, if you have 10,000 threads, the minidump file can be more than 20 MB.
+ <b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the
+ size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread
+ from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump
+ file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more
+ than 20 MB.
</p>
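The per-thread arithmetic in the workaround can be made concrete. The following is a minimal Python sketch that uses only the figures given above: 8 KB per thread normally, and 2 KB per thread after the first 20 once the hint would be exceeded. It ignores any fixed, non-thread portion of the minidump, so real files are somewhat larger. The function name is hypothetical.

```python
FULL_KB = 8             # per-thread size without the hint, per the text above
REDUCED_KB = 2          # per-thread size once the hint would be exceeded
FULL_THREAD_COUNT = 20  # threads that always keep full information


def thread_portion_kb(num_threads, hint_exceeded):
    """Estimate the thread-related part of a minidump, in KB."""
    if not hint_exceeded or num_threads <= FULL_THREAD_COUNT:
        return num_threads * FULL_KB
    reduced = num_threads - FULL_THREAD_COUNT
    return FULL_THREAD_COUNT * FULL_KB + reduced * REDUCED_KB


print(thread_portion_kb(10_000, hint_exceeded=False))  # 80000
print(thread_portion_kb(10_000, hint_exceeded=True))   # 20120
```

With 10,000 threads, the thread data alone comes to about 20,120 KB even after the reduction. This is consistent with the statement that the file can exceed 20 MB.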
</conbody>
@@ -1178,16 +882,14 @@ impala hard nproc 262144
<conbody>
<p>
- The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak
- memory usage than in previous releases while reading Parquet files.
+ The initial release of <keyword keyref="impala26_full"/> sometimes has higher peak memory usage than previous releases while reading
+ Parquet files.
</p>
<p>
- <keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the
- efficiency of Parquet scans by up to 2x. The faster scans may result in a higher peak
- memory consumption compared to earlier versions of Impala due to the new column-wise
- row materialization strategy. You are likely to experience higher memory consumption
- in any of the following scenarios:
+ <keyword keyref="impala26_full"/> includes the change from IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans
+ may result in higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row
+ materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios:
<ul>
<li>
<p>
@@ -1197,15 +899,14 @@ impala hard nproc 262144
<li>
<p>
- Very large rows due to big column values, for example, long strings or nested
- collections with many items.
+ Very large rows due to big column values, for example, long strings or nested collections with many items.
</p>
</li>
<li>
<p>
- Producer/consumer speed imbalances, leading to more rows being buffered between
- a scan (producer) and downstream (consumer) plan nodes.
+ Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer)
+ plan nodes.
</p>
</li>
</ul>
@@ -1220,16 +921,10 @@ impala hard nproc 262144
</p>
<p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
- </p>
-
- <p>
- <b>Workaround:</b> The following query options might help to reduce memory consumption
- in the Parquet scanner:
+ <b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner:
<ul>
<li>
- Reduce the number of scanner threads, for example: <codeph>set
- num_scanner_threads=30</codeph>
+ Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph>
</li>
<li>
@@ -1255,8 +950,8 @@ impala hard nproc 262144
<conbody>
<p>
- Some memory allocated by the JVM used internally by Impala is not counted against the
- memory limit for the <cmdname>impalad</cmdname> daemon.
+ Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the
+ <cmdname>impalad</cmdname> daemon.
</p>
<p>
@@ -1264,9 +959,8 @@ impala hard nproc 262144
</p>
<p>
- <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname>
- command, or add the memory figures in the Impala web UI <uicontrol>/memz</uicontrol>
- tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
+ <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the
+ Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
</p>
</conbody>
@@ -1288,13 +982,10 @@ impala hard nproc 262144
</p>
<p>
- <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation
- mechanism if practical.
+ <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical.
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
@@ -1309,145 +1000,88 @@ impala hard nproc 262144
<conbody>
<p>
- These issues can cause incorrect or unexpected results from queries. They typically only
- arise in very specific circumstances.
+ These issues can cause incorrect or unexpected results from queries. They typically arise only in very specific circumstances.
</p>
</conbody>
<concept id="IMPALA-4539">
-
<title>Parquet scanner memory bug: I/O buffer is attached to output batch while scratch batch rows still reference it</title>
-
<!-- TSB-225 title: Possibly incorrect results when scanning uncompressed Parquet files with Impala. -->
-
<conbody>
-
<p>
- Impala queries may return incorrect results when scanning plain-encoded string columns
- in uncompressed Parquet files. I/O buffers holding the string data are prematurely
- freed, leading to invalid memory reads and possibly non-deterministic results. This
- does not affect Parquet files that use a compression codec such as Snappy. Snappy is
- both strongly recommended generally and the default choice for Impala-written Parquet
- files.
+ Impala queries may return incorrect results when scanning plain-encoded string
+ columns in uncompressed Parquet files. I/O buffers holding the string data are
+ prematurely freed, leading to invalid memory reads and possibly
+ non-deterministic results. This does not affect Parquet files that use a
+ compression codec such as Snappy. Snappy is generally recommended, and it is also
+ the default codec for Parquet files written by Impala.
</p>
-
<p>
How to determine whether a query might be affected:
</p>
-
<ul>
<li>
The query must reference <codeph>STRING</codeph> columns from a Parquet table.
</li>
-
<li>
A selective filter on the Parquet table makes this issue more likely.
</li>
-
<li>
- Identify any uncompressed Parquet files processed by the query. Examine the
- <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the suspected
- table. Use a query that performs a full table scan, and materializes the column
- values. (For example, <codeph>SELECT MIN(<varname>colname</varname>) FROM
- <varname>tablename</varname></codeph>.) Look for <q>File Formats</q>. A value
- containing <codeph>PARQUET/NONE</codeph> means uncompressed Parquet.
+ Identify any uncompressed Parquet files processed by the query.
+ Examine the <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the
+ suspected table. Use a query that performs a full table scan and materializes the column
+ values. (For example, <codeph>SELECT MIN(<varname>colname</varname>) FROM <varname>tablename</varname></codeph>.)
+ Look for <q>File Formats</q>. A value containing <codeph>PARQUET/NONE</codeph> means uncompressed Parquet.
</li>
-
<li>
- Identify any plain-encoded string columns in the associated table. Pay special
- attention to tables containing Parquet files generated through Hive, Spark, or other
- mechanisms outside of Impala, because Impala uses Snappy compression by default for
- Parquet files. Use <codeph>parquet-tools</codeph> to dump the file metadata. Note
- that a column could have several encodings within the same file (the column data is
- stored in several column chunks). Look for <codeph>VLE:PLAIN</codeph> in the output
- of <codeph>parquet-tools</codeph>, which means the values are plain encoded.
+ Identify any plain-encoded string columns in the associated table. Pay special attention to tables
+ containing Parquet files generated through Hive, Spark, or other mechanisms outside of Impala,
+ because Impala uses Snappy compression by default for Parquet files. Use <codeph>parquet-tools</codeph>
+ to dump the file metadata. Note that a column could have several encodings within the same file (the column
+ data is stored in several column chunks). Look for <codeph>VLE:PLAIN</codeph> in the output of
+ <codeph>parquet-tools</codeph>, which means the values are plain encoded.
</li>
</ul>
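Once the query profile and the parquet-tools metadata output are saved as text, the two file-level checks in the list above can be scripted. The following is a minimal Python sketch that searches only for the markers named above: a File Formats line containing PARQUET/NONE, and VLE:PLAIN. The exact layout of both outputs may vary by version, and the function names are hypothetical.

```python
def has_uncompressed_parquet(profile_text):
    """True if any 'File Formats' line in a query profile mentions PARQUET/NONE."""
    for line in profile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("File Formats:") and "PARQUET/NONE" in stripped:
            return True
    return False


def has_plain_encoded_pages(parquet_tools_meta):
    """True if parquet-tools metadata output reports plain-encoded values."""
    return "VLE:PLAIN" in parquet_tools_meta


def possibly_affected(profile_text, parquet_tools_meta):
    """Combine the two file-level checks; STRING columns must also be queried."""
    return (has_uncompressed_parquet(profile_text)
            and has_plain_encoded_pages(parquet_tools_meta))
```

A profile line such as `File Formats: PARQUET/NONE:2 PARQUET/SNAPPY:6` is an illustrative sample, not captured output. A positive result means only that the files match the conditions above; it does not confirm that a particular query returned wrong results.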
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-4539">IMPALA-4539</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
- </p>
-
- <p>
- <b>Workaround:</b> Use Snappy or another compression codec for Parquet files.
- </p>
-
+ <p><b>Bug:</b> <xref keyref="IMPALA-4539">IMPALA-4539</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Resolution:</b> Upgrade to a version of Impala containing the fix for <xref keyref="IMPALA-4539">IMPALA-4539</xref>.</p>
+ <p><b>Workaround:</b> Use Snappy or another compression codec for Parquet files.</p>
</conbody>
-
</concept>
<concept id="IMPALA-4513">
-
<title>ABS(n) where n is the lowest bound for the int types returns negative values</title>
-
<conbody>
-
- <p>
- If the <codeph>abs()</codeph> function evaluates a number that is right at the lower
- bound for an integer data type, the positive result cannot be represented in the same
- type, and the result is returned as a negative number. For example,
- <codeph>abs(-128)</codeph> returns -128 because the argument is interpreted as a
- <codeph>TINYINT</codeph> and the return value is also a <codeph>TINYINT</codeph>.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-4513">IMPALA-4513</xref>
- </p>
-
<p>
- <b>Severity:</b> High
+ If the <codeph>abs()</codeph> function evaluates a number that is right at the lower bound for
+ an integer data type, the positive result cannot be represented in the same type, and the
+ result is returned as a negative number. For example, <codeph>abs(-128)</codeph> returns -128
+ because the argument is interpreted as a <codeph>TINYINT</codeph> and the return value is also
+ a <codeph>TINYINT</codeph>.
</p>
-
- <p>
- <b>Workaround:</b> Cast the integer value to a larger type. For example, rewrite
- <codeph>abs(<varname>tinyint_col</varname>)</codeph> as
- <codeph>abs(cast(<varname>tinyint_col</varname> as smallint))</codeph>.
- </p>
-
+ <p><b>Bug:</b> <xref keyref="IMPALA-4513">IMPALA-4513</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Workaround:</b> Cast the integer value to a larger type. For example, rewrite
+ <codeph>abs(<varname>tinyint_col</varname>)</codeph> as <codeph>abs(cast(<varname>tinyint_col</varname> as smallint))</codeph>.</p>
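The wrong sign comes from ordinary two's complement wraparound. The following is a minimal Python sketch that emulates a fixed-width integer result. Python's own integers do not overflow, so the sketch wraps explicitly. `wrap_to_bits` and `abs_same_type` are hypothetical helpers, not Impala functions.

```python
def wrap_to_bits(value, bits):
    """Reinterpret an integer as a two's complement value of the given width."""
    span = 1 << bits
    half = span >> 1
    return ((value + half) % span) - half


def abs_same_type(value, bits):
    """abs() whose result is forced back into the argument's integer width."""
    return wrap_to_bits(abs(value), bits)


print(abs_same_type(-128, 8))   # TINYINT: -128, the reported behavior
print(abs_same_type(-128, 16))  # after a cast to SMALLINT: 128
```

The cast in the workaround helps because 128 fits in the 16-bit range of SMALLINT but not in the 8-bit range of TINYINT.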
</conbody>
-
</concept>
<concept id="IMPALA-4266">
-
<title>Java UDF expression returning string in GROUP BY can give incorrect results</title>
-
<conbody>
-
- <p>
- If the <codeph>GROUP BY</codeph> clause included a call to a Java UDF that returned a
- string value, the UDF could return an incorrect result.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-4266">IMPALA-4266</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
<p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
+ If the <codeph>GROUP BY</codeph> clause included a call to a Java UDF that returned a string value,
+ the UDF could return an incorrect result.
</p>
-
- <p>
- <b>Workaround:</b> Rewrite the expression to concatenate the results of the Java UDF
- with an empty string call. For example, rewrite <codeph>my_hive_udf()</codeph> as
+ <p><b>Bug:</b> <xref keyref="IMPALA-4266">IMPALA-4266</xref></p>
+ <p><b>Severity:</b> High</p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
+ <p><b>Workaround:</b> Rewrite the expression to concatenate the result of the Java UDF with an
+ empty string. For example, rewrite <codeph>my_hive_udf()</codeph> as
<codeph>concat(my_hive_udf(), '')</codeph>.
</p>
-
</conbody>
-
</concept>
<concept id="IMPALA-3084" rev="IMPALA-3084">
@@ -1457,9 +1091,8 @@ impala hard nproc 262144
<conbody>
<p>
- A query could return wrong results (too many or too few <codeph>NULL</codeph> values)
- if it referenced an outer-joined nested collection and also contained a null-checking
- predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
+ A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested
+ collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
<codeph><=></codeph> operator) in the <codeph>WHERE</codeph> clause.
</p>
@@ -1471,9 +1104,7 @@ impala hard nproc 262144
<b>Severity:</b> High
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
</conbody>
@@ -1486,8 +1117,8 @@ impala hard nproc 262144
<conbody>
<p>
- An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a
- constant such as <codeph>FALSE</codeph> in another join clause. For example:
+ An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in
+ another join clause. For example:
</p>
<codeblock><![CDATA[
@@ -1513,6 +1144,10 @@ explain SELECT 1 FROM alltypestiny a1
</p>
<p>
+ <b>Resolution:</b>
+ </p>
+
+ <p>
<b>Workaround:</b>
</p>
@@ -1539,8 +1174,8 @@ explain SELECT 1 FROM alltypestiny a1
<li>
<p>
- The INNER JOIN has an On-clause with a predicate that references at least two
- tables that are on the nullable side of the preceding OUTER JOINs.
+ The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the
+ preceding OUTER JOINs.
</p>
</li>
</ul>
@@ -1623,19 +1258,13 @@ on b.int_col = c.int_col;
</p>
<p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
- </p>
-
- <p>
<b>Workaround:</b> High
</p>
<p>
- For some queries, this problem can be worked around by placing the problematic
- <codeph>ON</codeph> clause predicate in the <codeph>WHERE</codeph> clause instead, or
- changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s
- (if the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s).
- For example, to fix the problematic query above:
+ For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause predicate in the
+ <codeph>WHERE</codeph> clause instead, or changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s (if
+ the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to fix the problematic query above:
</p>
<codeblock><![CDATA[
@@ -1711,8 +1340,7 @@ where b.int_col = c.int_col
<conbody>
<p>
- Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first.
- The parquet standard says it is MSB first.
+ Impala implements the Parquet <codeph>BIT_PACKED</codeph> encoding LSB first. The Parquet specification says it is MSB first.
</p>
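The two orders differ in which end of each byte is filled first. The following is a minimal Python sketch of both orders. It follows the Parquet format specification's description of the deprecated BIT_PACKED encoding, which is most significant bit first, and of the bit-packed runs in the RLE/bit-packing hybrid encoding, which are least significant bit first. The function names are hypothetical.

```python
def pack_msb_first(values, width):
    """Parquet BIT_PACKED order: fill each byte from its most significant bit."""
    mask = (1 << width) - 1
    acc, nbits = 0, 0
    for v in values:
        acc = (acc << width) | (v & mask)
        nbits += width
    pad = (-nbits) % 8  # zero-pad the final byte on the low-order side
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def pack_lsb_first(values, width):
    """LSB-first order, as used by the hybrid encoding's bit-packed runs."""
    mask = (1 << width) - 1
    acc = 0
    for i, v in enumerate(values):
        acc |= (v & mask) << (i * width)
    return acc.to_bytes((len(values) * width + 7) // 8, "little")


values = list(range(8))  # 0..7 with a bit width of 3
print(pack_msb_first(values, 3).hex())  # 053977
print(pack_lsb_first(values, 3).hex())  # 88c6fa
```

The same eight values produce different bytes in the two orders. A reader that assumes one order decodes different values from data written in the other order.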
<p>
@@ -1720,8 +1348,8 @@ where b.int_col = c.int_col
</p>
<p>
- <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used,
- is not written by Impala, and is deprecated in Parquet 2.0.
+ <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated
+ in Parquet 2.0.
</p>
</conbody>
@@ -1735,11 +1363,10 @@ where b.int_col = c.int_col
<conbody>
<p>
- The calculation of start and end times for the BST (British Summer Time) time zone
- could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended
- at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
- third) and fourth Sunday in October. For example, both function calls should return
- 13, but actually return 12, in a query such as:
+ The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995.
+ Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
+ third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such
+ as:
</p>
<codeblock>
@@ -1767,18 +1394,15 @@ select
<conbody>
<p>
- If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph>
- function could return an incorrect value for the hostname field.
+ If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for
+ the hostname field.
</p>
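For comparison, Python's standard `urllib.parse.urlsplit` separates the user information before `@` from the host. The following sketch shows the expected host extraction only, not Impala's `parse_url()` implementation. The URLs are made-up examples.

```python
from urllib.parse import urlsplit

urls = [
    "http://user:secret@example.com:8080/path?q=1",  # '@' in the authority
    "http://example.com/docs/a@b.html",               # '@' only in the path
]

for url in urls:
    # .hostname drops any "user:password@" prefix and the port.
    print(urlsplit(url).hostname)
```

Both URLs should yield the hostname `example.com`, whichever side of the host the `@` appears on.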
<p>
<b>Bug:</b> <xref keyref="IMPALA-1170">IMPALA-1170</xref>
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and
- <keyword keyref="impala234"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
</conbody>
@@ -1791,9 +1415,8 @@ select
<conbody>
<p>
- If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an
- escaped <codeph>\%</codeph> character, it does not match a <codeph>%</codeph> final
- character of the LHS argument.
+ If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it
+ does not match a <codeph>%</codeph> final character of the LHS argument.
</p>
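The expected semantics can be stated by translating a LIKE pattern into a regular expression that treats a backslash-escaped character as a literal. The following is a minimal Python sketch that assumes a backslash escape character. It models standard LIKE matching, not Impala's internal implementation, and `like_to_regex` and `sql_like` are hypothetical names.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a SQL LIKE pattern: % -> .*, _ -> ., escaped chars literal."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value, pattern):
    return like_to_regex(pattern).fullmatch(value) is not None


# A final escaped \% must match a literal trailing '%'.
print(sql_like("100%", r"100\%"))  # True
print(sql_like("1000", r"100\%"))  # False
```

The first call is the case the issue describes: a trailing escaped `\%` in the pattern should match a literal `%` at the end of the value.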
<p>
@@ -1811,9 +1434,8 @@ select
<conbody>
<p>
- Because the value for <codeph>rand()</codeph> is computed early in a query, using an
- <codeph>ORDER BY</codeph> expression involving a call to <codeph>rand()</codeph> does
- not actually randomize the results.
+ Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression
+ involving a call to <codeph>rand()</codeph> does not actually randomize the results.
</p>
<p>
@@ -1831,9 +1453,8 @@ select
<conbody>
<p>
- If the same column is queried twice within a view, <codeph>NULL</codeph> values for
- that column are omitted. For example, the result of <codeph>COUNT(*)</codeph> on the
- view could be less than expected.
+ If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the
+ result of <codeph>COUNT(*)</codeph> on the view could be less than expected.
</p>
<p>
@@ -1844,10 +1465,7 @@ select
<b>Workaround:</b> Avoid selecting the same column twice within an inline view.
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
- <keyword keyref="impala232"/>, and <keyword keyref="impala2210"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala2210"/>.</p>
</conbody>
@@ -1862,19 +1480,15 @@ select
<conbody>
<p>
- A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table
- references is an inline view might apply predicates from the <codeph>ON</codeph>
- clause incorrectly.
+ A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates
+ from the <codeph>ON</codeph> clause incorrectly.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1459">IMPALA-1459</xref>
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
- <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
</conbody>
@@ -1887,8 +1501,8 @@ select
<conbody>
<p>
- A query could encounter a serious error if includes multiple nested levels of
- <codeph>INNER JOIN</codeph> clauses involving subqueries.
+ A query could encounter a serious error if it includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving
+ subqueries.
</p>
<p>
@@ -1906,8 +1520,7 @@ select
<conbody>
<p>
- A query might return incorrect results due to wrong predicate assignment in the
- following scenario:
+ A query might return incorrect results due to wrong predicate assignment in the following scenario:
</p>
<ol>
@@ -1920,8 +1533,8 @@ select
</li>
<li>
- That join has an On-clause containing a predicate that only references columns
- originating from the outer-joined tables inside the inline view
+ That join has an <codeph>ON</codeph> clause containing a predicate that only references columns originating from the outer-joined tables inside
+ the inline view
</li>
</ol>
@@ -1929,10 +1542,7 @@ select
<b>Bug:</b> <xref keyref="IMPALA-2665">IMPALA-2665</xref>
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
- <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
</conbody>
@@ -1945,18 +1555,15 @@ select
<conbody>
<p>
- In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the
- comparison from the <codeph>HAVING</codeph> clause might be applied at the wrong stage
- of query processing, leading to incorrect results.
+ In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph>
+ clause might be applied at the wrong stage of query processing, leading to incorrect results.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2144">IMPALA-2144</xref>
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
@@ -1969,18 +1576,15 @@ select
<conbody>
<p>
- A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function,
- such as <codeph>NOT IN (SELECT SUM(...))</codeph>, could return incorrect results.
+ A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT
+ SUM(...))</codeph>, could return incorrect results.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2093">IMPALA-2093</xref>
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and
- <keyword keyref="impala234"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
</conbody>
@@ -1995,9 +1599,8 @@ select
<conbody>
<p>
- These issues affect how Impala interacts with metadata. They cover areas such as the
- metastore database, the <codeph>COMPUTE STATS</codeph> statement, and the Impala
- <cmdname>catalogd</cmdname> daemon.
+ These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE
+ STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon.
</p>
</conbody>
@@ -2009,11 +1612,9 @@ select
<conbody>
<p>
- Incremental stats use up about 400 bytes per partition for each column. For example,
- for a table with 20K partitions and 100 columns, the memory overhead from incremental
- statistics is about 800 MB. When serialized for transmission across the network, this
- metadata exceeds the 2 GB Java array size limit and leads to a
- <codeph>catalogd</codeph> crash.
+ Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100
+ columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network,
+ this metadata exceeds the 2 GB Java array size limit and leads to a <codeph>catalogd</codeph> crash.
</p>
<p>
@@ -2023,9 +1624,8 @@ select
</p>
<p>
- <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing
- incremental stats for that table. The scalability of incremental stats computation is
- a continuing work item.
+ <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The
+ scalability of incremental stats computation is a continuing work item.
</p>
</conbody>
@@ -2047,21 +1647,17 @@ select
</p>
<p>
- <b>Workaround:</b> On <keyword keyref="impala20"/>, when adjusting table statistics
- manually by setting the <codeph>numRows</codeph>, you must also enable the Boolean
- property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement
- like the following to set both properties with a single <codeph>ALTER TABLE</codeph>
- statement:
+ <b>Workaround:</b> On <keyword keyref="impala20"/>, when adjusting table statistics manually by setting the <codeph>numRows</codeph>, you must also
+ enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to
+ set both properties with a single <codeph>ALTER TABLE</codeph> statement:
</p>
<codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock>
<p>
<b>Resolution:</b> The underlying cause is the issue
- <xref
- href="https://issues.apache.org/jira/browse/HIVE-8648"
- scope="external" format="html">HIVE-8648</xref>
- that affects the metastore in Hive 0.13.
+ <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the
+ metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into a release of <keyword keyref="distro"/>.
</p>
</conbody>
@@ -2077,8 +1673,8 @@ select
<conbody>
<p>
- These issues affect the ability to interchange data between Impala and other database
- systems. They cover areas such as data types and file formats.
+ These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types
+ and file formats.
</p>
</conbody>
@@ -2092,32 +1688,26 @@ select
<conbody>
<p>
- This issue can occur either on old Avro tables (created prior to Hive 1.1) or when
- changing the Avro schema file by adding or removing columns. Columns added to the
- schema file will not show up in the output of the <codeph>DESCRIBE FORMATTED</codeph>
- command. Removing columns from the schema file will trigger a
- <codeph>NullPointerException</codeph>.
+ This issue can occur either on old Avro tables (created prior to Hive 1.1) or when changing the Avro schema file by
+ adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE
+ FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>.
</p>
<p>
- As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop
- and recreate the table. This will populate the Hive metastore database with the
- correct column definitions.
+ As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This will populate
+ the Hive metastore database with the correct column definitions.
</p>
<note type="warning">
- <p>
- Only use this for external tables, or Impala will remove the data files. In case of
- an internal table, set it to external first:
+ <p>Only use this for external tables, or Impala will remove the data
+ files. In case of an internal table, set it to external first:
<codeblock>
ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</codeblock>
- (The part in parentheses is case sensitive.) Make sure to pick the right choice
- between internal and external when recreating the table. See
- <xref href="impala_tables.xml#tables"/> for the differences between internal and
- external tables.
- </p>
- </note>
+ (The part in parentheses is case sensitive.) Make sure to pick the
+ right choice between internal and external when recreating the table.
+ See <xref href="impala_tables.xml#tables"/> for the differences
+ between internal and external tables. </p></note>
<p>
<b>Severity:</b> High
@@ -2156,8 +1746,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- Impala behavior differs from Hive with respect to out of range float/double values.
- Out of range values are returned as maximum allowed value of type (Hive returns NULL).
+ Impala behavior differs from Hive with respect to out-of-range float/double values. Out-of-range values are returned as the maximum
+ allowed value of the type (Hive returns NULL).
</p>
<p>
@@ -2177,16 +1767,14 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- For compatibility with Impala, the value for the Flume HDFS Sink
- <codeph>hdfs.writeFormat</codeph> must be set to <codeph>Text</codeph>, rather than
- its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph>
- setting must be changed to <codeph>Text</codeph> before creating data files with
- Flume; otherwise, those files cannot be read by either Impala or Hive.
+ For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to
+ <codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting
+ must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either
+ Impala or Hive.
</p>
<p>
- <b>Resolution:</b> This information has been requested to be added to the upstream
- Flume documentation.
+ <b>Resolution:</b> A request has been made to add this information to the upstream Flume documentation.
</p>
</conbody>
@@ -2202,8 +1790,7 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- Querying certain Avro tables could cause a crash or return no rows, even though Impala
- could <codeph>DESCRIBE</codeph> the table.
+ Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table.
</p>
<p>
@@ -2211,14 +1798,13 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</p>
<p>
- <b>Workaround:</b> Swap the order of the fields in the schema specification. For
- example, <codeph>["null", "string"]</codeph> instead of <codeph>["string",
- "null"]</codeph>.
+ <b>Workaround:</b> Swap the order of the fields in the schema specification. For example, <codeph>["null", "string"]</codeph>
+ instead of <codeph>["string", "null"]</codeph>.
</p>
<p>
- <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it
- may still cause an error even when the crashing issue is resolved.
+ <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
+ crashing issue is resolved.
</p>
</conbody>
@@ -2234,8 +1820,7 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- If an Avro table has a schema definition with a trailing semicolon, Impala encounters
- an error when the table is queried.
+ If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
</p>
<p>
@@ -2259,9 +1844,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- Currently, Impala can only read gzipped files containing a single stream. If a gzipped
- file contains multiple concatenated streams, the Impala query only processes the data
- from the first stream.
+ Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
+ streams, the Impala query only processes the data from the first stream.
</p>
<p>
@@ -2272,9 +1856,7 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<b>Workaround:</b> Use a different gzip tool to compress file to a single stream file.
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
@@ -2289,9 +1871,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- If a carriage return / newline pair of characters in a text table is split between
- HDFS data blocks, Impala incorrectly processes the row following the
- <codeph>\n\r</codeph> pair twice.
+ If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
+ the row following the <codeph>\n\r</codeph> pair twice.
</p>
<p>
@@ -2302,9 +1883,7 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<b>Workaround:</b> Use the Parquet format for large volumes of data where practical.
</p>
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.
- </p>
+ <p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.</p>
</conbody>
@@ -2319,33 +1898,30 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- I
<TRUNCATED>
[3/4] impala git commit: IMPALA-6872: [DOCS] Known Issues in Impala
3.0
Posted by ar...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/c84764d5/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index a09188e..e67b535 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -38,22 +38,26 @@ under the License.
<conbody>
<p>
- The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the
- most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and
- upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and
- whether a fix is in the pipeline.
+ The following sections describe known issues and workarounds in Impala, as of the current
+ production release. This page summarizes the most serious or frequently encountered issues
+ in the current release, to help you make planning decisions about installing and
+ upgrading. Any workarounds are listed here. The bug links take you to the Impala issues
+ site, where you can see the diagnosis and whether a fix is in the pipeline.
</p>
<note>
- The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue
- you are experiencing has already been reported, or which release an issue is fixed in, search on the
- <xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org JIRA tracker</xref>.
+ The online issue tracking system for Impala contains comprehensive information and is
+ updated in real time. To verify whether an issue you are experiencing has already been
+ reported, or which release an issue is fixed in, search on the
+ <xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org
+ JIRA tracker</xref>.
</note>
<p outputclass="toc inpage"/>
<p>
- For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>.
+ For issues fixed in various Impala releases, see
+ <xref href="impala_fixed_issues.xml#fixed_issues"/>.
</p>
<!-- Use as a template for new issues.
@@ -71,63 +75,11 @@ under the License.
-->
- </conbody>
-
-<!-- New known issues for Impala 2.3.
-
-Title: Server-to-server SSL and Kerberos do not work together
-Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2598
-Severity: Medium. Server-to-server SSL is practically unusable but this is a new feature.
-Workaround: No known workaround.
-
-Title: Queries may hang on server-to-server exchange errors
-Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2592
-Severity: Low. This does not occur frequently.
-Workaround: No known workaround.
-
-Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats
-Description: Incremental stats use up about 400 bytes per partition X column. So for a table with 20K partitions and 100 columns this is about 800 MB. When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649.
-Severity: Low. This does not occur frequently.
-Workaround: Reduce the number of partitions.
-
-More from the JIRA report of blocker/critical issues:
-
-IMPALA-2093
-Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate
-IMPALA-1652
-Incorrect results with basic predicate on CHAR typed column.
-IMPALA-1459
-Incorrect assignment of predicates through an outer join in an inline view.
-IMPALA-2665
-Incorrect assignment of On-clause predicate inside inline view with an outer join.
-IMPALA-2603
-Crash: impala::Coordinator::ValidateCollectionSlots
-IMPALA-2375
-Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false
-IMPALA-1862
-Invalid bool value not reported as a scanner error
-IMPALA-1792
-ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)
-IMPALA-1578
-Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block
-IMPALA-2643
-Duplicated column in inline view causes dropping null slots during scan
-IMPALA-2005
-A failed CTAS does not drop the table if the insert fails.
-IMPALA-1821
-Casting scenarios with invalid/inconsistent results
-
-Another list from Alex, of correctness problems with predicates; might overlap with ones I already have:
-
-https://issues.apache.org/jira/browse/IMPALA-2665 - Already have
-https://issues.apache.org/jira/browse/IMPALA-2643 - Already have
-https://issues.apache.org/jira/browse/IMPALA-1459 - Already have
-https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
+ <p>
+ c
+ </p>
--->
+ </conbody>
<concept id="known_issues_startup">
@@ -136,419 +88,171 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
<conbody>
<p>
- These issues can prevent one or more Impala-related daemons
- from starting properly.
+ These issues can prevent one or more Impala-related daemons from starting properly.
</p>
</conbody>
<concept id="IMPALA-4978">
- <title id="IMPALA-5253">Problem retrieving FQDN causes startup problem on kerberized clusters</title>
- <conbody>
- <p>
- The method Impala uses to retrieve the host name while constructing the Kerberos
- principal is the <codeph>gethostname()</codeph> system call. This function might
- not always return the fully qualified domain name, depending on the network
- configuration. If the daemons cannot determine the FQDN, Impala does not start
- on a kerberized cluster.
- </p>
- <p>
- This problem might occur immediately after an upgrade of a CDH cluster, due to changes
- in Cloudera Manager that supplies the <codeph>--hostname</codeph> flag automatically to
- the Impala-related daemons. (See the issue <q>hostname parameter is not passed to Impala catalog role</q>
- at <xref href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html" scope="external" format="html">the Cloudera Manager Known Issues page</xref>.)
- </p>
- <p><b>Bugs:</b> <xref keyref="IMPALA-4978">IMPALA-4978</xref>, <xref keyref="IMPALA-5253">IMPALA-5253</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Resolution:</b> The issue is expected to occur less frequently on systems
- with fixes for <xref keyref="IMPALA-4978">IMPALA-4978</xref>, <xref keyref="IMPALA-5253">IMPALA-5253</xref>,
- or both. Even on systems with fixes for both of these issues, the workaround might still
- be required in some cases.
- </p>
- <p><b>Workaround:</b> Test if a host is affected by checking whether the output of the
- <cmdname>hostname</cmdname> command includes the FQDN. On hosts where <cmdname>hostname</cmdname>
- only returns the short name, pass the command-line flag
- <codeph>--hostname=<varname>fully_qualified_domain_name</varname></codeph>
- in the startup options of all Impala-related daemons.
- </p>
- </conbody>
- </concept>
-
- </concept>
-
- <concept id="known_issues_crash">
-
- <title>Impala Known Issues: Crashes and Hangs</title>
-
- <conbody>
-
- <p>
- These issues can cause Impala to quit or become unresponsive.
- </p>
-
- </conbody>
-
- <concept id="IMPALA-4828">
- <title>Altering Kudu table schema outside of Impala may result in crash on read</title>
- <conbody>
- <p>
- Creating a table in Impala, changing the column schema outside of Impala,
- and then reading again in Impala may result in a crash. Neither Impala nor
- the Kudu client validates the schema immediately before reading, so Impala may attempt to
- dereference pointers that aren't there. This happens if a string column is dropped
- and then a new, non-string column is added with the old string column's name.
- </p>
- <p><b>Bug:</b> <xref keyref="IMPALA-4828" scope="external" format="html">IMPALA-4828</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Workaround:</b> Run the statement <codeph>REFRESH <varname>table_name</varname></codeph>
- after any occasion when the table structure, such as the number, names, and data types
- of columns, are modified outside of Impala using the Kudu API.
- </p>
- </conbody>
- </concept>
-
- <concept id="IMPALA-1972" rev="IMPALA-1972">
-
- <title>Queries that take a long time to plan can cause webserver to block other queries</title>
-
- <conbody>
-
- <p>
- Trying to get the details of a query through the debug web page
- while the query is planning will block new queries that had not
- started when the web page was requested. The web UI becomes
- unresponsive until the planning phase is finished.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-1972">IMPALA-1972</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- </conbody>
- </concept>
-
- <concept id="IMPALA-4595">
- <title>Linking IR UDF module to main module crashes Impala</title>
- <conbody>
- <p>
- A UDF compiled as an LLVM module (<codeph>.ll</codeph>) could cause a crash
- when executed.
- </p>
- <p><b>Bug:</b> <xref keyref="IMPALA-4595">IMPALA-4595</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
- <p><b>Workaround:</b> Compile the external UDFs to a <codeph>.so</codeph> library instead of a
- <codeph>.ll</codeph> IR module.</p>
- </conbody>
- </concept>
-
- <concept id="IMPALA-3069" rev="IMPALA-3069">
- <title>Setting BATCH_SIZE query option too large can cause a crash</title>
+ <title>Impala requires FQDN from hostname command on kerberized clusters</title>
<conbody>
<p>
- Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in
- columns, could cause a memory allocation of more than 2 GB resulting in a crash.
+ The method Impala uses to retrieve the host name while constructing the Kerberos
+ principal is the <codeph>gethostname()</codeph> system call. This function might not
+ always return the fully qualified domain name, depending on the network configuration.
+ If the daemons cannot determine the FQDN, Impala does not start on a kerberized
+ cluster.
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-3069">IMPALA-3069</xref>
+ <b>Workaround:</b> Test if a host is affected by checking whether the output of the
+ <cmdname>hostname</cmdname> command includes the FQDN. On hosts where
+ <cmdname>hostname</cmdname> only returns the short name, pass the command-line flag
+ <codeph>--hostname=<varname>fully_qualified_domain_name</varname></codeph> in the
+ startup options of all Impala-related daemons.
</p>
<p>
- <b>Severity:</b> High
+ <b>Apache Issue:</b> <xref keyref="IMPALA-4978">IMPALA-4978</xref>
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
-
</conbody>
</concept>
- <concept id="IMPALA-3441" rev="IMPALA-3441">
-
- <title>Impala should not crash for invalid avro serialized data</title>
-
- <conbody>
-
- <p>
- Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried.
- </p>
+ </concept>
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3441">IMPALA-3441</xref>
- </p>
+ <concept id="known_issues_performance">
- <p>
- <b>Severity:</b> High
- </p>
+ <title id="ki_performance">Impala Known Issues: Performance</title>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and <keyword keyref="impala262"/>.</p>
+ <conbody>
- </conbody>
+ <p>
+ These issues involve the performance of operations such as queries or DDL statements.
+ </p>
- </concept>
+ </conbody>
- <concept id="IMPALA-2592" rev="IMPALA-2592">
+ <concept id="impala-6671">
- <title>Queries may hang on server-to-server exchange errors</title>
+ <title>Metadata operations block read-only operations on unrelated tables</title>
<conbody>
<p>
- The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> does not close the channel on an error. This causes the node on
- the other side of the channel to wait indefinitely, causing a hang.
+ Metadata operations that change the state of a table, like <codeph>COMPUTE
+ STATS</codeph> or <codeph>ALTER RECOVER PARTITIONS</codeph>, may delay metadata
+ propagation of unrelated unloaded tables triggered by statements like
+ <codeph>DESCRIBE</codeph> or <codeph>SELECT</codeph> queries.
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-2592">IMPALA-2592</xref>
- </p>
-
- <p>
- <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+ <b>Bug:</b> <xref keyref="IMPALA-6671">IMPALA-6671</xref>
</p>
</conbody>
</concept>
- <concept id="IMPALA-2365" rev="IMPALA-2365">
+ <concept id="IMPALA-3316">
- <title>Impalad is crashing if udf jar is not available in hdfs location for first time</title>
+ <title>Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</title>
<conbody>
<p>
- If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is
- issued, the <cmdname>impalad</cmdname> daemon crashes.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-2365">IMPALA-2365</xref>
+ The configuration setting
+ <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph> uses an underlying
+ function that can be a bottleneck on high volume, highly concurrent queries due to the
+ use of a global lock while loading time zone information. This bottleneck can cause
+ slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of
+ slowdown depends on factors such as the number of cores and number of threads involved
+ in the query.
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
-
- </conbody>
-
- </concept>
-
- </concept>
-
- <concept id="known_issues_performance">
-
- <title id="ki_performance">Impala Known Issues: Performance</title>
-
- <conbody>
-
- <p>
- These issues involve the performance of operations such as queries or DDL statements.
- </p>
-
- </conbody>
-
- <concept id="IMPALA-3316">
- <title>Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</title>
- <conbody>
- <p>
- The configuration setting <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph>
- uses an underlying function that can be a bottleneck on high volume, highly concurrent
- queries due to the use of a global lock while loading time zone information. This bottleneck
- can cause slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount
- of slowdown depends on factors such as the number of cores and number of threads involved in the query.
- </p>
<note>
<p>
- The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within Parquet files that
- were generated by Hive, and therefore require the on-the-fly timezone conversion processing.
+ The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within
+ Parquet files that were generated by Hive, and therefore require the on-the-fly
+ timezone conversion processing.
</p>
</note>
- <p><b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table represent dates only,
- with no time portion, consider storing them as strings in <codeph>yyyy-MM-dd</codeph> format.
- Impala implicitly converts such string values to <codeph>TIMESTAMP</codeph> in calls to date/time
- functions.
- </p>
- </conbody>
- </concept>
-
- <concept id="IMPALA-1480" rev="IMPALA-1480">
-
-<!-- Not part of Alex's spreadsheet. Spreadsheet has IMPALA-1423 which mentions it's similar to this one but not a duplicate. -->
-
- <title>Slow DDL statements for tables with large number of partitions</title>
-
- <conbody>
<p>
- DDL statements for tables with a large number of partitions might be slow.
+ <b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref>
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-1480">IMPALA-1480</xref>
+ <b>Severity:</b> High
</p>
<p>
- <b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue.
+ <b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table
+ represent dates only, with no time portion, consider storing them as strings in
+ <codeph>yyyy-MM-dd</codeph> format. Impala implicitly converts such string values to
+ <codeph>TIMESTAMP</codeph> in calls to date/time functions.
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
-
</conbody>
</concept>
<concept id="ki_file_handle_cache">
- <title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
- <conbody>
- <p>
- If a data file used by Impala is being continuously appended or
- overwritten in place by an HDFS mechanism, such as <cmdname>hdfs dfs
- -appendToFile</cmdname>, interaction with the file handle caching
- feature in <keyword keyref="impala210_full"/> and higher could cause
- short-circuit reads to sometimes be disabled on some DataNodes. When a
- mismatch is detected between the cached file handle and a data block
- that was rewritten because of an append, short-circuit reads are
- turned off on the affected host for a 10-minute period.
- </p>
- <p>
- The possibility of encountering such an issue is the reason why the
- file handle caching feature is currently turned off by default. See
- <xref keyref="scalability_file_handle_cache"/> for information about
- this feature and how to enable it.
- </p>
- <p>
- <b>Bug:</b>
- <xref href="https://issues.apache.org/jira/browse/HDFS-12528"
- scope="external" format="html">HDFS-12528</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p><b>Workaround:</b> Verify whether your ETL process is susceptible to
- this issue before enabling the file handle caching feature. You can
- set the <cmdname>impalad</cmdname> configuration option
- <codeph>unused_file_handle_timeout_sec</codeph> to a time period
- that is shorter than the HDFS setting
- <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>.
- (Keep in mind that the HDFS setting is in milliseconds while the
- Impala setting is in seconds.)
- </p>
-
- <p>
- <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS
- parameter <codeph>dfs.domain.socket.disable.interval.seconds</codeph>
- to specify the amount of time that short circuit reads are disabled on
- encountering an error. The default value is 10 minutes
- (<codeph>600</codeph> seconds). It is recommended that you set
- <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a
- small value, such as <codeph>1</codeph> second, when using the file
- handle cache. Setting <codeph>
- dfs.domain.socket.disable.interval.seconds</codeph> to
- <codeph>0</codeph> is not recommended as a non-zero interval
- protects the system if there is a persistent problem with short
- circuit reads.
- </p>
- </conbody>
- </concept>
-
- </concept>
-
- <concept id="known_issues_usability">
-
- <title id="ki_usability">Impala Known Issues: Usability</title>
-
- <conbody>
- <p>
- These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
- </p>
-
- </conbody>
-
- <concept id="IMPALA-4570">
- <title>Impala shell tarball is not usable on systems with setuptools versions where '0.7' is a substring of the full version string</title>
- <conbody>
- <p>
- For example, this issue could occur on a system using setuptools version 20.7.0.
- </p>
- <p><b>Bug:</b> <xref keyref="IMPALA-4570">IMPALA-4570</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
- <p><b>Workaround:</b> Change to a setuptools version that does not have <codeph>0.7</codeph> as
- a substring.
- </p>
- </conbody>
- </concept>
-
- <concept id="IMPALA-3133" rev="IMPALA-3133">
-
- <title>Unexpected privileges in show output</title>
+ <title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
<conbody>
<p>
- Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could
- sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does
- not represent a security issue for other statements.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3133">IMPALA-3133</xref>
+ If a data file used by Impala is being continuously appended or overwritten in place
+ by an HDFS mechanism, such as <cmdname>hdfs dfs -appendToFile</cmdname>, interaction
+ with the file handle caching feature in <keyword keyref="impala210_full"/> and higher
+ could cause short-circuit reads to be disabled intermittently on some DataNodes. When
+ a mismatch is detected between the cached file handle and a data block that was
+ rewritten because of an append, short-circuit reads are turned off on the affected
+ host for 10 minutes.
</p>
<p>
- <b>Severity:</b> High
+ Because of this issue, the file handle caching feature is currently turned off by
+ default. See
+ <xref keyref="scalability_file_handle_cache"/> for information about this feature and
+ how to enable it.
</p>
<p>
- <b>Resolution:</b> Fixes have been issued for some but not all Impala releases. Check the JIRA for details of fix releases.
+ <b>Bug:</b>
+ <xref href="https://issues.apache.org/jira/browse/HDFS-12528"
+ scope="external" format="html">HDFS-12528</xref>
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/> and <keyword keyref="impala251"/>.</p>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-1776" rev="IMPALA-1776">
-
- <title>Less than 100% progress on completed simple SELECT queries</title>
-
- <conbody>
-
<p>
- Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
+ <b>Severity:</b> High
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-1776">IMPALA-1776</xref>
+ <b>Workaround:</b> Before enabling the file handle caching feature, verify whether
+ your ETL process appends to or overwrites data files in place. If it does, you can set
+ the <cmdname>impalad</cmdname> configuration option <codeph>unused_file_handle_timeout_sec</codeph>
+ to a time period shorter than the HDFS setting
+ <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>. (The HDFS
+ setting is in milliseconds, while the Impala setting is in seconds.)
</p>
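+ <p>
+ As an illustration of the unit conversion only: if
+ <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph> were set to
+ <codeph>300000</codeph> (5 minutes), the Impala timeout would need to be below
+ <codeph>300</codeph> seconds. In the following <cmdname>impalad</cmdname> startup flag,
+ the value <codeph>240</codeph> is a hypothetical choice, not a recommended default:
+ </p>
+<codeblock>
+--unused_file_handle_timeout_sec=240
+</codeblock>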
- </conbody>
-
- </concept>
-
- <concept id="concept_lmx_dk5_lx">
-
- <title>Unexpected column overflow behavior with INT datatypes</title>
-
- <conbody>
-
- <p conref="../shared/impala_common.xml#common/int_overflow_behavior" />
-
<p>
- <b>Bug:</b>
- <xref keyref="IMPALA-3123">IMPALA-3123</xref>
+ <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS parameter
+ <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to specify how long
+ short-circuit reads stay disabled after an error is encountered. The default value
+ is 10 minutes (<codeph>600</codeph> seconds). It is recommended that you set
+ <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a small value, such as
+ <codeph>1</codeph> second, when using the file handle cache. Setting
+ <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to <codeph>0</codeph> is not
+ recommended, because a non-zero interval protects the system if there is a persistent
+ problem with short-circuit reads.
</p>
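+ <p>
+ For example, the suggested 1-second interval could be expressed as the following
+ property. This is a sketch that assumes the setting is placed in the
+ <filepath>hdfs-site.xml</filepath> file that the <cmdname>impalad</cmdname> hosts use as
+ their HDFS client configuration. The location of that file depends on how your cluster
+ is deployed.
+ </p>
+<codeblock><![CDATA[
+<property>
+  <name>dfs.domain.socket.disable.interval.seconds</name>
+  <value>1</value>
+</property>
+]]>
+</codeblock>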
</conbody>
@@ -557,6 +261,8 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
</concept>
+<!--<concept id="known_issues_usability"><title id="ki_usability">Impala Known Issues: Usability</title><conbody><p> These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue. </p></conbody></concept>-->
+
<concept id="known_issues_drivers">
<title id="ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</title>
@@ -564,8 +270,8 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
<conbody>
<p>
- These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications
- in languages such as Java or C++.
+ These issues affect applications that use the JDBC or ODBC APIs, such as business
+ intelligence tools or custom-written applications in languages such as Java or C++.
</p>
</conbody>
@@ -579,8 +285,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
<conbody>
<p>
- If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function calls must follow the same order as the
- columns. For example, if data is fetched from column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
+ If the ODBC <codeph>SQLGetData</codeph> function is called on a series of columns,
+ the calls must follow the same order as the columns. For example, if data is fetched
+ from column 2 and then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
<codeph>NULL</codeph>.
</p>
@@ -598,1066 +305,299 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
</concept>
- <concept id="known_issues_security">
+<!--<concept id="known_issues_security"><title id="ki_security">Impala Known Issues: Security</title><conbody><p> These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and redaction. </p></conbody></concept>-->
- <title id="ki_security">Impala Known Issues: Security</title>
+ <concept id="known_issues_resources">
+
+ <title id="ki_resources">Impala Known Issues: Resources</title>
<conbody>
<p>
- These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and
- redaction.
+ These issues involve memory or disk usage, including out-of-memory conditions, the
+ spill-to-disk feature, and resource management features.
</p>
</conbody>
- <concept id="IMPALA-5638">
- <title>Malicious user can gain unauthorized access to Kudu table data via Impala</title>
- <conbody>
- <p>
- A malicious user with <codeph>ALTER</codeph> permissions on an Impala table can access any
- other Kudu table data by altering the table properties to make it <q>external</q>
- and then changing the underlying table mapping to point to other Kudu tables.
- This violates and works around the authorization requirement that creating a
- Kudu external table via Impala requires an <codeph>ALL</codeph> privilege at the server scope.
- This privilege requirement for <codeph>CREATE</codeph> commands is enforced to precisely avoid
- this scenario where a malicious user can change the underlying Kudu table
- mapping. The fix is to enforce the same privilege requirement for <codeph>ALTER</codeph>
- commands that would make existing non-external Kudu tables external.
- </p>
- <p><b>Bug:</b> <xref keyref="IMPALA-5638">IMPALA-5638</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Workaround:</b> A temporary workaround is to revoke <codeph>ALTER</codeph> permissions on Impala tables.</p>
- <p><b>Resolution:</b> Upgrade to an Impala version containing the fix for <xref keyref="IMPALA-5638">IMPALA-5638</xref>.</p>
- </conbody>
- </concept>
-
- <concept id="renewable_kerberos_tickets">
-
-<!-- Not part of Alex's spreadsheet. Not associated with a JIRA number AFAIK. -->
-
- <title>Kerberos tickets must be renewable</title>
-
- <conbody>
-
- <p>
- In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable.
- </p>
-
- <p>
- <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request
- renewable tickets.
- </p>
-
- </conbody>
-
- </concept>
-
-<!-- To do: Fixed in 2.5.0, 2.3.2. Commenting out until I see how it can fix into "known issues now fixed" convention.
- That set of fix releases looks incomplete so probably have to do some detective work with the JIRA.
- https://issues.apache.org/jira/browse/IMPALA-2598
- <concept id="IMPALA-2598" rev="IMPALA-2598">
+ <concept id="IMPALA-6028">
- <title>Server-to-server SSL and Kerberos do not work together</title>
+ <title>Handling large rows during upgrade to <keyword
+ keyref="impala210_full"/> or higher</title>
<conbody>
<p>
- If SSL is enabled between internal Impala components (with <codeph>ssl_client_ca_certificate</codeph>), and Kerberos
- authentication is used between servers, the cluster fails to start.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-2598">IMPALA-2598</xref>
- </p>
-
- <p>
- <b>Workaround:</b> Do not use the new <codeph>ssl_client_ca_certificate</codeph> setting on Kerberos-enabled clusters until this
- issue is resolved.
+ After an upgrade to <keyword keyref="impala210_full"/> or higher, users who process
+ very large column values (long strings), or have increased the
+ <codeph>--read_size</codeph> configuration setting from its default of 8 MB, might
+ encounter capacity errors for some queries that previously worked.
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala232"/>.</p>
-
- </conbody>
-
- </concept>
--->
-
- </concept>
-
-<!--
- <concept id="known_issues_supportability">
-
- <title id="ki_supportability">Impala Known Issues: Supportability</title>
-
- <conbody>
-
- <p>
- These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state
- shown in monitoring applications.
- </p>
-
- </conbody>
-
- </concept>
--->
-
- <concept id="known_issues_resources">
-
- <title id="ki_resources">Impala Known Issues: Resources</title>
-
- <conbody>
-
- <p>
- These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management
- features.
- </p>
-
- </conbody>
-
- <concept id="IMPALA-5605">
- <title>Configuration to prevent crashes caused by thread resource limits</title>
- <conbody>
- <p>
- Impala could encounter a serious error due to resource usage under very high concurrency.
- The error message is similar to:
- </p>
-<codeblock><![CDATA[
-F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
-terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
-]]>
-</codeblock>
- <p><b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Workaround:</b>
- To prevent such errors, configure each host running an <cmdname>impalad</cmdname>
- daemon with the following settings:
- </p>
-<codeblock>
-echo 2000000 > /proc/sys/kernel/threads-max
-echo 2000000 > /proc/sys/kernel/pid_max
-echo 8000000 > /proc/sys/vm/max_map_count
-</codeblock>
- <p>
- Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
- </p>
-<codeblock>
-impala soft nproc 262144
-impala hard nproc 262144
-</codeblock>
- </conbody>
- </concept>
-
- <concept id="flatbuffers_mem_usage">
- <title>Memory usage when compact_catalog_topic flag enabled</title>
- <conbody>
- <p>
- The efficiency improvement from <xref keyref="IMPALA-4029">IMPALA-4029</xref>
- can cause an increase in size of the updates to Impala catalog metadata
- that are broadcast to the <cmdname>impalad</cmdname> daemons
- by the <cmdname>statestored</cmdname> daemon.
- The increase in catalog update topic size results in higher CPU and network
- utilization. By default, the increase in topic size is about 5-7%. If the
- <codeph>compact_catalog_topic</codeph> flag is used, the
- size increase is more substantial, with a topic size approximately twice as
- large as in previous versions.
- </p>
- <p><b>Bug:</b> <xref keyref="IMPALA-5500">IMPALA-5500</xref></p>
- <p><b>Severity:</b> Medium</p>
- <p>
- <b>Workaround:</b> Consider setting the
- <codeph>compact_catalog_topic</codeph> configuration setting to
- <codeph>false</codeph> until this issue is resolved. </p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala210"/>.</p>
- </conbody>
- </concept>
-
- <concept id="IMPALA-2294">
- <title>Kerberos initialization errors due to high memory usage</title>
- <conbody>
- <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
- <p><b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Workaround:</b></p>
- <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="../shared/impala_common.xml#common/vm_overcommit_memory_end"/>
- </conbody>
- </concept>
-
- <concept id="drop_table_purge_s3a">
- <title>DROP TABLE PURGE on S3A table may not delete externally written files</title>
- <conbody>
- <p>
- A <codeph>DROP TABLE PURGE</codeph> statement against an S3 table could leave the data files
- behind, if the table directory and the data files were created with a combination of
- <cmdname>hadoop fs</cmdname> and <cmdname>aws s3</cmdname> commands.
- </p>
- <p><b>Bug:</b> <xref keyref="IMPALA-3558">IMPALA-3558</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Resolution:</b> The underlying issue with the S3A connector depends on the resolution of <xref href="https://issues.apache.org/jira/browse/HADOOP-13230" format="html" scope="external">HADOOP-13230</xref>.</p>
- </conbody>
- </concept>
-
- <concept id="catalogd_heap">
-
- <title>Impala catalogd heap issues when upgrading to <keyword keyref="impala25"/></title>
-
- <conbody>
-
- <p>
- The default heap size for Impala <cmdname>catalogd</cmdname> has changed in <keyword keyref="impala25_full"/> and higher:
- </p>
-
- <ul>
- <li>
- <p>
- Previously, by default <cmdname>catalogd</cmdname> was using the JVM's default heap size, which is the smaller of 1/4th of the
- physical memory or 32 GB.
- </p>
- </li>
-
- <li>
- <p>
- Starting with <keyword keyref="impala250"/>, the default <cmdname>catalogd</cmdname> heap size is 4 GB.
- </p>
- </li>
- </ul>
-
- <p>
- For example, on a host with 128GB physical memory this will result in catalogd heap decreasing from 32GB to 4GB. This can result
- in out-of-memory errors in catalogd and leading to query failures.
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows.
-<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. -->
-<!-- Including full details here via conref, for benefit of PDF readers or anyone else
- who might have trouble seeing or following the link. -->
- </p>
-
- <p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-3509" rev="IMPALA-3509">
-
- <title>Breakpad minidumps can be very large when the thread count is high</title>
-
- <conbody>
-
- <p>
- The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the
- minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3509">IMPALA-3509</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the
- size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread
- from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump
- file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more
- than 20 MB.
- </p>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-3662" rev="IMPALA-3662">
-
- <title>Parquet scanner memory increase after IMPALA-2736</title>
-
- <conbody>
-
- <p>
- The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak memory usage than in previous releases while reading
- Parquet files.
- </p>
-
- <p>
- <keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans
- may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row
- materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios:
- <ul>
- <li>
- <p>
- Very wide rows due to projecting many columns in a scan.
- </p>
- </li>
-
- <li>
- <p>
- Very large rows due to big column values, for example, long strings or nested collections with many items.
- </p>
- </li>
-
- <li>
- <p>
- Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer)
- plan nodes.
- </p>
- </li>
- </ul>
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3662">IMPALA-3662</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner:
- <ul>
- <li>
- Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph>
- </li>
-
- <li>
- Reduce the batch size, for example: <codeph>set batch_size=512</codeph>
- </li>
-
- <li>
- Increase the memory limit, for example: <codeph>set mem_limit=64g</codeph>
- </li>
- </ul>
- </p>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-691" rev="IMPALA-691">
-
- <title>Process mem limit does not account for the JVM's memory usage</title>
-
-<!-- Supposed to be resolved for Impala 2.3.0. -->
-
- <conbody>
-
- <p>
- Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the
- <cmdname>impalad</cmdname> daemon.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-691">IMPALA-691</xref>
- </p>
-
- <p>
- <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the
- Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
- </p>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-2375" rev="IMPALA-2375">
-
-<!-- Not part of Alex's spreadsheet -->
-
- <title>Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false</title>
-
- <conbody>
-
- <p></p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-2375">IMPALA-2375</xref>
- </p>
-
- <p>
- <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical.
- </p>
-
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
-
- </conbody>
-
- </concept>
-
- </concept>
-
- <concept id="known_issues_correctness">
-
- <title id="ki_correctness">Impala Known Issues: Correctness</title>
-
- <conbody>
-
- <p>
- These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.
- </p>
-
- </conbody>
-
- <concept id="IMPALA-4539">
- <title>Parquet scanner memory bug: I/O buffer is attached to output batch while scratch batch rows still reference it</title>
-<!-- TSB-225 title: Possibly incorrect results when scanning uncompressed Parquet files with Impala. -->
- <conbody>
- <p>
- Impala queries may return incorrect results when scanning plain-encoded string
- columns in uncompressed Parquet files. I/O buffers holding the string data are
- prematurely freed, leading to invalid memory reads and possibly
- non-deterministic results. This does not affect Parquet files that use a
- compression codec such as Snappy. Snappy is both strongly recommended generally
- and the default choice for Impala-written Parquet files.
- </p>
- <p>
- How to determine whether a query might be affected:
- </p>
- <ul>
- <li>
- The query must reference <codeph>STRING</codeph> columns from a Parquet table.
- </li>
- <li>
- A selective filter on the Parquet table makes this issue more likely.
- </li>
- <li>
- Identify any uncompressed Parquet files processed by the query.
- Examine the <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the
- suspected table. Use a query that performs a full table scan, and materializes the column
- values. (For example, <codeph>SELECT MIN(<varname>colname</varname>) FROM <varname>tablename</varname></codeph>.)
- Look for <q>File Formats</q>. A value containing <codeph>PARQUET/NONE</codeph> means uncompressed Parquet.
- </li>
- <li>
- Identify any plain-encoded string columns in the associated table. Pay special attention to tables
- containing Parquet files generated through Hive, Spark, or other mechanisms outside of Impala,
- because Impala uses Snappy compression by default for Parquet files. Use <codeph>parquet-tools</codeph>
- to dump the file metadata. Note that a column could have several encodings within the same file (the column
- data is stored in several column chunks). Look for <codeph>VLE:PLAIN</codeph> in the output of
- <codeph>parquet-tools</codeph>, which means the values are plain encoded.
- </li>
- </ul>
- <p><b>Bug:</b> <xref keyref="IMPALA-4539">IMPALA-4539</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Resolution:</b> Upgrade to a version of Impala containing the fix for <xref keyref="IMPALA-4539">IMPALA-4539</xref>.</p>
- <p><b>Workaround:</b> Use Snappy or another compression codec for Parquet files.</p>
- </conbody>
- </concept>
-
- <concept id="IMPALA-4513">
- <title>ABS(n) where n is the lowest bound for the int types returns negative values</title>
- <conbody>
- <p>
- If the <codeph>abs()</codeph> function evaluates a number that is right at the lower bound for
- an integer data type, the positive result cannot be represented in the same type, and the
- result is returned as a negative number. For example, <codeph>abs(-128)</codeph> returns -128
- because the argument is interpreted as a <codeph>TINYINT</codeph> and the return value is also
- a <codeph>TINYINT</codeph>.
- </p>
- <p><b>Bug:</b> <xref keyref="IMPALA-4513">IMPALA-4513</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Workaround:</b> Cast the integer value to a larger type. For example, rewrite
- <codeph>abs(<varname>tinyint_col</varname>)</codeph> as <codeph>abs(cast(<varname>tinyint_col</varname> as smallint))</codeph>.</p>
- </conbody>
- </concept>
-
- <concept id="IMPALA-4266">
- <title>Java udf expression returning string in group by can give incorrect results.</title>
- <conbody>
- <p>
- If the <codeph>GROUP BY</codeph> clause included a call to a Java UDF that returned a string value,
- the UDF could return an incorrect result.
- </p>
- <p><b>Bug:</b> <xref keyref="IMPALA-4266">IMPALA-4266</xref></p>
- <p><b>Severity:</b> High</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
- <p><b>Workaround:</b> Rewrite the expression to concatenate the results of the Java UDF with an
- empty string call. For example, rewrite <codeph>my_hive_udf()</codeph> as
- <codeph>concat(my_hive_udf(), '')</codeph>.
- </p>
- </conbody>
- </concept>
-
- <concept id="IMPALA-3084" rev="IMPALA-3084">
-
- <title>Incorrect assignment of NULL checking predicate through an outer join of a nested collection.</title>
-
- <conbody>
-
- <p>
- A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested
- collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
- <codeph><=></codeph> operator) in the <codeph>WHERE</codeph> clause.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3084">IMPALA-3084</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-3094" rev="IMPALA-3094">
-
- <title>Incorrect result due to constant evaluation in query with outer join</title>
-
- <conbody>
-
- <p>
- An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in
- another join clause. For example:
- </p>
-
-<codeblock><![CDATA[
-explain SELECT 1 FROM alltypestiny a1
- INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
- RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
-+---------------------------------------------------------+
-| Explain String |
-+---------------------------------------------------------+
-| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
-| |
-| 00:EMPTYSET |
-+---------------------------------------------------------+
-]]>
-</codeblock>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3094">IMPALA-3094</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Resolution:</b>
- </p>
-
- <p>
- <b>Workaround:</b>
- </p>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-3126" rev="IMPALA-3126">
-
- <title>Incorrect assignment of an inner join On-clause predicate through an outer join.</title>
-
- <conbody>
-
- <p>
- Impala may return incorrect results for queries that have the following properties:
- </p>
-
- <ul>
- <li>
- <p>
- There is an INNER JOIN following a series of OUTER JOINs.
- </p>
- </li>
-
- <li>
- <p>
- The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the
- preceding OUTER JOINs.
- </p>
- </li>
- </ul>
-
- <p>
- The following query demonstrates the issue:
- </p>
-
-<codeblock>
-select 1 from functional.alltypes a left outer join
- functional.alltypes b on a.id = b.id left outer join
- functional.alltypes c on b.id = c.id right outer join
- functional.alltypes d on c.id = d.id inner join functional.alltypes e
-on b.int_col = c.int_col;
-</codeblock>
-
- <p>
- The following listing shows the incorrect <codeph>EXPLAIN</codeph> plan:
- </p>
-
-<codeblock><![CDATA[
-+-----------------------------------------------------------+
-| Explain String |
-+-----------------------------------------------------------+
-| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
-| |
-| 14:EXCHANGE [UNPARTITIONED] |
-| | |
-| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] |
-| | |
-| |--13:EXCHANGE [BROADCAST] |
-| | | |
-| | 04:SCAN HDFS [functional.alltypes e] |
-| | partitions=24/24 files=24 size=478.45KB |
-| | |
-| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
-| | hash predicates: c.id = d.id |
-| | runtime filters: RF000 <- d.id |
-| | |
-| |--12:EXCHANGE [HASH(d.id)] |
-| | | |
-| | 03:SCAN HDFS [functional.alltypes d] |
-| | partitions=24/24 files=24 size=478.45KB |
-| | |
-| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] |
-| | hash predicates: b.id = c.id |
-| | other predicates: b.int_col = c.int_col <--- incorrect placement; should be at node 07 or 08
-| | runtime filters: RF001 <- c.int_col |
-| | |
-| |--11:EXCHANGE [HASH(c.id)] |
-| | | |
-| | 02:SCAN HDFS [functional.alltypes c] |
-| | partitions=24/24 files=24 size=478.45KB |
-| | runtime filters: RF000 -> c.id |
-| | |
-| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
-| | hash predicates: b.id = a.id |
-| | runtime filters: RF002 <- a.id |
-| | |
-| |--10:EXCHANGE [HASH(a.id)] |
-| | | |
-| | 00:SCAN HDFS [functional.alltypes a] |
-| | partitions=24/24 files=24 size=478.45KB |
-| | |
-| 09:EXCHANGE [HASH(b.id)] |
-| | |
-| 01:SCAN HDFS [functional.alltypes b] |
-| partitions=24/24 files=24 size=478.45KB |
-| runtime filters: RF001 -> b.int_col, RF002 -> b.id |
-+-----------------------------------------------------------+
-]]>
-</codeblock>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3126">IMPALA-3126</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- <p>
- <b>Workaround:</b> High
- </p>
-
- <p>
- For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause predicate in the
- <codeph>WHERE</codeph> clause instead, or changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s (if
- the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to fix the problematic query above:
- </p>
-
-<codeblock><![CDATA[
-select 1 from functional.alltypes a
- left outer join functional.alltypes b
- on a.id = b.id
- left outer join functional.alltypes c
- on b.id = c.id
- right outer join functional.alltypes d
- on c.id = d.id
- inner join functional.alltypes e
-where b.int_col = c.int_col
-
-+-----------------------------------------------------------+
-| Explain String |
-+-----------------------------------------------------------+
-| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
-| |
-| 14:EXCHANGE [UNPARTITIONED] |
-| | |
-| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] |
-| | |
-| |--13:EXCHANGE [BROADCAST] |
-| | | |
-| | 04:SCAN HDFS [functional.alltypes e] |
-| | partitions=24/24 files=24 size=478.45KB |
-| | |
-| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
-| | hash predicates: c.id = d.id |
-| | other predicates: b.int_col = c.int_col <-- correct assignment
-| | runtime filters: RF000 <- d.id |
-| | |
-| |--12:EXCHANGE [HASH(d.id)] |
-| | | |
-| | 03:SCAN HDFS [functional.alltypes d] |
-| | partitions=24/24 files=24 size=478.45KB |
-| | |
-| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] |
-| | hash predicates: b.id = c.id |
-| | |
-| |--11:EXCHANGE [HASH(c.id)] |
-| | | |
-| | 02:SCAN HDFS [functional.alltypes c] |
-| | partitions=24/24 files=24 size=478.45KB |
-| | runtime filters: RF000 -> c.id |
-| | |
-| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
-| | hash predicates: b.id = a.id |
-| | runtime filters: RF001 <- a.id |
-| | |
-| |--10:EXCHANGE [HASH(a.id)] |
-| | | |
-| | 00:SCAN HDFS [functional.alltypes a] |
-| | partitions=24/24 files=24 size=478.45KB |
-| | |
-| 09:EXCHANGE [HASH(b.id)] |
-| | |
-| 01:SCAN HDFS [functional.alltypes b] |
-| partitions=24/24 files=24 size=478.45KB |
-| runtime filters: RF001 -> b.id |
-+-----------------------------------------------------------+
-]]>
-</codeblock>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-3006" rev="IMPALA-3006">
-
- <title>Impala may use incorrect bit order with BIT_PACKED encoding</title>
-
- <conbody>
-
- <p>
- Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first. The parquet standard says it is MSB first.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3006">IMPALA-3006</xref>
- </p>
-
- <p>
- <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated
- in Parquet 2.0.
- </p>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-3082" rev="IMPALA-3082">
-
- <title>BST between 1972 and 1995</title>
-
- <conbody>
-
- <p>
- The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995.
- Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
- third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such
- as:
- </p>
-
-<codeblock>
-select
- extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
- extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;
-</codeblock>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-3082">IMPALA-3082</xref>
- </p>
-
- <p>
- <b>Severity:</b> High
- </p>
-
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-1170" rev="IMPALA-1170">
-
- <title>parse_url() returns incorrect result if @ character in URL</title>
-
- <conbody>
-
<p>
- If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for
- the hostname field.
+ <b>Resolution:</b> After the upgrade, follow the instructions in
+ <xref keyref="convert_read_size"/> to check if your queries are affected by these
+ changes and to modify your configuration settings if so.
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-1170"></xref>IMPALA-1170
+ <b>Apache Issue:</b> <xref keyref="IMPALA-6028">IMPALA-6028</xref>
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
-
</conbody>
</concept>
- <concept id="IMPALA-2422" rev="IMPALA-2422">
+ <concept id="IMPALA-5605">
- <title>% escaping does not work correctly when occurs at the end in a LIKE clause</title>
+ <title>Configuration to prevent crashes caused by thread resource limits</title>
<conbody>
<p>
- If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it
- does not match a <codeph>%</codeph> final character of the LHS argument.
+ Impala could encounter a serious error due to resource usage under very high
+ concurrency. The error message is similar to:
</p>
+<codeblock><![CDATA[
+F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
+terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
+]]>
+</codeblock>
+
<p>
- <b>Bug:</b> <xref keyref="IMPALA-2422">IMPALA-2422</xref>
+ <b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref>
</p>
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-397" rev="IMPALA-397">
-
- <title>ORDER BY rand() does not work.</title>
-
- <conbody>
+ <p>
+ <b>Severity:</b> High
+ </p>
<p>
- Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression
- involving a call to <codeph>rand()</codeph> does not actually randomize the results.
+ <b>Workaround:</b> To prevent such errors, configure each host running an
+ <cmdname>impalad</cmdname> daemon with the following settings:
</p>
+<codeblock>
+echo 2000000 > /proc/sys/kernel/threads-max
+echo 2000000 > /proc/sys/kernel/pid_max
+echo 8000000 > /proc/sys/vm/max_map_count
+</codeblock>
+
<p>
- <b>Bug:</b> <xref keyref="IMPALA-397">IMPALA-397</xref>
+ Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
</p>
+<codeblock>
+impala soft nproc 262144
+impala hard nproc 262144
+</codeblock>
+
</conbody>
</concept>
- <concept id="IMPALA-2643" rev="IMPALA-2643">
+ <concept id="drop_table_purge_s3a">
- <title>Duplicated column in inline view causes dropping null slots during scan</title>
+ <title><b>Breakpad minidumps can be very large when the thread count is high</b></title>
<conbody>
<p>
- If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the
- result of <codeph>COUNT(*)</codeph> on the view could be less than expected.
+ The size of the breakpad minidump files grows linearly with the number of threads. By
+ default, each thread adds 8 KB to the minidump size. Minidump files could consume
+ significant disk space when the daemons have a high number of threads.
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-2643">IMPALA-2643</xref>
+ <b>Workaround:</b> Add
+ <systemoutput>--minidump_size_limit_hint_kb=size</systemoutput>
+ to set a soft upper limit on the size of each minidump file. If the minidump file
+ would exceed that limit, Impala reduces the amount of information for each thread from
+ 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB
+ per thread after that.) The minidump file can still grow larger than the "hinted"
+ size. For example, if you have 10,000 threads, the minidump file can be more than 20
+ MB.
</p>
<p>
- <b>Workaround:</b> Avoid selecting the same column twice within an inline view.
+ <b>Apache Issue:</b>
+ <xref href="https://issues.cloudera.org/browse/IMPALA-3509"
+ format="html" scope="external">IMPALA-3509</xref>
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala2210"/>.</p>
-
</conbody>
</concept>
- <concept id="IMPALA-1459" rev="IMPALA-1459">
-
-<!-- Not part of Alex's spreadsheet -->
+ <concept id="IMPALA-691">
- <title>Incorrect assignment of predicates through an outer join in an inline view.</title>
+ <title><b>Process mem limit does not account for the JVM's memory usage</b></title>
<conbody>
<p>
- A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates
- from the <codeph>ON</codeph> clause incorrectly.
+ Some memory allocated by the JVM used internally by Impala is not counted against the
+ memory limit for the impalad daemon.
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-1459">IMPALA-1459</xref>
+ <b>Workaround:</b> To monitor overall memory usage, use the top command, or add the
+ memory figures in the Impala web UI <b>/memz</b> tab to JVM memory usage shown on the
+ <b>/metrics</b> tab.
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
+ <p>
+ <b>Apache Issue:</b>
+ <xref href="https://issues.cloudera.org/browse/IMPALA-691"
+ format="html" scope="external">IMPALA-691</xref>
+ </p>
</conbody>
</concept>
- <concept id="IMPALA-2603" rev="IMPALA-2603">
-
- <title>Crash: impala::Coordinator::ValidateCollectionSlots</title>
+ </concept>
- <conbody>
+ <concept id="known_issues_correctness">
- <p>
- A query could encounter a serious error if includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving
- subqueries.
- </p>
+ <title id="ki_correctness">Impala Known Issues: Correctness</title>
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-2603">IMPALA-2603</xref>
- </p>
+ <conbody>
- </conbody>
+ <p>
+ These issues can cause incorrect or unexpected results from queries. They typically only
+ arise in very specific circumstances.
+ </p>
- </concept>
+ </conbody>
- <concept id="IMPALA-2665" rev="IMPALA-2665">
+ <concept id="IMPALA-3094" rev="IMPALA-3094">
- <title>Incorrect assignment of On-clause predicate inside inline view with an outer join.</title>
+ <title>Incorrect result due to constant evaluation in query with outer join</title>
<conbody>
<p>
- A query might return incorrect results due to wrong predicate assignment in the following scenario:
+ An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a
+ constant such as <codeph>FALSE</codeph> in another join clause. For example:
</p>
- <ol>
- <li>
- There is an inline view that contains an outer join
- </li>
-
- <li>
- That inline view is joined with another table in the enclosing query block
- </li>
-
- <li>
- That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside
- the inline view
- </li>
- </ol>
+<codeblock><![CDATA[
+explain SELECT 1 FROM alltypestiny a1
+ INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
+ RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
+| |
+| 00:EMPTYSET |
++---------------------------------------------------------+
+]]>
+</codeblock>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-2665">IMPALA-2665</xref>
+ <b>Bug:</b> <xref keyref="IMPALA-3094">IMPALA-3094</xref>
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
+ <p>
+ <b>Severity:</b> High
+ </p>
</conbody>
</concept>
- <concept id="IMPALA-2144" rev="IMPALA-2144">
+ <concept id="IMPALA-3006" rev="IMPALA-3006">
- <title>Wrong assignment of having clause predicate across outer join</title>
+ <title>Impala may use incorrect bit order with BIT_PACKED encoding</title>
<conbody>
<p>
- In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph>
- clause might be applied at the wrong stage of query processing, leading to incorrect results.
+ Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first.
+ The parquet standard says it is MSB first.
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-2144">IMPALA-2144</xref>
+ <b>Bug:</b> <xref keyref="IMPALA-3006">IMPALA-3006</xref>
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+ <p>
+ <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used,
+ is not written by Impala, and is deprecated in Parquet 2.0.
+ </p>
</conbody>
</concept>
- <concept id="IMPALA-2093" rev="IMPALA-2093">
+ <concept id="IMPALA-3082" rev="IMPALA-3082">
- <title>Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</title>
+ <title>BST between 1972 and 1995</title>
<conbody>
<p>
- A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT
- SUM(...))</codeph>, could return incorrect results.
+ The calculation of start and end times for the BST (British Summer Time) time zone
+ could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended
+ at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
+ third) and fourth Sunday in October. For example, both function calls should return
+ 13, but actually return 12, in a query such as:
</p>
+<codeblock>
+select
+ extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
+ extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;
+</codeblock>
+
<p>
- <b>Bug:</b> <xref keyref="IMPALA-2093">IMPALA-2093</xref>
+ <b>Bug:</b> <xref keyref="IMPALA-3082">IMPALA-3082</xref>
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
+ <p>
+ <b>Severity:</b> High
+ </p>
</conbody>
</concept>
- </concept>
-
- <concept id="known_issues_metadata">
-
- <title id="ki_metadata">Impala Known Issues: Metadata</title>
-
- <conbody>
-
- <p>
- These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE
- STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon.
- </p>
-
- </conbody>
-
- <concept id="IMPALA-2648" rev="IMPALA-2648">
+ <concept id="IMPALA-2422" rev="IMPALA-2422">
- <title>Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats</title>
+ <title>% escaping does not work correctly when occurs at the end in a LIKE clause</title>
<conbody>
<p>
- Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100
- columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network,
- this metadata exceeds the 2 GB Java array size limit and leads to a <codeph>catalogd</codeph> crash.
+ If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an
+ escaped <codeph>\%</codeph> character, it does not match a <codeph>%</codeph> final
+ character of the LHS argument.
</p>
<p>
- <b>Bugs:</b> <xref keyref="IMPALA-2647">IMPALA-2647</xref>,
- <xref keyref="IMPALA-2648">IMPALA-2648</xref>,
- <xref keyref="IMPALA-2649">IMPALA-2649</xref>
- </p>
-
- <p>
- <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The
- scalability of incremental stats computation is a continuing work item.
+ <b>Bug:</b> <xref keyref="IMPALA-2422">IMPALA-2422</xref>
</p>
</conbody>
</concept>
- <concept id="IMPALA-1420" rev="IMPALA-1420 2.0.0">
-
-<!-- Not part of Alex's spreadsheet -->
+ <concept id="IMPALA-2603" rev="IMPALA-2603">
- <title>Can't update stats manually via alter table after upgrading to <keyword keyref="impala20"/></title>
+ <title>Crash: impala::Coordinator::ValidateCollectionSlots</title>
<conbody>
- <p></p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-1420">IMPALA-1420</xref>
- </p>
-
<p>
- <b>Workaround:</b> On <keyword keyref="impala20"/>, when adjusting table statistics manually by setting the <codeph>numRows</codeph>, you must also
- enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to
- set both properties with a single <codeph>ALTER TABLE</codeph> statement:
+ A query could encounter a serious error if includes multiple nested levels of
+ <codeph>INNER JOIN</codeph> clauses involving subqueries.
</p>
-<codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock>
-
<p>
- <b>Resolution:</b> The underlying cause is the issue
- <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the
- metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into release of <keyword keyref="distro"/>.
+ <b>Bug:</b> <xref keyref="IMPALA-2603">IMPALA-2603</xref>
</p>
</conbody>
@@ -1666,6 +606,8 @@ select
</concept>
+<!--<concept id="known_issues_metadata"><title id="ki_metadata">Impala Known Issues: Metadata</title><conbody><p> These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon. </p></conbody></concept>-->
+
<concept id="known_issues_interop">
<title id="ki_interop">Impala Known Issues: Interoperability</title>
@@ -1673,8 +615,8 @@ select
<conbody>
<p>
- These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types
- and file formats.
+ These issues affect the ability to interchange data between Impala and other database
+ systems. They cover areas such as data types and file formats.
</p>
</conbody>
@@ -1688,26 +630,32 @@ select
<conbody>
<p>
- This issue can occur either on old Avro tables (created prior to Hive 1.1) or when changing the Avro schema file by
- adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE
- FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>.
+ This issue can occur either on old Avro tables (created prior to Hive 1.1) or when
+ changing the Avro schema file by adding or removing columns. Columns added to the
+ schema file will not show up in the output of the <codeph>DESCRIBE FORMATTED</codeph>
+ command. Removing columns from the schema file will trigger a
+ <codeph>NullPointerException</codeph>.
</p>
<p>
- As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This will populate
- the Hive metastore database with the correct column definitions.
+ As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop
+ and recreate the table. This will populate the Hive metastore database with the
+ correct column definitions.
</p>
<note type="warning">
- <p>Only use this for external tables, or Impala will remove the data
- files. In case of an internal table, set it to external first:
+ <p>
+ Only use this for external tables, or Impala will remove the data files. In case of
+ an internal table, set it to external first:
<codeblock>
ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</codeblock>
- (The part in parentheses is case sensitive.) Make sure to pick the
- right choice between internal and external when recreating the table.
- See <xref href="impala_tables.xml#tables"/> for the differences
- between internal and external tables. </p></note>
+ (The part in parentheses is case sensitive.) Make sure to pick the right choice
+ between internal and external when recreating the table. See
+ <xref href="impala_tables.xml#tables"/> for the differences between internal and
+ external tables.
+ </p>
+ </note>
<p>
<b>Severity:</b> High
@@ -1717,26 +665,6 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</concept>
- <concept id="IMP-469">
-
-<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
-
- <title>Deviation from Hive behavior: Impala does not do implicit casts between string and numeric and boolean types.</title>
-
- <conbody>
-
- <p>
- <b>Anticipated Resolution</b>: None
- </p>
-
- <p>
- <b>Workaround:</b> Use explicit casts.
- </p>
-
- </conbody>
-
- </concept>
-
<concept id="IMP-175">
<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
@@ -1746,8 +674,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- Impala behavior differs from Hive with respect to out of range float/double values. Out of range values are returned as maximum
- allowed value of type (Hive returns NULL).
+ Impala behavior differs from Hive with respect to out of range float/double values.
+ Out of range values are returned as maximum allowed value of type (Hive returns NULL).
</p>
<p>
@@ -1767,14 +695,16 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to
- <codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting
- must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either
- Impala or Hive.
+ For compatibility with Impala, the value for the Flume HDFS Sink
+ <codeph>hdfs.writeFormat</codeph> must be set to <codeph>Text</codeph>, rather than
+ its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph>
+ setting must be changed to <codeph>Text</codeph> before creating data files with
+ Flume; otherwise, those files cannot be read by either Impala or Hive.
</p>
<p>
- <b>Resolution:</b> This information has been requested to be added to the upstream Flume documentation.
+ <b>Resolution:</b> This information has been requested to be added to the upstream
+ Flume documentation.
</p>
</conbody>
@@ -1790,7 +720,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table.
+ Querying certain Avro tables could cause a crash or return no rows, even though Impala
+ could <codeph>DESCRIBE</codeph> the table.
</p>
<p>
@@ -1798,13 +729,14 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</p>
<p>
- <b>Workaround:</b> Swap the order of the fields in the schema specification. For example, <codeph>["null", "string"]</codeph>
- instead of <codeph>["string", "null"]</codeph>.
+ <b>Workaround:</b> Swap the order of the fields in the schema specification. For
+ example, <codeph>["null", "string"]</codeph> instead of <codeph>["string",
+ "null"]</codeph>.
</p>
<p>
- <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
- crashing issue is resolved.
+ <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it
+ may still cause an error even when the crashing issue is resolved.
</p>
</conbody>
@@ -1820,7 +752,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
<conbody>
<p>
- If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
+ If an Avro table has a schema definition with a trailing semicolon, Impala encounters
+ an error when the table is queried.
</p>
<p>
@@ -1835,138 +768,80 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</concept>
- <concept id="IMPALA-2154" rev="IMPALA-2154">
+ <concept id="IMPALA-1652" rev="IMPALA-1652">
-<!-- Not part of Alex's spreadsheet -->
+<!-- To do: Isn't this more a correctness issue? -->
- <title>Fix decompressor to allow parsing gzips with multiple streams</title>
+ <title>Incorrect results with basic predicate on CHAR typed column</title>
<conbody>
<p>
- Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
- streams, the Impala query only processes the data from the first stream.
+ When comparing a <codeph>CHAR</codeph> column value to a string literal, the literal
+ value is not blank-padded and so the comparison might fail when it should match.
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-2154">IMPALA-2154</xref>
+ <b>Bug:</b> <xref keyref="IMPALA-1652">IMPALA-1652</xref>
</p>
<p>
- <b>Workaround:</b> Use a different gzip tool to compress file to a single stream file.
+ <b>Workaround:</b> Use the <codeph>RPAD()</codeph> function to blank-pad literals
+ compared with <codeph>CHAR</codeph> columns to the expected length.
</p>
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
-
</conbody>
</concept>
- <concept id="IMPALA-1578" rev="IMPALA-1578">
-
-<!-- Not part of Alex's spreadsheet -->
-
- <title>Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block</title>
-
- <conbody>
-
- <p>
- If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
- the row following the <codeph>\n\r</codeph> pair twice.
- </p>
-
- <p>
- <b>Bug:</b> <xref keyref="IMPALA-1578">IMPALA-1578</xref>
- </p>
+ </concept>
- <p>
- <b>Workaround:</b> Use the Parquet format for large volumes of data where practical.
- </p>
+ <concept id="known_issues_limitations">
- <p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.</p>
+ <title>Impala Known Issues: Limitations</title>
- </conbody>
+ <conbody>
- </concept>
+ <p>
+ These issues are current limitations of Impala that require evaluation as you plan how
+ to integrate Impala into your data management workflow.
+ </p>
- <concept id="IMPALA-1862" rev="IMPALA-1862">
+ </conbody>
-<!-- Not part of Alex's spreadsheet -->
+ <concept id="IMPALA-4551">
- <title>Invalid bool value not reported as a scanner error</title>
+ <title>Set limits on size of expression trees</title>
<conbody>
<p>
- In some cases, an invalid <codeph>BOOLEAN</codeph> value read from a table does not produce a warning message about the bad value.
- The result is still <codeph>NULL</codeph> as expected. Therefore, this is not a query correctness issue, but it could lead to
- overlooking the presence of invalid data.
+ Very deeply nested expressions within queries can exceed internal Impala limits,
+ leading to excessive memory usage.
</p>
<p>
- <b>Bug:</b> <xref keyref="IMPALA-1862">IMPALA-1862</xref>
+ <b>Bug:</b> <xref keyref="IMPALA-4551">IMPALA-4551</xref>
</p>
- </conbody>
-
- </concept>
-
- <concept id="IMPALA-1652" rev="IMPALA-1652">
-
-<!-- To do: Isn't this more a correctness issue? -->
-
- <title>Incorrect results with basic predicate on CHAR typed column.</title>
-
- <conbody>
-
<p>
-
<TRUNCATED>
[2/4] impala git commit: Revert "IMPALA-6869: [DOCS] Update Known Issues doc for 2.12"
Posted by ar...@apache.org.
Revert "IMPALA-6869: [DOCS] Update Known Issues doc for 2.12"
This reverts commit b9271ccf0e2e2e8cfbe7c6538aca0109c68acbef, mistakenly
put into master. Prevent 2.x from getting this since it's the correct
set of known issues for 2.12.
Cherry-picks: not for 2.x.
Change-Id: Ifb9fd5bb702840e225a9922e8b9090b142f914df
Reviewed-on: http://gerrit.cloudera.org:8080/10164
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/62eed0d5
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/62eed0d5
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/62eed0d5
Branch: refs/heads/master
Commit: 62eed0d5d582a3bc7ffe0fe97a800e314ade25c2
Parents: da363a9
Author: Michael Brown <mi...@cloudera.com>
Authored: Mon Apr 23 15:50:23 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Mon Apr 23 23:13:49 2018 +0000
----------------------------------------------------------------------
docs/topics/impala_known_issues.xml | 1335 ++++++++++--------------------
1 file changed, 441 insertions(+), 894 deletions(-)
----------------------------------------------------------------------
[4/4] impala git commit: IMPALA-6872: [DOCS] Known Issues in Impala 3.0
Posted by ar...@apache.org.
IMPALA-6872: [DOCS] Known Issues in Impala 3.0
Cherry-picks: not for 2.x.
Change-Id: I0d89198106136d2e4abced1ee794def19e23ecbf
Reviewed-on: http://gerrit.cloudera.org:8080/10139
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/c84764d5
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/c84764d5
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/c84764d5
Branch: refs/heads/master
Commit: c84764d571021b7a6653a70f28ce04978aa8993d
Parents: 62eed0d
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Fri Apr 20 13:38:13 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Mon Apr 23 23:39:46 2018 +0000
----------------------------------------------------------------------
docs/topics/impala_known_issues.xml | 1851 ++++++------------------------
1 file changed, 375 insertions(+), 1476 deletions(-)
----------------------------------------------------------------------