Posted to commits@impala.apache.org by ta...@apache.org on 2018/07/04 01:07:43 UTC
[1/6] impala git commit: IMPALA-5981: [DOCS] Documented SET=""
Repository: impala
Updated Branches:
refs/heads/master 2b6d71fee -> 61e6a4777
IMPALA-5981: [DOCS] Documented SET=""
Also, refactored the Impala SET doc and moved the command SET to
the Impala Shell Commands doc.
Change-Id: I7211405d5cc0a548c05ea5218798591873c14417
Reviewed-on: http://gerrit.cloudera.org:8080/10816
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/d03a2d63
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/d03a2d63
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/d03a2d63
Branch: refs/heads/master
Commit: d03a2d63fef2d29083fa5ee85b89b85891e923fc
Parents: 2b6d71f
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Mon Jun 25 16:30:14 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue Jul 3 16:52:42 2018 +0000
----------------------------------------------------------------------
docs/topics/impala_set.xml | 305 +++++++++--------------------
docs/topics/impala_shell_commands.xml | 35 ++--
2 files changed, 110 insertions(+), 230 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/d03a2d63/docs/topics/impala_set.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_set.xml b/docs/topics/impala_set.xml
index 0020188..ddf89da 100644
--- a/docs/topics/impala_set.xml
+++ b/docs/topics/impala_set.xml
@@ -21,7 +21,13 @@ under the License.
<concept rev="2.0.0" id="set">
<title>SET Statement</title>
- <titlealts audience="PDF"><navtitle>SET</navtitle></titlealts>
+
+ <titlealts audience="PDF">
+
+ <navtitle>SET</navtitle>
+
+ </titlealts>
+
<prolog>
<metadata>
<data name="Category" value="Impala"/>
@@ -36,255 +42,130 @@ under the License.
<conbody>
<p rev="2.0.0">
- <indexterm audience="hidden">SET statement</indexterm>
- Specifies values for query options that control the runtime behavior of other statements within the same
- session.
+ The <codeph>SET</codeph> statement specifies values for query options that control the
+ runtime behavior of other statements within the same session.
</p>
- <p rev="2.5.0 IMPALA-2180">
- In <keyword keyref="impala25_full"/> and higher, <codeph>SET</codeph> also defines user-specified substitution variables for
- the <cmdname>impala-shell</cmdname> interpreter. This feature uses the <codeph>SET</codeph> command
- built into <cmdname>impala-shell</cmdname> instead of the SQL <codeph>SET</codeph> statement.
- Therefore the substitution mechanism only works with queries processed by <cmdname>impala-shell</cmdname>,
- not with queries submitted through JDBC or ODBC.
+ <p>
+ When issued in <codeph>impala-shell</codeph>, the <codeph>SET</codeph> command is
+ interpreted as an <codeph>impala-shell</codeph> command that has differences from the SQL
+ <codeph>SET</codeph> statement. See
+ <xref
+ href="impala_shell_commands.xml#shell_commands/set_cmd"/> for
+ information about the <codeph>SET</codeph> command in <codeph>impala-shell</codeph>.
</p>
- <note type="important" rev="2.11.0 IMPALA-2181">
- <p>
- In <keyword keyref="impala211_full"/> and higher, the output of the <codeph>SET</codeph>
- statement changes in some important ways:
- </p>
- <ul>
- <li>
- <p>
- The options are divided into groups: <codeph>Regular Query Options</codeph>,
- <codeph>Advanced Query Options</codeph>, <codeph>Development Query Options</codeph>, and
- <codeph>Deprecated Query Options</codeph>.
- </p>
- </li>
- <li>
- <p>
- The advanced options are intended for use in specific
- kinds of performance tuning and debugging scenarios. The development options are
- related to internal development of Impala or features that are not yet finalized;
- these options might be changed or removed without notice.
- The deprecated options are related to features that are removed or changed so that
- the options no longer have any purpose; these options might be removed in future
- versions.
- </p>
- </li>
- <li>
- <p>
- By default, only the first two groups (regular and advanced) are
- displayed by the <codeph>SET</codeph> command. Use the syntax <codeph>SET ALL</codeph>
- to see all groups of options.
- </p>
- </li>
- <li>
- <p>
- <cmdname>impala-shell</cmdname> options and user-specified variables are always displayed
- at the end of the list of query options, after all appropriate option groups.
- </p>
- </li>
- <li>
- <p>
- When the <codeph>SET</codeph> command is run through the JDBC or ODBC interfaces,
- the result set has a new third column, <codeph>level</codeph>, indicating which
- group each option belongs to. The same distinction of <codeph>SET</codeph>
- returning the regular and advanced options, and <codeph>SET ALL</codeph>
- returning all option groups, applies to JDBC and ODBC also.
- </p>
- </li>
- </ul>
- </note>
-
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
-<codeblock>SET [<varname>query_option</varname>=<varname>option_value</varname>]
+<codeblock>SET
<ph rev="2.11.0 IMPALA-2181">SET ALL</ph>
+SET <varname>query_option</varname>=<varname>option_value</varname>
+SET <varname>query_option</varname>=""
</codeblock>
<p rev="2.11.0 IMPALA-2181">
- <codeph>SET</codeph> and <codeph>SET ALL</codeph> with no arguments return a
- result set consisting of all the applicable query options and their current values.
+ <codeph>SET</codeph> and <codeph>SET ALL</codeph> with no arguments return a result set
+ consisting of all the applicable query options and their current values.
</p>
<p>
- The query option name and any string argument values are case-insensitive.
+ The <varname>query_option</varname> and <varname>option_value</varname> are
+ case-insensitive.
</p>
<p>
- Each query option has a specific allowed notation for its arguments. Boolean options can be enabled and
- disabled by assigning values of either <codeph>true</codeph> and <codeph>false</codeph>, or
- <codeph>1</codeph> and <codeph>0</codeph>. Some numeric options accept a final character signifying the unit,
- such as <codeph>2g</codeph> for 2 gigabytes or <codeph>100m</codeph> for 100 megabytes. See
- <xref href="impala_query_options.xml#query_options"/> for the details of each query option.
+ Unlike the <codeph>impala-shell</codeph> command version of <codeph>SET</codeph>, when
+ used as a SQL statement, the string values for <varname>option_value</varname> must be
+ quoted, for example, <codeph>SET option="new_value"</codeph>.
</p>
<p>
- <b>Setting query options during impala-shell invocation:</b>
- </p>
-
- <p rev="2.11.0 IMPALA-5736">
- In <keyword keyref="impala211_full"/> and higher, you can use one or more command-line options
- of the form <codeph>--query_option=<varname>option</varname>=<varname>value</varname></codeph>
- when running the <cmdname>impala-shell</cmdname> command. The corresponding query option settings
- take effect for that <cmdname>impala-shell</cmdname> session.
+ The <codeph>SET <varname>query_option</varname> = ""</codeph> statement unsets the value
+ of the <varname>query_option</varname> in the current session, reverting it to the default
+ state. In <codeph>impala-shell</codeph>, use the <codeph>UNSET</codeph> command to reset a
+ query option to its default.
</p>
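A minimal sketch of the behavior described above (the option name and value are illustrative):

```sql
-- Override a query option for the current session.
SET MEM_LIMIT=2g;

-- Revert it to the default by assigning the empty string
-- (SQL statement form; in impala-shell you would instead run: UNSET MEM_LIMIT).
SET MEM_LIMIT="";
```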
<p>
- <b>User-specified substitution variables:</b>
- </p>
-
- <p rev="2.5.0 IMPALA-2180">
- In <keyword keyref="impala25_full"/> and higher, you can specify your own names and string substitution values
- within the <cmdname>impala-shell</cmdname> interpreter. Once a substitution variable is set up,
- its value is inserted into any SQL statement in that same <cmdname>impala-shell</cmdname> session
- that contains the notation <codeph>${var:<varname>varname</varname>}</codeph>.
- Using <codeph>SET</codeph> in an interactive <cmdname>impala-shell</cmdname> session overrides
- any value for that same variable passed in through the <codeph>--var=<varname>varname</varname>=<varname>value</varname></codeph>
- command-line option.
- </p>
-
- <p rev="2.5.0 IMPALA-2180">
- For example, to set up some default parameters for report queries, but then override those default
- within an <cmdname>impala-shell</cmdname> session, you might issue commands and statements such as
- the following:
- </p>
-
-<codeblock rev="2.5.0 IMPALA-2180">
--- Initial setup for this example.
-create table staging_table (s string);
-insert into staging_table values ('foo'), ('bar'), ('bletch');
-
-create table production_table (s string);
-insert into production_table values ('North America'), ('EMEA'), ('Asia');
-quit;
-
--- Start impala-shell with user-specified substitution variables,
--- run a query, then override the variables with SET and run the query again.
-$ impala-shell --var=table_name=staging_table --var=cutoff=2
-... <varname>banner message</varname> ...
-[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
-Query: select s from staging_table order by s limit 2
-+--------+
-| s |
-+--------+
-| bar |
-| bletch |
-+--------+
-Fetched 2 row(s) in 1.06s
-
-[localhost:21000] > set var:table_name=production_table;
-Variable TABLE_NAME set to production_table
-[localhost:21000] > set var:cutoff=3;
-Variable CUTOFF set to 3
-
-[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
-Query: select s from production_table order by s limit 3
-+---------------+
-| s |
-+---------------+
-| Asia |
-| EMEA |
-| North America |
-+---------------+
-</codeblock>
-
- <p rev="2.5.0 IMPALA-2180">
- The following example shows how <codeph>SET ALL</codeph> with no parameters displays
- all user-specified substitution variables, and how <codeph>UNSET</codeph> removes
- the substitution variable entirely:
- </p>
-
-<codeblock rev="2.11.0 IMPALA-2181">
-[localhost:21000] > set all;
-Query options (defaults shown in []):
-ABORT_ON_ERROR: [0]
-COMPRESSION_CODEC: []
-DISABLE_CODEGEN: [0]
-...
-
-Advanced Query Options:
-APPX_COUNT_DISTINCT: [0]
-BUFFER_POOL_LIMIT: []
-DEFAULT_JOIN_DISTRIBUTION_MODE: [0]
-...
-
-Development Query Options:
-BATCH_SIZE: [0]
-DEBUG_ACTION: []
-DECIMAL_V2: [0]
-...
-
-Deprecated Query Options:
-ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
-ALLOW_UNSUPPORTED_FORMATS: [0]
-DEFAULT_ORDER_BY_LIMIT: [-1]
-...
-
-Shell Options
- LIVE_PROGRESS: False
- LIVE_SUMMARY: False
-
-Variables:
- CUTOFF: 3
- TABLE_NAME: staging_table
-
-[localhost:21000] > unset var:cutoff;
-Unsetting variable CUTOFF
-[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
-Error: Unknown variable CUTOFF
-</codeblock>
-
- <p rev="2.5.0 IMPALA-2180">
- See <xref href="impala_shell_running_commands.xml"/> for more examples of using the
- <codeph>--var</codeph>, <codeph>SET</codeph>, and <codeph>${var:<varname>varname</varname>}</codeph>
- substitution technique in <cmdname>impala-shell</cmdname>.
+ Each query option has a specific allowed notation for its arguments. See
+ <xref href="impala_query_options.xml#query_options"/> for the details of each query
+ option.
</p>
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
<p>
- <codeph>MEM_LIMIT</codeph> is probably the most commonly used query option. You can specify a high value to
- allow a resource-intensive query to complete. For testing how queries would work on memory-constrained
- systems, you might specify an artificially low value.
- </p>
-
- <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
-
- <p conref="../shared/impala_common.xml#common/example_blurb"/>
-
- <p>
- The following example sets some numeric and some Boolean query options to control usage of memory, disk
- space, and timeout periods, then runs a query whose success could depend on the options in effect:
- </p>
-
-<codeblock>set mem_limit=64g;
-set DISABLE_UNSAFE_SPILLS=true;
-set parquet_file_size=400m;
-set RESERVATION_REQUEST_TIMEOUT=900000;
-insert overwrite parquet_table select c1, c2, count(c3) from text_table group by c1, c2, c3;
-</codeblock>
+ In <keyword keyref="impala211_full"/> and higher, the output of the <codeph>SET</codeph>
+ and <codeph>SET ALL</codeph> statements is organized as follows:
+ </p>
+
+ <ul>
+ <li>
+ <p>
+ The options are divided into groups: <codeph>Regular Query Options</codeph>,
+ <codeph>Advanced Query Options</codeph>, <codeph>Development Query Options</codeph>,
+ and <codeph>Deprecated Query Options</codeph>.
+ </p>
+ <ul>
+ <li>
+ <p>
+ The advanced options are intended for use in specific kinds of performance tuning
+ and debugging scenarios.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ The development options are related to internal development of Impala or features
+ that are not yet finalized. These options might be changed or removed without
+ notice.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ The deprecated options are related to features that are removed or changed so that
+ the options no longer have any purpose. These options might be removed in future
+ versions.
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li>
+ <p>
+ By default, only the first two groups, regular and advanced, are displayed by the
+ <codeph>SET</codeph> command. Use <codeph>SET ALL</codeph> to see all groups of
+ options.
+ </p>
+ </li>
+
+ <li>
+ <p>
+ <cmdname>impala-shell</cmdname> options and user-specified variables are always
+ displayed at the end of the list of query options, after all appropriate option
+ groups.
+ </p>
+ </li>
+ </ul>
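The grouping above can be observed by comparing the two forms of the statement (a sketch; the actual option lists vary by release):

```sql
-- Shows only the Regular and Advanced query option groups.
SET;

-- Additionally shows the Development and Deprecated groups, followed by
-- impala-shell options and user-specified variables when run in impala-shell.
SET ALL;
```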
<p conref="../shared/impala_common.xml#common/added_in_20"/>
<p>
- <codeph>SET</codeph> has always been available as an <cmdname>impala-shell</cmdname> command. Promoting it to
- a SQL statement lets you use this feature in client applications through the JDBC and ODBC APIs.
+ <codeph>SET</codeph> has always been available as an <cmdname>impala-shell</cmdname>
+ command. Promoting it to a SQL statement lets you use this feature in client applications
+ through the JDBC and ODBC APIs.
</p>
-<!-- <p conref="../shared/impala_common.xml#common/jdbc_blurb"/> -->
-
- <p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
-
<p conref="../shared/impala_common.xml#common/permissions_blurb_no"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
- See <xref href="impala_query_options.xml#query_options"/> for the query options you can adjust using this
- statement.
+ See <xref href="impala_query_options.xml#query_options"/> for the query options you can
+ adjust using this statement.
</p>
+
</conbody>
+
</concept>
http://git-wip-us.apache.org/repos/asf/impala/blob/d03a2d63/docs/topics/impala_shell_commands.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_shell_commands.xml b/docs/topics/impala_shell_commands.xml
index f9a48d9..91c6d13 100644
--- a/docs/topics/impala_shell_commands.xml
+++ b/docs/topics/impala_shell_commands.xml
@@ -291,29 +291,28 @@ under the License.
</entry>
<entry>
<p>
- Manages query options for an <cmdname>impala-shell</cmdname> session. The available options are the
- ones listed in <xref href="impala_query_options.xml#query_options"/>. These options are used for
- query tuning and troubleshooting. Issue <codeph>SET</codeph> with no arguments to see the current
- query options, either based on the <cmdname>impalad</cmdname> defaults, as specified by you at
- <cmdname>impalad</cmdname> startup, or based on earlier <codeph>SET</codeph> statements in the same
- session. To modify option values, issue commands with the syntax <codeph>set
- <varname>option</varname>=<varname>value</varname></codeph>. To restore an option to its default,
- use the <codeph>unset</codeph> command. Some options take Boolean values of <codeph>true</codeph>
- and <codeph>false</codeph>. Others take numeric arguments, or quoted string values.
+ Manages query options for an <cmdname>impala-shell</cmdname>
+ session. The available options are the ones listed in <xref
+ href="impala_query_options.xml#query_options"/>. These options
+ are used for query tuning and troubleshooting. Issue
+ <codeph>SET</codeph> with no arguments to see the current
+ query options, either based on the <cmdname>impalad</cmdname>
+ defaults, as specified by you at <cmdname>impalad</cmdname>
+ startup, or based on earlier <codeph>SET</codeph> statements in
+ the same session. To modify option values, issue commands with
+ the syntax <codeph>set
+ <varname>option</varname>=<varname>value</varname></codeph>.
+ To restore an option to its default, use the
+ <codeph>unset</codeph> command.
</p>
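A short illustrative session for the commands described above (the option name is an arbitrary example):

```sql
-- Inside an impala-shell session:
set explain_level=2;   -- modify an option for this session
set;                   -- display current query options and their values
unset explain_level;   -- restore the option to its default
```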
<p conref="../shared/impala_common.xml#common/set_vs_connect"/>
<p rev="2.0.0">
- In Impala 2.0 and later, <codeph>SET</codeph> is available as a SQL statement for any kind of
- application, not only through <cmdname>impala-shell</cmdname>. See
- <xref href="impala_set.xml#set"/> for details.
- </p>
-
- <p rev="2.5.0 IMPALA-2180">
- In Impala 2.5 and later, you can use <codeph>SET</codeph> to define your own substitution variables
- within an <cmdname>impala-shell</cmdname> session.
- Within a SQL statement, you substitute the value by using the notation <codeph>${var:<varname>variable_name</varname>}</codeph>.
+ In Impala 2.0 and later, <codeph>SET</codeph> is
+ available as a SQL statement for any kind of application as well
+ as in <cmdname>impala-shell</cmdname>. See <xref
+ href="impala_set.xml#set"/> for details.
</p>
</entry>
</row>
[5/6] impala git commit: IMPALA-6883: [DOCS] Refactor impala_authorization doc
Posted by ta...@apache.org.
IMPALA-6883: [DOCS] Refactor impala_authorization doc
Change-Id: I3df72adb25dcdcbc286934b048645f47d876b33d
Reviewed-on: http://gerrit.cloudera.org:8080/10786
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/4ff9f5f3
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/4ff9f5f3
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/4ff9f5f3
Branch: refs/heads/master
Commit: 4ff9f5f3d280607ca523652319c8691803c5db57
Parents: 30e82c6
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Thu Jun 21 13:44:38 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue Jul 3 23:21:07 2018 +0000
----------------------------------------------------------------------
docs/shared/impala_common.xml | 867 ++++++++++++++----------------
docs/topics/impala_authorization.xml | 266 ++++-----
docs/topics/impala_grant.xml | 111 +---
3 files changed, 543 insertions(+), 701 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/4ff9f5f3/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 6faa9c1..4dcfffb 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -115,451 +115,388 @@ under the License.
nested topics at the end of this file.
</p>
- <table id="sentry_privileges_objects">
- <title>Valid privilege types and objects they apply to</title>
- <tgroup cols="2">
- <colspec colnum="1" colname="col1" colwidth="1*"/>
- <colspec colnum="2" colname="col2" colwidth="2*"/>
- <thead>
- <row>
- <entry><b>Privilege</b></entry>
- <entry><b>Object</b></entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry>INSERT</entry>
- <entry>DB, TABLE</entry>
- </row>
- <row>
- <entry>SELECT</entry>
- <entry>DB, TABLE, COLUMN</entry>
- </row>
- <row>
- <entry>ALL</entry>
- <entry>SERVER, TABLE, DB, URI</entry>
- </row>
- </tbody>
- </tgroup>
- </table>
-
- <table id="privileges_sql">
- <title>Privilege table for Hive & Impala operations</title>
- <tgroup cols="4">
- <colspec colnum="1" colname="col1" colwidth="1.31*"/>
- <colspec colnum="2" colname="col2" colwidth="1.17*"/>
- <colspec colnum="3" colname="col3" colwidth="1*"/>
- <colspec colname="newCol4" colnum="4" colwidth="1*"/>
- <thead>
- <row>
- <entry>Operation</entry>
- <entry>Scope</entry>
- <entry>Privileges Required</entry>
- <entry>URI</entry>
- </row>
- </thead>
- <tbody>
- <row id="create_database_privs">
- <entry>CREATE DATABASE</entry>
- <entry>SERVER</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="drop_database_privs">
- <entry>DROP DATABASE</entry>
- <entry>DATABASE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="create_table_privs">
- <entry>CREATE TABLE</entry>
- <entry>DATABASE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="drop_table_privs">
- <entry>DROP TABLE</entry>
- <entry>TABLE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="create_view_privs">
- <entry>CREATE VIEW<p>-This operation is allowed if you have
- column-level <codeph>SELECT</codeph> access to the columns
- being used.</p></entry>
- <entry>DATABASE; SELECT on TABLE; </entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row>
- <entry>ALTER VIEW<p>-This operation is allowed if you have
- column-level <codeph>SELECT</codeph> access to the columns
- being used.</p></entry>
- <entry>VIEW/TABLE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="drop_view_privs">
- <entry>DROP VIEW</entry>
- <entry>VIEW/TABLE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="alter_table_add_columns_privs">
- <entry>ALTER TABLE .. ADD COLUMNS</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_replace_columns_privs">
- <entry>ALTER TABLE .. REPLACE COLUMNS</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_change_column_privs">
- <entry>ALTER TABLE .. CHANGE column</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_rename_privs">
- <entry>ALTER TABLE .. RENAME</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_set_tblproperties_privs">
- <entry>ALTER TABLE .. SET TBLPROPERTIES</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_set_fileformat_privs">
- <entry>ALTER TABLE .. SET FILEFORMAT</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_set_location_privs">
- <entry>ALTER TABLE .. SET LOCATION</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry>URI</entry>
- </row>
- <row id="alter_table_add_partition_privs">
- <entry>ALTER TABLE .. ADD PARTITION</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_add_partition_location_privs">
- <entry>ALTER TABLE .. ADD PARTITION location</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry>URI</entry>
- </row>
- <row id="alter_table_drop_partition_privs">
- <entry>ALTER TABLE .. DROP PARTITION</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_partition_set_fileformat_privs">
- <entry>ALTER TABLE .. PARTITION SET FILEFORMAT</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="show_create_table_privs">
- <entry>SHOW CREATE TABLE</entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="show_partitions_privs">
- <entry>SHOW PARTITIONS</entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row>
- <entry>SHOW TABLES<p>-Output includes all the tables for which
- the user has table-level privileges and all the tables for
- which the user has some column-level privileges.</p></entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row>
- <entry>SHOW GRANT ROLE<p>-Output includes an additional field
- for any column-level privileges.</p></entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="describe_table_privs">
- <entry>DESCRIBE TABLE<p>-Output shows <i>all</i> columns if the
- user has table level-privileges or <codeph>SELECT</codeph>
- privilege on at least one table column</p></entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="load_data_privs">
- <entry>LOAD DATA</entry>
- <entry>TABLE</entry>
- <entry>INSERT</entry>
- <entry>URI</entry>
- </row>
- <row id="select_privs">
- <entry>SELECT<p>-You can grant the SELECT privilege on a view to
- give users access to specific columns of a table they do not
- otherwise have access to.</p><p>-See
- <xref keyref="sg_hive_sql"/>
- for details on allowed column-level
- operations.</p></entry>
- <entry>VIEW/TABLE; COLUMN</entry>
- <entry>SELECT</entry>
- <entry/>
- </row>
- <row id="insert_overwrite_table_privs">
- <entry>INSERT OVERWRITE TABLE</entry>
- <entry>TABLE</entry>
- <entry>INSERT</entry>
- <entry/>
- </row>
- <row id="create_table_as_select_privs">
- <entry>CREATE TABLE .. AS SELECT<p>-This operation is allowed if
- you have column-level <codeph>SELECT</codeph> access to the
- columns being used.</p></entry>
- <entry>DATABASE; SELECT on TABLE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="use_privs">
- <entry>USE <dbName></entry>
- <entry>Any</entry>
- <entry/>
- <entry/>
- </row>
- <row id="create_function_privs">
- <entry>CREATE FUNCTION</entry>
- <entry>SERVER</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="alter_table_set_serdeproperties_privs">
- <entry>ALTER TABLE .. SET SERDEPROPERTIES</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row>
- <entry>ALTER TABLE .. PARTITION SET SERDEPROPERTIES</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="_privs">
- <entry namest="col1" nameend="newCol4"><b>Hive-Only
- Operations</b></entry>
- </row>
- <row id="insert_overwrite_directory_privs">
- <entry>INSERT OVERWRITE DIRECTORY</entry>
- <entry>TABLE</entry>
- <entry>INSERT</entry>
- <entry>URI</entry>
- </row>
- <row id="analyze_table_privs">
- <entry>Analyze TABLE</entry>
- <entry>TABLE</entry>
- <entry>SELECT + INSERT</entry>
- <entry/>
- </row>
- <row id="import_table_privs">
- <entry>IMPORT TABLE</entry>
- <entry>DATABASE</entry>
- <entry>ALL</entry>
- <entry>URI</entry>
- </row>
- <row id="export_table_privs">
- <entry>EXPORT TABLE</entry>
- <entry>TABLE</entry>
- <entry>SELECT</entry>
- <entry>URI</entry>
- </row>
- <row id="alter_table_touch_privs">
- <entry>ALTER TABLE TOUCH</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_touch_partition_privs">
- <entry>ALTER TABLE TOUCH PARTITION</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_clustered_by_privs">
- <entry>ALTER TABLE .. CLUSTERED BY SORTED BY</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_enable_privs">
- <entry>ALTER TABLE .. ENABLE/DISABLE</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_partition_enable_privs">
- <entry>ALTER TABLE .. PARTITION ENABLE/DISABLE</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row id="alter_table_partition_rename_privs">
- <entry>ALTER TABLE .. PARTITION.. RENAME TO PARTITION</entry>
- <entry>TABLE</entry>
- <entry>ALL on DATABASE</entry>
- <entry/>
- </row>
- <row>
- <entry>MSCK REPAIR TABLE</entry>
- <entry>TABLE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="alter_database_privs">
- <entry>ALTER DATABASE</entry>
- <entry>DATABASE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="describe_database_privs">
- <entry>DESCRIBE DATABASE</entry>
- <entry>DATABASE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="show_columns_privs">
- <entry>SHOW COLUMNS<p>-Output for this operation filters columns
- to which the user does not have explicit
- <codeph>SELECT</codeph> access </p></entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="create_index_privs">
- <entry>CREATE INDEX</entry>
- <entry>TABLE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="drop_index_privs">
- <entry>DROP INDEX</entry>
- <entry>TABLE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="show_indexes_privs">
- <entry>SHOW INDEXES</entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="grant_privs">
- <entry>GRANT PRIVILEGE</entry>
- <entry>Allowed only for Sentry admin users</entry>
- <entry/>
- <entry/>
- </row>
- <row id="revoke_privs">
- <entry>REVOKE PRIVILEGE</entry>
- <entry>Allowed only for Sentry admin users</entry>
- <entry/>
- <entry/>
- </row>
- <row id="show_grants_privs">
- <entry>SHOW GRANTS</entry>
- <entry>Allowed only for Sentry admin users</entry>
- <entry/>
- <entry/>
- </row>
- <row id="show_tblproperties_privs">
- <entry>SHOW TBLPROPERTIES</entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="describe_table_partition_privs">
- <entry>DESCRIBE TABLE .. PARTITION</entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="add_jar_privs">
- <entry>ADD JAR</entry>
- <entry>Not Allowed</entry>
- <entry/>
- <entry/>
- </row>
- <row id="add_file_privs">
- <entry>ADD FILE</entry>
- <entry>Not Allowed</entry>
- <entry/>
- <entry/>
- </row>
- <row id="dfs_privs">
- <entry>DFS</entry>
- <entry>Not Allowed</entry>
- <entry/>
- <entry/>
- </row>
- <row>
- <entry namest="col1" nameend="newCol4"><b>Impala-Only
- Operations</b></entry>
- </row>
- <row id="explain_privs">
- <entry>EXPLAIN</entry>
- <entry>TABLE; COLUMN</entry>
- <entry>SELECT</entry>
- <entry/>
- </row>
- <row id="invalidate_metadata_privs">
- <entry>INVALIDATE METADATA</entry>
- <entry>SERVER</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="invalidate_metadata_table_privs">
- <entry>INVALIDATE METADATA <table name></entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="refresh_privs">
- <entry>REFRESH <table name> or REFRESH <table name> PARTITION (<partition_spec>)</entry>
- <entry>TABLE</entry>
- <entry>SELECT/INSERT</entry>
- <entry/>
- </row>
- <row id="drop_function_privs">
- <entry>DROP FUNCTION</entry>
- <entry>SERVER</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- <row id="compute_stats_privs">
- <entry>COMPUTE STATS</entry>
- <entry>TABLE</entry>
- <entry>ALL</entry>
- <entry/>
- </row>
- </tbody>
- </tgroup>
- </table>
+ <p id="sentry_privileges_objects">The table below lists the minimum level
+ of privileges and the scope required to execute SQL statements in
+ <keyword keyref="impala30_full"/> and higher. The following notations
+ are used:<ul>
+ <li><b>ANY</b> denotes the <codeph>SELECT</codeph>,
+ <codeph>INSERT</codeph>, <codeph>CREATE</codeph>,
+ <codeph>ALTER</codeph>, <codeph>DROP</codeph>, <b><i>or</i></b>
+ <codeph>REFRESH</codeph> privilege.</li>
+ <li><b>ALL</b> privilege denotes the <codeph>SELECT</codeph>,
+ <codeph>INSERT</codeph>, <codeph>CREATE</codeph>,
+ <codeph>ALTER</codeph>, <codeph>DROP</codeph>, <b><i>and</i></b>
+ <codeph>REFRESH</codeph> privileges.</li>
+ <li>The parent levels of the specified scope are implicitly supported.
+ For example, if a privilege is listed with the
+ <codeph>TABLE</codeph> scope, the same privilege granted on
+ <codeph>DATABASE</codeph> and <codeph>SERVER</codeph> will allow
+ the user to execute the specified SQL statement.</li>
+ </ul><table id="sentry_privileges_objects_tab" frame="all" colsep="1"
+ rowsep="1">
+ <tgroup cols="3">
+ <colspec colnum="1" colname="col1"/>
+ <colspec colnum="2" colname="col2"/>
+ <colspec colnum="3" colname="col3"/>
+ <tbody>
+ <row>
+ <entry><b>SQL Statement</b></entry>
+ <entry><b>Privileges</b></entry>
+ <entry><b>Scope</b></entry>
+ </row>
+ <row>
+ <entry>SELECT</entry>
+ <entry>SELECT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>WITH SELECT</entry>
+ <entry>SELECT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>EXPLAIN SELECT</entry>
+ <entry>SELECT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>INSERT</entry>
+ <entry>INSERT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>EXPLAIN INSERT</entry>
+ <entry>INSERT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>TRUNCATE</entry>
+ <entry>INSERT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>LOAD</entry>
+ <entry>INSERT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>ALL</entry>
+ <entry>URI</entry>
+ </row>
+ <row>
+ <entry>CREATE DATABASE</entry>
+ <entry>CREATE</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry>CREATE DATABASE LOCATION</entry>
+ <entry>CREATE</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>ALL</entry>
+ <entry>URI</entry>
+ </row>
+ <row>
+ <entry>CREATE TABLE</entry>
+ <entry>CREATE</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>CREATE TABLE LIKE</entry>
+ <entry>CREATE</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>CREATE TABLE AS SELECT</entry>
+ <entry>CREATE</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>INSERT</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>SELECT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>EXPLAIN CREATE TABLE AS SELECT</entry>
+ <entry>CREATE</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>INSERT</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>SELECT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>CREATE TABLE LOCATION</entry>
+ <entry>CREATE</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>ALL</entry>
+ <entry>URI</entry>
+ </row>
+ <row>
+ <entry>CREATE VIEW</entry>
+ <entry>CREATE</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>SELECT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>ALTER DATABASE</entry>
+ <entry>ALTER</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>ALTER TABLE</entry>
+ <entry>ALTER</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>ALTER TABLE SET LOCATION</entry>
+ <entry>ALTER</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>ALL</entry>
+ <entry>URI</entry>
+ </row>
+ <row>
+ <entry>ALTER TABLE RENAME</entry>
+ <entry>CREATE</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>ALTER</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>ALTER VIEW</entry>
+ <entry>ALTER</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>SELECT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>ALTER VIEW RENAME</entry>
+ <entry>CREATE</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>ALTER</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>DROP DATABASE</entry>
+ <entry>DROP</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>DROP TABLE</entry>
+ <entry>DROP</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>DROP VIEW</entry>
+ <entry>DROP</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>CREATE FUNCTION</entry>
+ <entry>CREATE</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry/>
+ <entry>ALL</entry>
+ <entry>URI</entry>
+ </row>
+ <row>
+ <entry>DROP FUNCTION</entry>
+ <entry>DROP</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>COMPUTE STATS</entry>
+ <entry>ALTER and SELECT</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>DROP STATS</entry>
+ <entry>ALTER</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>INVALIDATE METADATA</entry>
+ <entry>REFRESH</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry>INVALIDATE METADATA <table></entry>
+ <entry>REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>REFRESH <table></entry>
+ <entry>REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>REFRESH FUNCTIONS</entry>
+ <entry>REFRESH</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>COMMENT ON DATABASE</entry>
+ <entry>ALTER</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>COMMENT ON TABLE</entry>
+ <entry>ALTER</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>COMMENT ON VIEW</entry>
+ <entry>ALTER</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>DESCRIBE DATABASE</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>DESCRIBE <table/view></entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>USE</entry>
+ <entry>ANY</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW DATABASES</entry>
+ <entry>ANY</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW TABLES</entry>
+ <entry>ANY</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW FUNCTIONS</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>SHOW PARTITIONS</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW TABLE STATS</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW COLUMN STATS</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW FILES</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW CREATE TABLE</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW CREATE VIEW</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>SHOW CREATE FUNCTION</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>DATABASE</entry>
+ </row>
+ <row>
+ <entry>SHOW RANGE PARTITIONS (Kudu only)</entry>
+ <entry>SELECT, INSERT, <b><i>or</i></b> REFRESH</entry>
+ <entry>TABLE</entry>
+ </row>
+ <row>
+ <entry>UPDATE (Kudu only)</entry>
+ <entry>ALL</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry>EXPLAIN UPDATE (Kudu only)</entry>
+ <entry>ALL</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry>UPSERT (Kudu only)</entry>
+ <entry>ALL</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry>WITH UPSERT (Kudu only)</entry>
+ <entry>ALL</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry>EXPLAIN UPSERT (Kudu only)</entry>
+ <entry>ALL</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry>DELETE (Kudu only)</entry>
+ <entry>ALL</entry>
+ <entry>SERVER</entry>
+ </row>
+ <row>
+ <entry>EXPLAIN DELETE (Kudu only)</entry>
+ <entry>ALL</entry>
+ <entry>SERVER</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table></p>
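For illustration, the minimum grants from the table above can be expressed as Impala SQL statements such as the following. The role and object names here are hypothetical, chosen only to match rows of the table:

```sql
-- Hypothetical role and object names; privilege levels follow the table above.
GRANT SELECT ON TABLE analytics_db.events TO ROLE analyst_role;   -- SELECT, EXPLAIN SELECT
GRANT INSERT ON TABLE analytics_db.events TO ROLE loader_role;    -- INSERT, TRUNCATE
GRANT REFRESH ON DATABASE analytics_db TO ROLE metadata_role;     -- REFRESH FUNCTIONS
```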
<p rev="IMPALA-2660" id="auth_to_local_instructions">
In <keyword keyref="impala26_full"/> and higher, Impala recognizes the <codeph>auth_to_local</codeph> setting,
@@ -590,29 +527,23 @@ under the License.
<b><ph id="title_sentry_debug">Debugging Failed Sentry Authorization Requests</ph></b>
</p>
- <p id="sentry_debug">
- Sentry logs all facts that lead up to authorization decisions at the debug level. If you do not understand
- why Sentry is denying access, the best way to debug is to temporarily turn on debug logging:
- <ul>
- <li>
- Add <codeph>log4j.logger.org.apache.sentry=DEBUG</codeph> to the <filepath>log4j.properties</filepath>
- file on each host in the cluster, in the appropriate configuration directory for each service.
- </li>
- </ul>
- Specifically, look for exceptions and messages such as:
-<codeblock xml:space="preserve">FilePermission server..., RequestPermission server...., result [true|false]</codeblock>
- which indicate each evaluation Sentry makes. The <codeph>FilePermission</codeph> is from the policy file,
- while <codeph>RequestPermission</codeph> is the privilege required for the query. A
- <codeph>RequestPermission</codeph> will iterate over all appropriate <codeph>FilePermission</codeph>
- settings until a match is found. If no matching privilege is found, Sentry returns <codeph>false</codeph>
- indicating <q>Access Denied</q> .
-<!--
-[1]
-Impala: Impala Daemon -> Advanced -> Impala Daemon Logging Safety Valve
-Hive: Hive Server 2 -> Advanced -> HiveServer2 Logging Safety Valve
-Search: Solr Server -> Advanced -> HiveServer2 Logging Safety Valve
--->
- </p>
+ <p id="sentry_debug"> Sentry logs all facts that lead up to authorization
+ decisions at the debug level. If you do not understand why Sentry is
+ denying access, the best way to debug is to temporarily turn on debug
+ logging: <ul>
+ <li> Add <codeph>log4j.logger.org.apache.sentry=DEBUG</codeph> to the
+ <filepath>log4j.properties</filepath> file on each host in the
+ cluster, in the appropriate configuration directory for each
+ service. </li>
+ </ul> Specifically, look for exceptions and messages such as:
+ <codeblock xml:space="preserve">FilePermission server..., RequestPermission server...., result [true|false]</codeblock>
+ which indicate each evaluation Sentry makes. The
+ <codeph>FilePermission</codeph> is from the policy file, while
+ <codeph>RequestPermission</codeph> is the privilege required for the
+ query. A <codeph>RequestPermission</codeph> will iterate over all
+ appropriate <codeph>FilePermission</codeph> settings until a match is
+ found. If no matching privilege is found, Sentry returns
+ <codeph>false</codeph> indicating <q>Access Denied</q>.</p>
</section>
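The first-match evaluation described in the debugging note above can be sketched as a small loop. This is an illustrative model only, not Sentry's actual implementation; the dictionary-based representation of FilePermission and RequestPermission is an assumption made for the sketch.

```python
# Illustrative model of the evaluation described above: each
# RequestPermission is checked against the policy's FilePermission
# entries in turn; access is denied only if no entry implies it.
# This is NOT Sentry's real code; the dict representation is assumed.
def implies(file_perm, request):
    """A policy entry implies a request if every field it names matches,
    treating "*" as a wildcard."""
    return all(v == "*" or request.get(k) == v for k, v in file_perm.items())

def evaluate(policy, request):
    for entry in policy:
        if implies(entry, request):
            return True   # logged as "result [true]"
    return False          # no match: "result [false]" -> Access Denied

policy = [
    {"server": "server1", "db": "training", "table": "*", "action": "SELECT"},
]
request = {"server": "server1", "db": "training",
           "table": "lesson_1", "action": "SELECT"}
print(evaluate(policy, request))  # True
```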
http://git-wip-us.apache.org/repos/asf/impala/blob/4ff9f5f3/docs/topics/impala_authorization.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_authorization.xml b/docs/topics/impala_authorization.xml
index 4e4a243..39932f6 100644
--- a/docs/topics/impala_authorization.xml
+++ b/docs/topics/impala_authorization.xml
@@ -65,12 +65,23 @@ under the License.
<conbody>
<p>
- Privileges can be granted on different objects in the schema. Any privilege that can be granted is
- associated with a level in the object hierarchy. If a privilege is granted on a container object in the
- hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
- database systems such as MySQL.
+ Privileges can be granted on different objects in the schema. Any privilege that can be
+ granted is associated with a level in the object hierarchy. If a privilege is granted on
+ a parent object in the hierarchy, the child object automatically inherits it. This is
+ the same privilege model as Hive and other database systems.
+ </p>
+
+ <p>
+ The objects in the Impala schema hierarchy are:
</p>
+<codeblock>Server
+ URI
+ Database
+ Table
+ Column
+</codeblock>
+
<p rev="2.3.0 collevelauth">
The object hierarchy for Impala covers Server, URI, Database, Table, and Column. (The Table privileges apply to views as well;
anywhere you specify a table name, you can specify a view name instead.)
@@ -81,63 +92,7 @@ under the License.
in a table.
</p>
- <p>
- A restricted set of privileges determines what you can do with each object:
- </p>
-
- <dl>
- <dlentry id="select_priv">
-
- <dt>
- SELECT privilege
- </dt>
-
- <dd>
- Lets you read data from a table or view, for example with the <codeph>SELECT</codeph> statement, the
- <codeph>INSERT...SELECT</codeph> syntax, or <codeph>CREATE TABLE...LIKE</codeph>. Also required to
- issue the <codeph>DESCRIBE</codeph> statement or the <codeph>EXPLAIN</codeph> statement for a query
- against a particular table. Only objects for which a user has this privilege are shown in the output
- for <codeph>SHOW DATABASES</codeph> and <codeph>SHOW TABLES</codeph> statements. The
- <codeph>REFRESH</codeph> statement and <codeph>INVALIDATE METADATA</codeph> statements only access
- metadata for tables for which the user has this privilege.
- </dd>
-
- </dlentry>
-
- <dlentry id="insert_priv">
-
- <dt>
- INSERT privilege
- </dt>
-
- <dd>
- Lets you write data to a table. Applies to the <codeph>INSERT</codeph> and <codeph>LOAD DATA</codeph>
- statements.
- </dd>
-
- </dlentry>
-
- <dlentry id="all_priv">
-
- <dt>
- ALL privilege
- </dt>
-
- <dd>
- Lets you create or modify the object. Required to run DDL statements such as <codeph>CREATE
- TABLE</codeph>, <codeph>ALTER TABLE</codeph>, or <codeph>DROP TABLE</codeph> for a table,
- <codeph>CREATE DATABASE</codeph> or <codeph>DROP DATABASE</codeph> for a database, or <codeph>CREATE
- VIEW</codeph>, <codeph>ALTER VIEW</codeph>, or <codeph>DROP VIEW</codeph> for a view. Also required for
- the URI of the <q>location</q> parameter for the <codeph>CREATE EXTERNAL TABLE</codeph> and
- <codeph>LOAD DATA</codeph> statements.
-<!-- Have to think about the best wording, how often to repeat, how best to conref this caveat.
- You do not actually code the keyword <codeph>ALL</codeph> in the policy file; instead you use
- <codeph>action=*</codeph> or shorten the right-hand portion of the rule.
- -->
- </dd>
-
- </dlentry>
- </dl>
+ <p conref="../shared/impala_common.xml#common/sentry_privileges_objects"/>
<p>
Privileges can be specified for a table or view before that object actually exists. If you do not have
@@ -145,6 +100,30 @@ under the License.
not.
</p>
+ <note>
+ <p>
+ Although this document refers to the <codeph>ALL</codeph> privilege, currently if you
+ use the policy file mode, you do not use the actual keyword <codeph>ALL</codeph> in
+ the policy file. When you code role entries in the policy file:
+ </p>
+ <ul>
+ <li>
+ To specify the <codeph>ALL</codeph> privilege for a server, use a role like
+ <codeph>server=<varname>server_name</varname></codeph>.
+ </li>
+
+ <li>
+ To specify the <codeph>ALL</codeph> privilege for a database, use a role like
+ <codeph>server=<varname>server_name</varname>->db=<varname>database_name</varname></codeph>.
+ </li>
+
+ <li>
+ To specify the <codeph>ALL</codeph> privilege for a table, use a role like
+ <codeph>server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=*</codeph>.
+ </li>
+ </ul>
+ </note>
+
<p>
Originally, privileges were encoded in a policy file, stored in HDFS. This mode of operation is still an
option, but the emphasis of privilege management is moving towards being SQL-based. Although currently
@@ -176,17 +155,21 @@ under the License.
<ul>
<li>
- The <codeph>-server_name</codeph> option turns on Sentry authorization for Impala. The authorization
- rules refer to a symbolic server name, and you specify the name to use as the argument to the
- <codeph>-server_name</codeph> option.
+ <codeph>-server_name</codeph>: Turns on Sentry authorization for Impala. The
+ authorization rules refer to a symbolic server name, and you specify the same name to
+ use as the argument to the <codeph>-server_name</codeph> option for all
+ <cmdname>impalad</cmdname> nodes in the cluster.
+ <p>
+ Starting in Impala 1.4.0 and higher, if you specify just
+ <codeph>-server_name</codeph> without <codeph>-authorization_policy_file</codeph>,
+ Impala uses the Sentry service for authorization.
+ </p>
</li>
- <li rev="1.4.0">
- If you specify just <codeph>-server_name</codeph>, Impala uses the Sentry service for authorization,
- relying on the results of <codeph>GRANT</codeph> and <codeph>REVOKE</codeph> statements issued through
- Hive. (This mode of operation is available in Impala 1.4.0 and higher.) Prior to Impala 1.4.0, or if you
- want to continue storing privilege rules in the policy file, also specify the
- <codeph>-authorization_policy_file</codeph> option as in the following item.
+ <li>
+ <codeph>-sentry_config</codeph>: Specifies the local path to the
+ <codeph>sentry-site.xml</codeph> configuration file. This setting is required to
+ enable authorization.
</li>
<li>
@@ -218,6 +201,14 @@ under the License.
</codeblock>
<p>
+ The preceding examples set up a symbolic name of <codeph>server1</codeph> to refer to
+ the current instance of Impala. Specify the symbolic name for the
+ <codeph>sentry.hive.server</codeph> property in the <filepath>sentry-site.xml</filepath>
+ configuration file for Hive, as well as in the <codeph>-server_name</codeph> option for
+ <cmdname>impalad</cmdname>.
+ </p>
+
+ <p>
The preceding examples set up a symbolic name of <codeph>server1</codeph> to refer to the current instance
of Impala. This symbolic name is used in the following ways:
</p>
@@ -307,7 +298,44 @@ report_generator = server=server1->db=reporting_db->table=*->action=SEL
to security policies, restart all Impala daemons to pick up the changes immediately.
</p>
- <p outputclass="toc inpage"/>
+ <p>
+ URIs represent the file paths you specify as part of statements such as <codeph>CREATE
+ EXTERNAL TABLE</codeph> and <codeph>LOAD DATA</codeph>. Typically, you specify what look
+ like UNIX paths, but these locations can also be prefixed with <codeph>hdfs://</codeph>
+ to make clear that they are really URIs. To set privileges for a URI, specify the name
+ of a directory, and the privilege applies to all the files in that directory and any
+ directories underneath it.
+ </p>
+
+ <p>
+ URIs must start with <codeph>hdfs://</codeph>, <codeph>s3a://</codeph>,
+ <codeph>adl://</codeph>, or <codeph>file://</codeph>. If a URI starts with an absolute
+ path, the path will be appended to the default filesystem prefix. For example, if you
+ specify:
+<codeblock>
+GRANT ALL ON URI '/tmp';
+</codeblock>
+ When the default filesystem is HDFS, the above statement effectively becomes the
+ following:
+<codeblock>
+GRANT ALL ON URI 'hdfs://localhost:20500/tmp';
+</codeblock>
+ </p>
+
+ <p>
+ When defining URIs for HDFS, you must also specify the NameNode. For example:
+<codeblock>GRANT ALL ON URI file:///path/to/dir TO <role>
+GRANT ALL ON URI hdfs://namenode:port/path/to/dir TO <role></codeblock>
+ <note type="warning">
+ <p>
+ Because the NameNode host and port must be specified, it is strongly recommended
+ that you use High Availability (HA). This ensures that the URI will remain constant
+ even if the NameNode changes. For example:
+ </p>
+<codeblock>GRANT ALL ON URI hdfs://ha-nn-uri/path/to/dir TO <role></codeblock>
+ </note>
+ </p>
+
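The default-filesystem behavior described above can be sketched as a small helper. The scheme list comes from the paragraph above; the default filesystem value mirrors the hdfs://localhost:20500 example and is otherwise an assumption, and this is not Impala's actual code:

```python
# Sketch of the URI handling described above; not Impala's actual code.
SCHEMES = ("hdfs://", "s3a://", "adl://", "file://")

def normalize_uri(uri, default_fs="hdfs://localhost:20500"):
    if uri.startswith(SCHEMES):
        return uri                 # already a full URI
    if uri.startswith("/"):
        return default_fs + uri    # absolute path: prepend default filesystem
    raise ValueError("URI must start with a supported scheme or be absolute")

print(normalize_uri("/tmp"))              # hdfs://localhost:20500/tmp
print(normalize_uri("s3a://bucket/dir"))  # s3a://bucket/dir
```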
</conbody>
<concept id="security_policy_file_details">
@@ -520,14 +548,15 @@ student = server=server1->db=training->table=lesson_*->action=SELECT
<codeblock></codeblock>
-</example>
-
-<example id="sec_ex_superuser_single_table">
-<title>A User with Full Privileges for a Specific Table</title>
- <p>
- If a user has <codeph>SELECT</codeph> privilege for a table, they can query, describe, or explain queries for
- that table.
- </p>
+ <li>
+ The <codeph>staging_dir</codeph> role can specify the HDFS path
+ <filepath>/user/impala-user/external_data</filepath> with the <codeph>LOAD
+ DATA</codeph> statement. When Impala queries or loads data files, it operates on
+ all the files in that directory, not just a single file, so any Impala
+ <codeph>LOCATION</codeph> parameters refer to a directory rather than an
+ individual file.
+ </li>
+ </ul>
<codeblock></codeblock>
</example>
@@ -564,33 +593,10 @@ student = server=server1->db=training->table=lesson_*->action=SELECT
</li>
<li>
- The <codeph>staging_dir</codeph> role lets us specify the HDFS path
- <filepath>/user/username/external_data</filepath> with the <codeph>LOAD DATA</codeph> statement.
- Remember, when Impala queries or loads data files, it operates on all the files in that directory,
- not just a single file, so any Impala <codeph>LOCATION</codeph> parameters refer to a directory
- rather than an individual file.
- </li>
-
- <li>
- We included the IP address and port of the Hadoop name node in the HDFS URI of the
- <codeph>staging_dir</codeph> rule. We found those details in
- <filepath>/etc/hadoop/conf/core-site.xml</filepath>, under the <codeph>fs.default.name</codeph>
- element. That is what we use in any roles that specify URIs (that is, the locations of directories in
- HDFS).
- </li>
-
- <li>
- We start this example after the table <codeph>external_table.sample</codeph> is already created. In
- the policy file for the example, we have already taken away the <codeph>external_table_admin</codeph>
- role from the <codeph>username</codeph> group, and replaced it with the lesser-privileged
- <codeph>external_table</codeph> role.
- </li>
-
- <li>
- We assign privileges to a subdirectory underneath <filepath>/user/username</filepath> in HDFS,
- because such privileges also apply to any subdirectories underneath. If we had assigned privileges to
- the parent directory <filepath>/user/username</filepath>, it would be too likely to mess up other
- files by specifying a wrong location by mistake.
+ Members of the <codeph>impala_users</codeph> group have the
+ <codeph>instructor</codeph> role and so can create, insert into, and query any
+ tables in the <codeph>training</codeph> database, but cannot create or drop the
+ database itself.
</li>
<li>
@@ -705,15 +711,14 @@ ERROR: AuthorizationException: User 'username' does not have privileges to acces
with sensitive information, then create a view that only exposes the non-confidential columns.
</p>
-<codeblock>[localhost:21000] > create table sensitive_info
- > (
- > name string,
- > address string,
- > credit_card string,
- > taxpayer_id string
- > );
-[localhost:21000] > create view name_address_view as select name, address from sensitive_info;
-</codeblock>
+ <note rev="1.4.0">
+ In <ph rev="upstream">CDH 5</ph> and higher, <ph
+ rev="upstream">Cloudera</ph>
+ recommends managing privileges through SQL statements, as described in
+ <xref
+ href="impala_authorization.xml#sentry_service"/>. If you are still using
+ policy files, plan to migrate to the new approach some time in the future.
+ </note>
<p>
Then the following policy file specifies read-only privilege for that view, without authorizing access
@@ -771,15 +776,28 @@ view_only_privs = server=server1->db=reports->table=name_address_view->
</li>
</ul>
-<codeblock>[groups]
-supergroup = training_sysadmin
-employee = instructor
-visitor = student
-
-[roles]
-training_sysadmin = server=server1->db=training
-instructor = server=server1->db=training->table=*->action=*
-student = server=server1->db=training->table=*->action=SELECT
+ <p>
+ In the <codeph>[roles]</codeph> section, you define a set of roles. For each role, you
+ specify precisely the set of privileges that is available: which objects users
+ with that role can access, and what operations they can perform on those objects. This
+ is the lowest-level category of security information; the other sections in the policy
+ file map the privileges to higher-level divisions of groups and users. In the
+ <codeph>[groups]</codeph> section, you specify which roles are associated with which
+ groups. The group and usernames correspond to Linux groups and users on the server
+ where the <cmdname>impalad</cmdname> daemon runs. The privileges are specified using
+ patterns like:
+<codeblock>server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=SELECT
+server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=CREATE
+server=<varname>server_name</varname>->db=<varname>database_name</varname>->table=<varname>table_name</varname>->action=ALL
+</codeblock>
+ For the <varname>server_name</varname> value, substitute the same symbolic name you
+ specify with the <cmdname>impalad</cmdname> <codeph>-server_name</codeph> option. You
+ can use <codeph>*</codeph> wildcard characters at each level of the privilege
+ specification to allow access to all such objects. For example:
+<codeblock>server=impala-host.example.com->db=default->table=t1->action=SELECT
+server=impala-host.example.com->db=*->table=*->action=CREATE
+server=impala-host.example.com->db=*->table=audit_log->action=SELECT
+server=impala-host.example.com->db=default->table=t1->action=*
</codeblock>
</example>
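The privilege patterns above can be modeled with a small parser and matcher. This is a simplified sketch, not Sentry's actual policy engine; following the shortened-rule convention mentioned earlier in this document, a level omitted from the rule is treated as a wildcard:

```python
# Simplified sketch (not Sentry's real parser) of matching a policy-file
# privilege pattern such as server=...->db=...->table=...->action=SELECT.
def parse_rule(spec):
    return dict(part.split("=", 1) for part in spec.split("->"))

def rule_grants(spec, server, db, table, action):
    rule = parse_rule(spec)
    request = {"server": server, "db": db, "table": table, "action": action}
    # A level omitted from the rule, or written as "*", matches anything.
    return all(rule.get(k, "*") in ("*", v) for k, v in request.items())

rule = "server=server1->db=training->table=*->action=SELECT"
print(rule_grants(rule, "server1", "training", "lesson_1", "SELECT"))  # True
print(rule_grants(rule, "server1", "training", "lesson_1", "INSERT"))  # False
```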
http://git-wip-us.apache.org/repos/asf/impala/blob/4ff9f5f3/docs/topics/impala_grant.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_grant.xml b/docs/topics/impala_grant.xml
index 956a458..bdc71db 100644
--- a/docs/topics/impala_grant.xml
+++ b/docs/topics/impala_grant.xml
@@ -74,7 +74,7 @@ GRANT <varname>privilege</varname> ON <varname>object_type</varname> <varname>ob
<p> The <codeph>WITH GRANT OPTION</codeph> clause allows members of the
specified role to issue <codeph>GRANT</codeph> and <codeph>REVOKE</codeph>
- statements for those same privileges Hence, if a role has the
+ statements for those same privileges. Hence, if a role has the
<codeph>ALL</codeph> privilege on a database and the <codeph>WITH GRANT
OPTION</codeph> set, users granted that role can execute
<codeph>GRANT</codeph>/<codeph>REVOKE</codeph> statements only for that
@@ -100,114 +100,7 @@ GRANT <varname>privilege</varname> ON <varname>object_type</varname> <varname>ob
<codeph>URI</codeph> object. Finer-grained privileges mentioned below on
a <codeph>URI</codeph> are not supported.
</p>
-
- <p>
- Starting in <keyword keyref="impala30_full"/>, finer grained privileges
- are enforced as below.<simpletable frame="all" relcolwidth="1* 1* 1*"
- id="simpletable_kmb_ppn_ndb">
- <sthead>
- <stentry>Privilege</stentry>
- <stentry>Scope</stentry>
- <stentry>SQL Allowed to Execute</stentry>
- </sthead>
- <strow>
- <stentry><codeph>REFRESH</codeph></stentry>
- <stentry><codeph>SERVER</codeph></stentry>
- <stentry><codeph>INVALIDATE METADATA</codeph> on all tables in all
- databases<p><codeph>REFRESH</codeph> on all tables and functions
- in all databases</p></stentry>
- </strow>
- <strow>
- <stentry><codeph>REFRESH</codeph></stentry>
- <stentry><codeph>DATABASE</codeph></stentry>
- <stentry><codeph>INVALIDATE METADATA</codeph> on all tables in the
- named database<p><codeph>REFRESH</codeph> on all tables and
- functions in the named database</p></stentry>
- </strow>
- <strow>
- <stentry><codeph>REFRESH</codeph></stentry>
- <stentry><codeph>TABLE</codeph></stentry>
- <stentry><codeph>INVALIDATE METADATA</codeph> on the named
- table<p><codeph>REFRESH</codeph> on the named
- table</p></stentry>
- </strow>
- <strow>
- <stentry><codeph>CREATE</codeph></stentry>
- <stentry><codeph>SERVER</codeph></stentry>
- <stentry><codeph>CREATE DATABASE</codeph> on all
- databases<p><codeph>CREATE TABLE</codeph> on all
- tables</p></stentry>
- </strow>
- <strow>
- <stentry><codeph>CREATE</codeph></stentry>
- <stentry><codeph>DATABASE</codeph></stentry>
- <stentry><codeph>CREATE TABLE</codeph> on all tables in the named
- database</stentry>
- </strow>
- <strow>
- <stentry><codeph>DROP</codeph></stentry>
- <stentry><codeph>SERVER</codeph></stentry>
- <stentry><codeph>DROP DATBASE</codeph> on all databases<p><codeph>DROP
- TABLE</codeph> on all tables</p></stentry>
- </strow>
- <strow>
- <stentry><codeph>DROP</codeph></stentry>
- <stentry><codeph>DATABASE</codeph></stentry>
- <stentry><codeph>DROP DATABASE</codeph> on the named
- database<p><codeph>DROP TABLE</codeph> on all tables in the
- named database</p></stentry>
- </strow>
- <strow>
- <stentry><codeph>DROP</codeph></stentry>
- <stentry><codeph>TABLE</codeph></stentry>
- <stentry><codeph>DROP TABLE</codeph> on the named table</stentry>
- </strow>
- <strow>
- <stentry><codeph>ALTER</codeph></stentry>
- <stentry><codeph>SERVER</codeph></stentry>
- <stentry><codeph>ALTER TABLE</codeph> on all tables</stentry>
- </strow>
- <strow>
- <stentry><codeph>ALTER</codeph></stentry>
- <stentry><codeph>DATABASE</codeph></stentry>
- <stentry><codeph>ALTER TABLE</codeph> on the tables in the named
- database</stentry>
- </strow>
- <strow>
- <stentry><codeph>ALTER</codeph></stentry>
- <stentry><codeph>TABLE</codeph></stentry>
- <stentry><codeph>ALTER TABLE</codeph> on the named table</stentry>
- </strow>
- </simpletable>
- </p>
-
- <p>
- <note>
- <p>
- <ul>
- <li>
- <codeph>ALTER TABLE RENAME</codeph> requires the
- <codeph>ALTER</codeph> privilege at the <codeph>TABLE</codeph>
- level and the <codeph>CREATE</codeph> privilege at the
- <codeph>DATABASE</codeph> level.
- </li>
-
- <li>
- <codeph>CREATE TABLE AS SELECT</codeph> requires the
- <codeph>CREATE</codeph> privilege on the database that should
- contain the new table and the <codeph>SELECT</codeph> privilege on
- the tables referenced in the query portion of the statement.
- </li>
-
- <li>
- <codeph>COMPUTE STATS</codeph> requires the
- <codeph>ALTER</codeph> and <codeph>SELECT</codeph> privileges on
- the target table.
- </li>
- </ul>
- </p>
- </note>
- </p>
+ <p conref="../shared/impala_common.xml#common/sentry_privileges_objects"/>
<p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
[6/6] impala git commit: IMPALA-7236: Fix the parsing of ALLOW_ERASURE_CODED_FILES
Posted by ta...@apache.org.
IMPALA-7236: Fix the parsing of ALLOW_ERASURE_CODED_FILES
This patch adds a missing "break" statement in a switch statement
changed by IMPALA-7102.
Also fixes a non-deterministic test case.
Change-Id: Ife1e791541e3f4fed6bec00945390c7d7681e824
Reviewed-on: http://gerrit.cloudera.org:8080/10857
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/61e6a477
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/61e6a477
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/61e6a477
Branch: refs/heads/master
Commit: 61e6a47776ba7f14139b69f91a49d2072a76178b
Parents: 4ff9f5f
Author: Tianyi Wang <ti...@apache.org>
Authored: Mon Jul 2 19:03:19 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue Jul 3 23:49:44 2018 +0000
----------------------------------------------------------------------
be/src/service/query-options.cc | 1 +
.../functional-query/queries/QueryTest/hdfs-erasure-coding.test | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/61e6a477/be/src/service/query-options.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-options.cc b/be/src/service/query-options.cc
index 1063fef..5d61664 100644
--- a/be/src/service/query-options.cc
+++ b/be/src/service/query-options.cc
@@ -674,6 +674,7 @@ Status impala::SetQueryOption(const string& key, const string& value,
return Status(Substitute("Invalid kudu_read_mode '$0'. Valid values are "
"DEFAULT, READ_LATEST, and READ_AT_SNAPSHOT.", value));
}
+ break;
}
case TImpalaQueryOptions::ALLOW_ERASURE_CODED_FILES: {
query_options->__set_allow_erasure_coded_files(
http://git-wip-us.apache.org/repos/asf/impala/blob/61e6a477/testdata/workloads/functional-query/queries/QueryTest/hdfs-erasure-coding.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/hdfs-erasure-coding.test b/testdata/workloads/functional-query/queries/QueryTest/hdfs-erasure-coding.test
index 0c773b4..fbe05c6 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/hdfs-erasure-coding.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/hdfs-erasure-coding.test
@@ -3,7 +3,7 @@
set allow_erasure_coded_files=false;
select count(*) from functional.alltypes;
---- CATCH
-ImpalaRuntimeException: Scanning of HDFS erasure-coded file (hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=1/090101.txt) is not supported
+ImpalaRuntimeException: Scanning of HDFS erasure-coded file
====
---- QUERY
set allow_erasure_coded_files=true;
[4/6] impala git commit: IMPALA-7190: Remove unsupported format writer support
Posted by ta...@apache.org.
IMPALA-7190: Remove unsupported format writer support
This patch removes write support for unsupported formats such as Sequence,
Avro, and compressed text. The related query options
ALLOW_UNSUPPORTED_FORMATS and SEQ_COMPRESSION_MODE have also been migrated
to the REMOVED query options type.
Testing:
Ran exhaustive build.
Change-Id: I821dc7495a901f1658daa500daf3791b386c7185
Reviewed-on: http://gerrit.cloudera.org:8080/10823
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/30e82c63
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/30e82c63
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/30e82c63
Branch: refs/heads/master
Commit: 30e82c63ecdd56ded10fed931d95ab6d994b9244
Parents: 6f52ce1
Author: Bikramjeet Vig <bi...@cloudera.com>
Authored: Mon Jun 25 18:11:08 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue Jul 3 20:34:27 2018 +0000
----------------------------------------------------------------------
be/src/exec/CMakeLists.txt | 2 -
be/src/exec/hdfs-avro-table-writer.cc | 295 ---------------
be/src/exec/hdfs-avro-table-writer.h | 121 -------
be/src/exec/hdfs-sequence-table-writer.cc | 361 -------------------
be/src/exec/hdfs-sequence-table-writer.h | 194 ----------
be/src/exec/hdfs-table-sink.cc | 48 +--
be/src/exec/hdfs-text-table-writer.cc | 61 +---
be/src/exec/hdfs-text-table-writer.h | 9 -
be/src/service/query-options-test.cc | 2 -
be/src/service/query-options.cc | 16 -
be/src/service/query-options.h | 5 +-
common/thrift/ImpalaInternalService.thrift | 6 -
common/thrift/ImpalaService.thrift | 6 +-
.../apache/impala/planner/PlannerTestBase.java | 1 -
testdata/bad_avro_snap/README | 4 +-
.../queries/QueryTest/avro-writer.test | 43 ---
.../queries/QueryTest/seq-writer.test | 308 ----------------
.../functional-query/queries/QueryTest/set.test | 3 -
.../queries/QueryTest/text-writer.test | 47 ---
.../queries/QueryTest/unsupported-writers.test | 77 ++++
tests/common/test_dimensions.py | 13 -
tests/hs2/test_hs2.py | 8 +-
tests/metadata/test_partition_metadata.py | 26 +-
tests/query_test/test_compressed_formats.py | 62 +---
tests/shell/test_shell_interactive.py | 10 +-
25 files changed, 121 insertions(+), 1607 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/exec/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/be/src/exec/CMakeLists.txt b/be/src/exec/CMakeLists.txt
index 1753cb0..4544b95 100644
--- a/be/src/exec/CMakeLists.txt
+++ b/be/src/exec/CMakeLists.txt
@@ -56,12 +56,10 @@ add_library(Exec
hdfs-rcfile-scanner.cc
hdfs-sequence-scanner.cc
hdfs-avro-scanner.cc
- hdfs-avro-table-writer.cc
hdfs-avro-scanner-ir.cc
hdfs-plugin-text-scanner.cc
hdfs-text-scanner.cc
hdfs-text-table-writer.cc
- hdfs-sequence-table-writer.cc
hdfs-parquet-scanner.cc
hdfs-parquet-scanner-ir.cc
hdfs-parquet-table-writer.cc
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/exec/hdfs-avro-table-writer.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-avro-table-writer.cc b/be/src/exec/hdfs-avro-table-writer.cc
deleted file mode 100644
index 3ce296d..0000000
--- a/be/src/exec/hdfs-avro-table-writer.cc
+++ /dev/null
@@ -1,295 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements. See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership. The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License. You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied. See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-#include "exec/hdfs-avro-table-writer.h"
-
-#include <vector>
-#include <hdfs.h>
-#include <boost/scoped_ptr.hpp>
-#include <stdlib.h>
-#include <gutil/strings/substitute.h>
-
-#include "exec/exec-node.h"
-#include "exec/hdfs-table-sink.h"
-#include "util/compress.h"
-#include "util/hdfs-util.h"
-#include "util/uid-util.h"
-#include "exprs/scalar-expr.h"
-#include "exprs/scalar-expr-evaluator.h"
-#include "runtime/mem-pool.h"
-#include "runtime/mem-tracker.h"
-#include "runtime/raw-value.h"
-#include "runtime/row-batch.h"
-#include "runtime/runtime-state.h"
-#include "runtime/hdfs-fs-cache.h"
-#include "runtime/types.h"
-#include "util/runtime-profile-counters.h"
-#include "write-stream.inline.h"
-
-#include "common/names.h"
-
-using namespace strings;
-using namespace impala;
-
-const uint8_t OBJ1[4] = {'O', 'b', 'j', 1};
-const char* AVRO_SCHEMA_STR = "avro.schema";
-const char* AVRO_CODEC_STR = "avro.codec";
-const THdfsCompression::type AVRO_DEFAULT_CODEC = THdfsCompression::SNAPPY;
-// Desired size of each Avro block (bytes); actual block size will vary +/- the
-// size of a row. This is approximate size of the block before compression.
-const int DEFAULT_AVRO_BLOCK_SIZE = 64 * 1024;
-
-HdfsAvroTableWriter::HdfsAvroTableWriter(HdfsTableSink* parent,
- RuntimeState* state, OutputPartition* output,
- const HdfsPartitionDescriptor* partition, const HdfsTableDescriptor* table_desc)
- : HdfsTableWriter(parent, state, output, partition, table_desc),
- unflushed_rows_(0) {
- mem_pool_.reset(new MemPool(parent->mem_tracker()));
-}
-
-void HdfsAvroTableWriter::ConsumeRow(TupleRow* row) {
- ++unflushed_rows_;
- int num_non_partition_cols =
- table_desc_->num_cols() - table_desc_->num_clustering_cols();
- for (int j = 0; j < num_non_partition_cols; ++j) {
- void* value = output_expr_evals_[j]->GetValue(row);
- AppendField(output_expr_evals_[j]->root().type(), value);
- }
-}
-
-inline void HdfsAvroTableWriter::AppendField(const ColumnType& type, const void* value) {
- // Each avro field is written as union, which is a ZLong indicating the union
- // field followed by the encoded value. Impala/Hive always stores values as
- // a union of [ColumnType, NULL].
- // TODO: the writer may be asked to write [NULL, ColumnType] unions. It is wrong
- // for us to assume [ColumnType, NULL].
-
- if (value == NULL) {
- // indicate the second field of the union
- out_.WriteZLong(1);
- // No bytes are written for a null value.
- return;
- }
-
- // indicate that we are using the first field of the union
- out_.WriteZLong(0);
-
- switch (type.type) {
- case TYPE_BOOLEAN:
- out_.WriteByte(*reinterpret_cast<const char*>(value));
- break;
- case TYPE_TINYINT:
- out_.WriteZInt(*reinterpret_cast<const int8_t*>(value));
- break;
- case TYPE_SMALLINT:
- out_.WriteZInt(*reinterpret_cast<const int16_t*>(value));
- break;
- case TYPE_INT:
- out_.WriteZInt(*reinterpret_cast<const int32_t*>(value));
- break;
- case TYPE_BIGINT:
- out_.WriteZLong(*reinterpret_cast<const int64_t*>(value));
- break;
- case TYPE_FLOAT:
- out_.WriteBytes(4, reinterpret_cast<const char*>(value));
- break;
- case TYPE_DOUBLE:
- out_.WriteBytes(8, reinterpret_cast<const char*>(value));
- break;
- case TYPE_STRING: {
- const StringValue& sv = *reinterpret_cast<const StringValue*>(value);
- out_.WriteZLong(sv.len);
- out_.WriteBytes(sv.len, sv.ptr);
- break;
- }
- case TYPE_DECIMAL: {
- int byte_size = ColumnType::GetDecimalByteSize(type.precision);
- out_.WriteZLong(byte_size);
-#if __BYTE_ORDER == __LITTLE_ENDIAN
- char tmp[16];
- BitUtil::ByteSwap(tmp, value, byte_size);
- out_.WriteBytes(byte_size, tmp);
-#else
- out_.WriteBytes(byte_size, reinterpret_cast<const char*>(value));
-#endif
- break;
- }
- case TYPE_TIMESTAMP:
- case TYPE_BINARY:
- case INVALID_TYPE:
- case TYPE_NULL:
- case TYPE_DATE:
- case TYPE_DATETIME:
- default:
- DCHECK(false);
- }
-}
-
-Status HdfsAvroTableWriter::Init() {
- // create the Sync marker
- sync_marker_ = GenerateUUIDString();
-
- THdfsCompression::type codec = AVRO_DEFAULT_CODEC;
- if (state_->query_options().__isset.compression_codec) {
- codec = state_->query_options().compression_codec;
- }
-
- // sets codec_name_ and compressor_
- codec_type_ = codec;
- switch (codec) {
- case THdfsCompression::SNAPPY:
- codec_name_ = "snappy";
- break;
- case THdfsCompression::DEFLATE:
- codec_name_ = "deflate";
- break;
- case THdfsCompression::NONE:
- codec_name_ = "null";
- return Status::OK();
- default:
- const char* name = _THdfsCompression_VALUES_TO_NAMES.find(codec)->second;
- return Status(Substitute(
- "Avro only supports NONE, DEFLATE, and SNAPPY codecs; unsupported codec $0",
- name));
- }
- RETURN_IF_ERROR(Codec::CreateCompressor(mem_pool_.get(), true, codec, &compressor_));
- DCHECK(compressor_.get() != NULL);
-
- return Status::OK();
-}
-
-void HdfsAvroTableWriter::Close() {
- mem_pool_->FreeAll();
-}
-
-Status HdfsAvroTableWriter::AppendRows(
- RowBatch* batch, const vector<int32_t>& row_group_indices, bool* new_file) {
- int32_t limit;
- bool all_rows = row_group_indices.empty();
- if (all_rows) {
- limit = batch->num_rows();
- } else {
- limit = row_group_indices.size();
- }
- COUNTER_ADD(parent_->rows_inserted_counter(), limit);
-
- {
- SCOPED_TIMER(parent_->encode_timer());
- for (int row_idx = 0; row_idx < limit; ++row_idx) {
- TupleRow* row = all_rows ?
- batch->GetRow(row_idx) : batch->GetRow(row_group_indices[row_idx]);
- ConsumeRow(row);
- }
- }
-
- if (out_.Size() > DEFAULT_AVRO_BLOCK_SIZE) RETURN_IF_ERROR(Flush());
- *new_file = false;
- return Status::OK();
-}
-
-Status HdfsAvroTableWriter::WriteFileHeader() {
- out_.Clear();
- out_.WriteBytes(4, reinterpret_cast<const uint8_t*>(OBJ1));
-
- // Write 'File Metadata' as an encoded avro map
- // number of key/value pairs in the map
- out_.WriteZLong(2);
-
- // Schema information
- out_.WriteZLong(strlen(AVRO_SCHEMA_STR));
- out_.WriteBytes(strlen(AVRO_SCHEMA_STR), AVRO_SCHEMA_STR);
- const string& avro_schema = table_desc_->avro_schema();
- out_.WriteZLong(avro_schema.size());
- out_.WriteBytes(avro_schema.size(), avro_schema.data());
-
- // codec information
- out_.WriteZLong(strlen(AVRO_CODEC_STR));
- out_.WriteBytes(strlen(AVRO_CODEC_STR), AVRO_CODEC_STR);
- out_.WriteZLong(codec_name_.size());
- out_.WriteBytes(codec_name_.size(), codec_name_.data());
-
- // Write end of map marker
- out_.WriteZLong(0);
-
- out_.WriteBytes(sync_marker_.size(), sync_marker_.data());
-
- const string& text = out_.String();
- RETURN_IF_ERROR(Write(reinterpret_cast<const uint8_t*>(text.c_str()),
- text.size()));
- out_.Clear();
- return Status::OK();
-}
-
-Status HdfsAvroTableWriter::Flush() {
- if (unflushed_rows_ == 0) return Status::OK();
-
- WriteStream header;
- // 1. Count of objects in this block
- header.WriteZLong(unflushed_rows_);
-
- const uint8_t* output;
- int64_t output_length;
- // Snappy format requires a CRC after the compressed data
- uint32_t crc;
- const string& text = out_.String();
-
- if (codec_type_ != THdfsCompression::NONE) {
- SCOPED_TIMER(parent_->compress_timer());
- uint8_t* temp;
- RETURN_IF_ERROR(compressor_->ProcessBlock(false, text.size(),
- reinterpret_cast<const uint8_t*>(text.data()), &output_length, &temp));
- output = temp;
- if (codec_type_ == THdfsCompression::SNAPPY) {
- crc = SnappyCompressor::ComputeChecksum(
- text.size(), reinterpret_cast<const uint8_t*>(text.data()));
- }
- } else {
- output = reinterpret_cast<const uint8_t*>(text.data());
- output_length = out_.Size();
- }
-
- // 2. length of serialized objects
- if (codec_type_ == THdfsCompression::SNAPPY) {
- // + 4 for the CRC checksum at the end of the compressed block
- header.WriteZLong(output_length + 4);
- } else {
- header.WriteZLong(output_length);
- }
-
- const string& head = header.String();
- {
- SCOPED_TIMER(parent_->hdfs_write_timer());
- // Flush (1) and (2) to HDFS
- RETURN_IF_ERROR(
- Write(reinterpret_cast<const uint8_t*>(head.data()), head.size()));
- // 3. serialized objects
- RETURN_IF_ERROR(Write(output, output_length));
-
- // Write CRC checksum
- if (codec_type_ == THdfsCompression::SNAPPY) {
- RETURN_IF_ERROR(Write(reinterpret_cast<const uint8_t*>(&crc), sizeof(uint32_t)));
- }
- }
-
- // 4. sync marker
- RETURN_IF_ERROR(
- Write(reinterpret_cast<const uint8_t*>(sync_marker_.data()), sync_marker_.size()));
-
- out_.Clear();
- unflushed_rows_ = 0;
- return Status::OK();
-}
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/exec/hdfs-avro-table-writer.h
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-avro-table-writer.h b/be/src/exec/hdfs-avro-table-writer.h
deleted file mode 100644
index 6966860..0000000
--- a/be/src/exec/hdfs-avro-table-writer.h
+++ /dev/null
@@ -1,121 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements. See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership. The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License. You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied. See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-#ifndef IMPALA_EXEC_HDFS_AVRO_WRITER_H
-#define IMPALA_EXEC_HDFS_AVRO_WRITER_H
-
-#include <hdfs.h>
-#include <sstream>
-#include <string>
-
-#include "common/status.h"
-#include "exec/hdfs-table-writer.h"
-#include "runtime/mem-pool.h"
-#include "util/codec.h"
-#include "exec/write-stream.h"
-
-namespace impala {
-
-struct ColumnType;
-class HdfsTableSink;
-class RuntimeState;
-class ScalarExprEvaluator;
-class TupleDescriptor;
-class TupleRow;
-struct OutputPartition;
-struct StringValue;
-
-/// Consumes rows and outputs the rows into an Avro file in HDFS
-/// Each Avro file contains a block of records (rows). The file metadata specifies the
-/// schema of the records in addition to the name of the codec, if any, used to compress
-/// blocks. The structure is:
-/// [ Metadata ]
-/// [ Sync Marker ]
-/// [ Data Block ]
-/// ...
-/// [ Data Block ]
-//
-/// Each Data Block consists of:
-/// [ Number of Rows in Block ]
-/// [ Size of serialized objects, after compression ]
-/// [ Serialized objects, compressed ]
-/// [ Sync Marker ]
-//
-/// If compression is used, each block is compressed individually. The block size defaults
-/// to about 64KB before compression.
-/// This writer implements the Avro 1.7.7 spec:
-/// http://avro.apache.org/docs/1.7.7/spec.html
-class HdfsAvroTableWriter : public HdfsTableWriter {
- public:
- HdfsAvroTableWriter(HdfsTableSink* parent,
- RuntimeState* state, OutputPartition* output,
- const HdfsPartitionDescriptor* partition,
- const HdfsTableDescriptor* table_desc);
-
- virtual ~HdfsAvroTableWriter() { }
-
- virtual Status Init() override;
- virtual Status Finalize() override { return Flush(); }
- virtual Status InitNewFile() override { return WriteFileHeader(); }
- virtual void Close() override;
- virtual uint64_t default_block_size() const override { return 0; }
- virtual std::string file_extension() const override { return "avro"; }
-
- /// Outputs the given rows into an HDFS sequence file. The rows are buffered
- /// to fill a sequence file block.
- virtual Status AppendRows(RowBatch* rows,
- const std::vector<int32_t>& row_group_indices, bool* new_file) override;
-
- private:
- /// Processes a single row, appending to out_
- void ConsumeRow(TupleRow* row);
-
- /// Adds an encoded field to out_
- inline void AppendField(const ColumnType& type, const void* value);
-
- /// Writes the Avro file header to HDFS
- Status WriteFileHeader() WARN_UNUSED_RESULT;
-
- /// Writes the contents of out_ to HDFS as a single Avro file block.
- /// Returns an error if write to HDFS fails.
- Status Flush() WARN_UNUSED_RESULT;
-
- /// Buffer which holds accumulated output
- WriteStream out_;
-
- /// Memory pool used by codec to allocate output buffer.
- /// Owned by this class. Initialized using parent's memtracker.
- boost::scoped_ptr<MemPool> mem_pool_;
-
- /// Number of rows consumed since last flush
- uint64_t unflushed_rows_;
-
- /// Name of codec, only set if codec_type_ != NONE
- std::string codec_name_;
-
- /// Type of the codec, will be NONE if no compression is used
- THdfsCompression::type codec_type_;
-
- /// The codec for compressing, only set if codec_type_ != NONE
- boost::scoped_ptr<Codec> compressor_;
-
- /// 16 byte sync marker (a uuid)
- std::string sync_marker_;
-};
-
-} // namespace impala
-#endif
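The deleted Avro writer encodes union indices and lengths with WriteZLong, the zigzag-plus-varint "long" encoding from the Avro 1.7.7 spec it implements. A standalone sketch of that encoding (mirrors the spec, not Impala's exact WriteStream internals):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Avro "long" encoding: zigzag the sign bit into the low bit, then emit
// base-128 varint bytes, low group first, high bit = continuation.
std::vector<uint8_t> EncodeZLong(int64_t v) {
  // ZigZag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... so small magnitudes
  // of either sign stay small.
  uint64_t u = (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
  std::vector<uint8_t> out;
  while (u >= 0x80) {
    out.push_back(static_cast<uint8_t>(u) | 0x80);
    u >>= 7;
  }
  out.push_back(static_cast<uint8_t>(u));
  return out;
}
```

This is why writing the union branch index for a non-null value (`WriteZLong(0)` in the deleted ConsumeRow/AppendField path) costs exactly one byte.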
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/exec/hdfs-sequence-table-writer.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-sequence-table-writer.cc b/be/src/exec/hdfs-sequence-table-writer.cc
deleted file mode 100644
index 42a70f0..0000000
--- a/be/src/exec/hdfs-sequence-table-writer.cc
+++ /dev/null
@@ -1,361 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements. See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership. The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License. You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied. See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-#include "exec/hdfs-sequence-table-writer.h"
-#include "exec/write-stream.inline.h"
-#include "exec/exec-node.h"
-#include "util/hdfs-util.h"
-#include "util/uid-util.h"
-#include "exprs/scalar-expr.h"
-#include "exprs/scalar-expr-evaluator.h"
-#include "runtime/mem-tracker.h"
-#include "runtime/raw-value.h"
-#include "runtime/row-batch.h"
-#include "runtime/runtime-state.h"
-#include "runtime/hdfs-fs-cache.h"
-#include "util/runtime-profile-counters.h"
-
-#include <vector>
-#include <hdfs.h>
-#include <boost/scoped_ptr.hpp>
-#include <stdlib.h>
-
-#include "common/names.h"
-
-namespace impala {
-
-const uint8_t HdfsSequenceTableWriter::SEQ6_CODE[4] = {'S', 'E', 'Q', 6};
-const char* HdfsSequenceTableWriter::VALUE_CLASS_NAME = "org.apache.hadoop.io.Text";
-const char* HdfsSequenceTableWriter::KEY_CLASS_NAME =
- "org.apache.hadoop.io.BytesWritable";
-
-HdfsSequenceTableWriter::HdfsSequenceTableWriter(HdfsTableSink* parent,
- RuntimeState* state, OutputPartition* output,
- const HdfsPartitionDescriptor* partition, const HdfsTableDescriptor* table_desc)
- : HdfsTableWriter(parent, state, output, partition, table_desc),
- mem_pool_(new MemPool(parent->mem_tracker())), compress_flag_(false),
- unflushed_rows_(0), record_compression_(false) {
- approx_block_size_ = 64 * 1024 * 1024;
- parent->mem_tracker()->Consume(approx_block_size_);
- field_delim_ = partition->field_delim();
- escape_char_ = partition->escape_char();
-}
-
-Status HdfsSequenceTableWriter::Init() {
- THdfsCompression::type codec = THdfsCompression::SNAPPY_BLOCKED;
- const TQueryOptions& query_options = state_->query_options();
- if (query_options.__isset.compression_codec) {
- codec = query_options.compression_codec;
- if (codec == THdfsCompression::SNAPPY) {
- // Seq file (and in general things that use hadoop.io.codec) always
- // mean snappy_blocked.
- codec = THdfsCompression::SNAPPY_BLOCKED;
- }
- }
- if (codec != THdfsCompression::NONE) {
- compress_flag_ = true;
- if (query_options.__isset.seq_compression_mode) {
- record_compression_ =
- query_options.seq_compression_mode == THdfsSeqCompressionMode::RECORD;
- }
- RETURN_IF_ERROR(Codec::GetHadoopCodecClassName(codec, &codec_name_));
- RETURN_IF_ERROR(Codec::CreateCompressor(
- mem_pool_.get(), true, codec_name_, &compressor_));
- DCHECK(compressor_.get() != NULL);
- }
-
- // create the Sync marker
- string uuid = GenerateUUIDString();
- uint8_t sync_neg1[20];
-
- ReadWriteUtil::PutInt(sync_neg1, static_cast<uint32_t>(-1));
- DCHECK(uuid.size() == 16);
- memcpy(sync_neg1 + sizeof(int32_t), uuid.data(), uuid.size());
- neg1_sync_marker_ = string(reinterpret_cast<char*>(sync_neg1), 20);
- sync_marker_ = uuid;
-
- return Status::OK();
-}
-
-Status HdfsSequenceTableWriter::AppendRows(
- RowBatch* batch, const vector<int32_t>& row_group_indices, bool* new_file) {
- int32_t limit;
- if (row_group_indices.empty()) {
- limit = batch->num_rows();
- } else {
- limit = row_group_indices.size();
- }
- COUNTER_ADD(parent_->rows_inserted_counter(), limit);
-
- bool all_rows = row_group_indices.empty();
- int num_non_partition_cols =
- table_desc_->num_cols() - table_desc_->num_clustering_cols();
- DCHECK_GE(output_expr_evals_.size(), num_non_partition_cols)
- << parent_->DebugString();
-
- {
- SCOPED_TIMER(parent_->encode_timer());
- if (all_rows) {
- for (int row_idx = 0; row_idx < limit; ++row_idx) {
- RETURN_IF_ERROR(ConsumeRow(batch->GetRow(row_idx)));
- }
- } else {
- for (int row_idx = 0; row_idx < limit; ++row_idx) {
- TupleRow* row = batch->GetRow(row_group_indices[row_idx]);
- RETURN_IF_ERROR(ConsumeRow(row));
- }
- }
- }
-
- if (!compress_flag_) {
- out_.WriteBytes(neg1_sync_marker_.size(), neg1_sync_marker_.data());
- }
-
- if (out_.Size() >= approx_block_size_) RETURN_IF_ERROR(Flush());
- *new_file = false;
- return Status::OK();
-}
-
-Status HdfsSequenceTableWriter::WriteFileHeader() {
- out_.WriteBytes(sizeof(SEQ6_CODE), SEQ6_CODE);
-
- // Setup to be correct key class
- out_.WriteText(strlen(KEY_CLASS_NAME),
- reinterpret_cast<const uint8_t*>(KEY_CLASS_NAME));
-
- // Setup to be correct value class
- out_.WriteText(strlen(VALUE_CLASS_NAME),
- reinterpret_cast<const uint8_t*>(VALUE_CLASS_NAME));
-
- // Flag for if compression is used
- out_.WriteBoolean(compress_flag_);
- // Only valid if compression is used. Indicates if block compression is used.
- out_.WriteBoolean(compress_flag_ && !record_compression_);
-
- // Output the name of our compression codec, parsed by readers
- if (compress_flag_) {
- out_.WriteText(codec_name_.size(),
- reinterpret_cast<const uint8_t*>(codec_name_.data()));
- }
-
- // Meta data is formated as an integer N followed by N*2 strings,
- // which are key-value pairs. Hive does not write meta data, so neither does Impala
- out_.WriteInt(0);
-
- // write the sync marker
- out_.WriteBytes(sync_marker_.size(), sync_marker_.data());
-
- string text = out_.String();
- RETURN_IF_ERROR(Write(reinterpret_cast<const uint8_t*>(text.c_str()), text.size()));
- out_.Clear();
- return Status::OK();
-}
-
-Status HdfsSequenceTableWriter::WriteCompressedBlock() {
- WriteStream record;
- uint8_t *output;
- int64_t output_length;
- DCHECK(compress_flag_);
-
- // Add a sync marker to start of the block
- record.WriteBytes(neg1_sync_marker_.size(), neg1_sync_marker_.data());
-
- // Output the number of rows in this block
- record.WriteVLong(unflushed_rows_);
-
- // Output compressed key-lengths block-size & compressed key-lengths block.
- // The key-lengths block contains byte value of 4 as a key length for each row (this is
- // what Hive does).
- string key_lengths_text(unflushed_rows_, '\x04');
- {
- SCOPED_TIMER(parent_->compress_timer());
- RETURN_IF_ERROR(compressor_->ProcessBlock(false, key_lengths_text.size(),
- reinterpret_cast<const uint8_t*>(key_lengths_text.data()), &output_length,
- &output));
- }
- record.WriteVInt(output_length);
- record.WriteBytes(output_length, output);
-
- // Output compressed keys block-size & compressed keys block.
- // The keys block contains "\0\0\0\0" byte sequence as a key for each row (this is what
- // Hive does).
- string keys_text(unflushed_rows_ * 4, '\0');
- {
- SCOPED_TIMER(parent_->compress_timer());
- RETURN_IF_ERROR(compressor_->ProcessBlock(false, keys_text.size(),
- reinterpret_cast<const uint8_t*>(keys_text.data()), &output_length, &output));
- }
- record.WriteVInt(output_length);
- record.WriteBytes(output_length, output);
-
- // Output compressed value-lengths block-size & compressed value-lengths block
- string value_lengths_text = out_value_lengths_block_.String();
- {
- SCOPED_TIMER(parent_->compress_timer());
- RETURN_IF_ERROR(compressor_->ProcessBlock(false, value_lengths_text.size(),
- reinterpret_cast<const uint8_t*>(value_lengths_text.data()), &output_length, &output));
- }
- record.WriteVInt(output_length);
- record.WriteBytes(output_length, output);
-
- // Output compressed values block-size & compressed values block
- string text = out_.String();
- {
- SCOPED_TIMER(parent_->compress_timer());
- RETURN_IF_ERROR(compressor_->ProcessBlock(false, text.size(),
- reinterpret_cast<const uint8_t*>(text.data()), &output_length, &output));
- }
- record.WriteVInt(output_length);
- record.WriteBytes(output_length, output);
-
- string rec = record.String();
- RETURN_IF_ERROR(Write(reinterpret_cast<const uint8_t*>(rec.data()), rec.size()));
- return Status::OK();
-}
-
-inline void HdfsSequenceTableWriter::WriteEscapedString(const StringValue* str_val,
- WriteStream* buf) {
- for (int i = 0; i < str_val->len; ++i) {
- if (str_val->ptr[i] == field_delim_ || str_val->ptr[i] == escape_char_) {
- buf->WriteByte(escape_char_);
- }
- buf->WriteByte(str_val->ptr[i]);
- }
-}
-
-void HdfsSequenceTableWriter::EncodeRow(TupleRow* row, WriteStream* buf) {
- // TODO Unify with text table writer
- int num_non_partition_cols =
- table_desc_->num_cols() - table_desc_->num_clustering_cols();
- DCHECK_GE(output_expr_evals_.size(), num_non_partition_cols)
- << parent_->DebugString();
- for (int j = 0; j < num_non_partition_cols; ++j) {
- void* value = output_expr_evals_[j]->GetValue(row);
- if (value != NULL) {
- if (output_expr_evals_[j]->root().type().type == TYPE_STRING) {
- WriteEscapedString(reinterpret_cast<const StringValue*>(value), &row_buf_);
- } else {
- string str;
- output_expr_evals_[j]->PrintValue(value, &str);
- buf->WriteBytes(str.size(), str.data());
- }
- } else {
- // NULLs in hive are encoded based on the 'serialization.null.format' property.
- const string& null_val = table_desc_->null_column_value();
- buf->WriteBytes(null_val.size(), null_val.data());
- }
- // Append field delimiter.
- if (j + 1 < num_non_partition_cols) {
- buf->WriteByte(field_delim_);
- }
- }
-}
-
-inline Status HdfsSequenceTableWriter::ConsumeRow(TupleRow* row) {
- ++unflushed_rows_;
- row_buf_.Clear();
- if (compress_flag_ && !record_compression_) {
- // Output row for a block compressed sequence file.
- // Value block: Write the length as a vlong and then write the contents.
- EncodeRow(row, &row_buf_);
- out_.WriteVLong(row_buf_.Size());
- out_.WriteBytes(row_buf_.Size(), row_buf_.String().data());
- // Value-lengths block: Write the number of bytes we have just written to out_ as
- // vlong
- out_value_lengths_block_.WriteVLong(
- ReadWriteUtil::VLongRequiredBytes(row_buf_.Size()) + row_buf_.Size());
- return Status::OK();
- }
-
- EncodeRow(row, &row_buf_);
-
- const uint8_t* value_bytes;
- int64_t value_length;
- string text = row_buf_.String();
- if (compress_flag_) {
- // apply compression to row_buf_
- // the length of the buffer must be prefixed to the buffer prior to compression
- //
- // TODO this incurs copy overhead to place the length in front of the
- // buffer prior to compression. We may want to rewrite to avoid copying.
- row_buf_.Clear();
- // encoding as "Text" writes the length before the text
- row_buf_.WriteText(text.size(), reinterpret_cast<const uint8_t*>(&text.data()[0]));
- text = row_buf_.String();
- uint8_t *tmp;
- {
- SCOPED_TIMER(parent_->compress_timer());
- RETURN_IF_ERROR(compressor_->ProcessBlock(false, text.size(),
- reinterpret_cast<const uint8_t*>(text.data()), &value_length, &tmp));
- }
- value_bytes = tmp;
- } else {
- value_length = text.size();
- DCHECK_EQ(value_length, row_buf_.Size());
- value_bytes = reinterpret_cast<const uint8_t*>(text.data());
- }
-
- int rec_len = value_length;
- // if the record is compressed, the length is part of the compressed text
- // if not, then we need to write the length (below) and account for it's size
- if (!compress_flag_) {
- rec_len += ReadWriteUtil::VLongRequiredBytes(value_length);
- }
- // The record contains the key, account for it's size (we use "\0\0\0\0" byte sequence
- // as a key just like Hive).
- rec_len += 4;
-
- // Length of the record (incl. key and value length)
- out_.WriteInt(rec_len);
-
- // Write length of the key and the key
- out_.WriteInt(4);
- out_.WriteBytes(4, "\0\0\0\0");
-
- // if the record is compressed, the length is part of the compressed text
- if (!compress_flag_) out_.WriteVLong(value_length);
-
- // write out the value (possibly compressed)
- out_.WriteBytes(value_length, value_bytes);
- return Status::OK();
-}
-
-Status HdfsSequenceTableWriter::Flush() {
- if (unflushed_rows_ == 0) return Status::OK();
-
- SCOPED_TIMER(parent_->hdfs_write_timer());
-
- if (compress_flag_ && !record_compression_) {
- RETURN_IF_ERROR(WriteCompressedBlock());
- } else {
- string out_str = out_.String();
- RETURN_IF_ERROR(
- Write(reinterpret_cast<const uint8_t*>(out_str.data()), out_str.size()));
- }
- out_.Clear();
- out_value_lengths_block_.Clear();
- unflushed_rows_ = 0;
- return Status::OK();
-}
-
-void HdfsSequenceTableWriter::Close() {
- // TODO: double check there is no memory leak.
- parent_->mem_tracker()->Release(approx_block_size_);
- mem_pool_->FreeAll();
-}
-
-} // namespace impala
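The deleted sequence writer's WriteEscapedString applied a simple rule to string values: prefix any occurrence of the field delimiter or of the escape character itself with the escape character. A minimal standalone sketch of that rule (operates on std::string rather than Impala's StringValue):

```cpp
#include <cassert>
#include <string>

// Escapes a field for delimited text output: any byte equal to the field
// delimiter or the escape character is preceded by the escape character.
std::string EscapeField(const std::string& field, char field_delim,
                        char escape_char) {
  std::string out;
  for (char c : field) {
    if (c == field_delim || c == escape_char) out.push_back(escape_char);
    out.push_back(c);
  }
  return out;
}
```

A reader splitting on the unescaped delimiter can then recover the original field boundaries unambiguously.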
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/exec/hdfs-sequence-table-writer.h
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-sequence-table-writer.h b/be/src/exec/hdfs-sequence-table-writer.h
deleted file mode 100644
index f315920..0000000
--- a/be/src/exec/hdfs-sequence-table-writer.h
+++ /dev/null
@@ -1,194 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one
-// or more contributor license agreements. See the NOTICE file
-// distributed with this work for additional information
-// regarding copyright ownership. The ASF licenses this file
-// to you under the Apache License, Version 2.0 (the
-// "License"); you may not use this file except in compliance
-// with the License. You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing,
-// software distributed under the License is distributed on an
-// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-// KIND, either express or implied. See the License for the
-// specific language governing permissions and limitations
-// under the License.
-
-#ifndef IMPALA_EXEC_HDFS_SEQUENCE_WRITER_H
-#define IMPALA_EXEC_HDFS_SEQUENCE_WRITER_H
-
-#include <hdfs.h>
-#include <sstream>
-
-#include "runtime/descriptors.h"
-#include "exec/hdfs-table-sink.h"
-#include "exec/hdfs-table-writer.h"
-#include "util/codec.h"
-#include "write-stream.h"
-
-namespace impala {
-
-class Expr;
-class TupleDescriptor;
-class TupleRow;
-class RuntimeState;
-struct StringValue;
-struct OutputPartition;
-
-/// Sequence files are flat files consisting of binary key/value pairs. Essentially there
-/// are 3 different formats for sequence files depending on the 'compression_codec' and
-/// 'seq_compression_mode' query options:
-/// - Uncompressed sequence file format
-/// - Record-compressed sequence file format
-/// - Block-compressed sequence file format
-/// All of them share a common header described below.
-///
-/// Sequence File Header
-/// --------------------
-/// - version - 3 bytes of magic header SEQ, followed by 1 byte of actual version number
-/// (e.g. SEQ4 or SEQ6)
-/// - keyClassName - key class
-/// - valueClassName - value class
-/// - compression - A boolean which specifies if compression is turned on for keys/values
-/// in this file.
-/// - blockCompression - A boolean which specifies if block-compression is turned on for
-/// keys/values in this file.
-/// - compression codec - compression codec class which is used for compression of keys
-/// and/or values (if compression is enabled).
-/// - metadata - SequenceFile.Metadata for this file.
-/// - sync - A 16 byte sync marker to denote end of the header.
-///
-/// Uncompressed Sequence File Format
-/// ---------------------------------
-/// - Header
-/// - Record
-/// - Record length
-/// - Key length
-/// - Key
-/// - Value
-/// - "\xFF\xFF\xFF\xFF" followed by a sync-marker every few 100 bytes or so.
-///
-/// Record-Compressed Sequence File Format
-/// --------------------------------------
-/// - Header
-/// - Record
-/// - Record length
-/// - Key length
-/// - Key
-/// - Compressed Value
-/// - "\xFF\xFF\xFF\xFF" followed by a sync-marker every few 100 bytes or so.
-///
-/// Block-Compressed Sequence File Format
-/// -------------------------------------
-/// - Header
-/// - Record Block
-/// - Uncompressed number of records in the block
-/// - Compressed key-lengths block-size
-/// - Compressed key-lengths block
-/// - Compressed keys block-size
-/// - Compressed keys block
-/// - Compressed value-lengths block-size
-/// - Compressed value-lengths block
-/// - Compressed values block-size
-/// - Compressed values block
-/// - "\xFF\xFF\xFF\xFF" followed by a sync-marker every block.
-/// The compressed blocks of key lengths and value lengths consist of the actual lengths
-/// of individual keys/values encoded in zero-compressed integer format.
-
-/// Consumes rows and outputs the rows into a sequence file in HDFS
-/// Output is buffered to fill sequence file blocks.
-class HdfsSequenceTableWriter : public HdfsTableWriter {
- public:
- HdfsSequenceTableWriter(HdfsTableSink* parent, RuntimeState* state,
- OutputPartition* output, const HdfsPartitionDescriptor* partition,
- const HdfsTableDescriptor* table_desc);
-
- ~HdfsSequenceTableWriter() { }
-
- virtual Status Init();
- virtual Status Finalize() { return Flush(); }
- virtual Status InitNewFile() { return WriteFileHeader(); }
- virtual void Close();
- virtual uint64_t default_block_size() const { return 0; }
- virtual std::string file_extension() const { return "seq"; }
-
- /// Outputs the given rows into an HDFS sequence file. The rows are buffered
- /// to fill a sequence file block.
- virtual Status AppendRows(
- RowBatch* rows, const std::vector<int32_t>& row_group_indices, bool* new_file);
-
- private:
- /// processes a single row, delegates to Compress or NoCompress ConsumeRow().
- inline Status ConsumeRow(TupleRow* row);
-
- /// writes the SEQ file header to HDFS
- Status WriteFileHeader();
-
- /// writes the contents of out_value_lengths_block_ and out_ as a single
- /// block-compressed record.
- Status WriteCompressedBlock();
-
- /// writes the tuple row to the given buffer; separates fields by field_delim_,
- /// escapes string.
- inline void EncodeRow(TupleRow* row, WriteStream* buf);
-
- /// writes the str_val to the buffer, escaping special characters
- inline void WriteEscapedString(const StringValue* str_val, WriteStream* buf);
-
- /// flushes the output -- clearing out_ and writing to HDFS
- /// if compress_flag_, will write contents of out_ as a single compressed block
- Status Flush();
-
- /// desired size of each block (bytes); actual block size will vary +/- the
- /// size of a row; this is before compression is applied.
- uint64_t approx_block_size_;
-
- /// buffer which holds accumulated output
- WriteStream out_;
-
- /// buffer which holds accumulated value-lengths output (used with block-compressed
- /// sequence files)
- WriteStream out_value_lengths_block_;
-
- /// Temporary Buffer for a single row
- WriteStream row_buf_;
-
- /// memory pool used by codec to allocate output buffer
- boost::scoped_ptr<MemPool> mem_pool_;
-
- /// true if compression is enabled
- bool compress_flag_;
-
- /// number of rows consumed since last flush
- uint64_t unflushed_rows_;
-
- /// name of codec, only set if compress_flag_
- std::string codec_name_;
- /// the codec for compressing, only set if compress_flag_
- boost::scoped_ptr<Codec> compressor_;
-
- /// true if compression is applied on each record individually
- bool record_compression_;
-
- /// Character delimiting fields
- char field_delim_;
-
- /// Escape character for text encoding
- char escape_char_;
-
- /// 16 byte sync marker (a uuid)
- std::string sync_marker_;
- /// A -1 in front of the sync marker, used in decompressed formats
- std::string neg1_sync_marker_;
-
- /// Name of java class to use when reading the keys
- static const char* KEY_CLASS_NAME;
- /// Name of java class to use when reading the values
- static const char* VALUE_CLASS_NAME;
- /// Magic characters used to identify the file type
- static const uint8_t SEQ6_CODE[4];
-};
-
-} // namespace impala
-#endif
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/exec/hdfs-table-sink.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-table-sink.cc b/be/src/exec/hdfs-table-sink.cc
index b6de7cf..9c46638 100644
--- a/be/src/exec/hdfs-table-sink.cc
+++ b/be/src/exec/hdfs-table-sink.cc
@@ -18,8 +18,6 @@
#include "exec/hdfs-table-sink.h"
#include "exec/hdfs-table-writer.h"
#include "exec/hdfs-text-table-writer.h"
-#include "exec/hdfs-sequence-table-writer.h"
-#include "exec/hdfs-avro-table-writer.h"
#include "exec/hdfs-parquet-table-writer.h"
#include "exec/exec-node.h"
#include "gen-cpp/ImpalaInternalService_constants.h"
@@ -469,28 +467,20 @@ Status HdfsTableSink::InitOutputPartition(RuntimeState* state,
output_partition->partition_descriptor = &partition_descriptor;
- bool allow_unsupported_formats =
- state->query_options().__isset.allow_unsupported_formats &&
- state->query_options().allow_unsupported_formats;
- if (!allow_unsupported_formats) {
- if (partition_descriptor.file_format() == THdfsFileFormat::SEQUENCE_FILE ||
- partition_descriptor.file_format() == THdfsFileFormat::AVRO) {
- stringstream error_msg;
- map<int, const char*>::const_iterator i =
- _THdfsFileFormat_VALUES_TO_NAMES.find(partition_descriptor.file_format());
- error_msg << "Writing to table format " << i->second
- << " is not supported. Use query option ALLOW_UNSUPPORTED_FORMATS"
- " to override.";
- return Status(error_msg.str());
- }
- if (partition_descriptor.file_format() == THdfsFileFormat::TEXT &&
- state->query_options().__isset.compression_codec &&
- state->query_options().compression_codec != THdfsCompression::NONE) {
- stringstream error_msg;
- error_msg << "Writing to compressed text table is not supported. "
- "Use query option ALLOW_UNSUPPORTED_FORMATS to override.";
- return Status(error_msg.str());
- }
+ if (partition_descriptor.file_format() == THdfsFileFormat::SEQUENCE_FILE ||
+ partition_descriptor.file_format() == THdfsFileFormat::AVRO) {
+ stringstream error_msg;
+ map<int, const char*>::const_iterator i =
+ _THdfsFileFormat_VALUES_TO_NAMES.find(partition_descriptor.file_format());
+ error_msg << "Writing to table format " << i->second << " is not supported.";
+ return Status(error_msg.str());
+ }
+ if (partition_descriptor.file_format() == THdfsFileFormat::TEXT &&
+ state->query_options().__isset.compression_codec &&
+ state->query_options().compression_codec != THdfsCompression::NONE) {
+ stringstream error_msg;
+ error_msg << "Writing to compressed text table is not supported. ";
+ return Status(error_msg.str());
}
// It is incorrect to initialize a writer if there are no rows to feed it. The writer
@@ -508,16 +498,6 @@ Status HdfsTableSink::InitOutputPartition(RuntimeState* state,
new HdfsParquetTableWriter(
this, state, output_partition, &partition_descriptor, table_desc_));
break;
- case THdfsFileFormat::SEQUENCE_FILE:
- output_partition->writer.reset(
- new HdfsSequenceTableWriter(
- this, state, output_partition, &partition_descriptor, table_desc_));
- break;
- case THdfsFileFormat::AVRO:
- output_partition->writer.reset(
- new HdfsAvroTableWriter(
- this, state, output_partition, &partition_descriptor, table_desc_));
- break;
default:
stringstream error_msg;
map<int, const char*>::const_iterator i =
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/exec/hdfs-text-table-writer.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-text-table-writer.cc b/be/src/exec/hdfs-text-table-writer.cc
index aaee773..f09b161 100644
--- a/be/src/exec/hdfs-text-table-writer.cc
+++ b/be/src/exec/hdfs-text-table-writer.cc
@@ -25,8 +25,6 @@
#include "runtime/row-batch.h"
#include "runtime/runtime-state.h"
#include "runtime/string-value.inline.h"
-#include "util/codec.h"
-#include "util/compress.h"
#include "util/hdfs-util.h"
#include "util/runtime-profile-counters.h"
@@ -35,13 +33,6 @@
#include "common/names.h"
-// Hdfs block size for compressed text.
-static const int64_t COMPRESSED_BLOCK_SIZE = 64 * 1024 * 1024;
-
-// Size to buffer before compression. We want this to be less than the block size
-// (compressed text is not splittable).
-static const int64_t COMPRESSED_BUFFERED_SIZE = 60 * 1024 * 1024;
-
namespace impala {
HdfsTextTableWriter::HdfsTextTableWriter(HdfsTableSink* parent,
@@ -61,41 +52,17 @@ HdfsTextTableWriter::HdfsTextTableWriter(HdfsTableSink* parent,
}
Status HdfsTextTableWriter::Init() {
- const TQueryOptions& query_options = state_->query_options();
- codec_ = THdfsCompression::NONE;
- if (query_options.__isset.compression_codec) {
- codec_ = query_options.compression_codec;
- if (codec_ == THdfsCompression::SNAPPY) {
- // hadoop.io.codec always means SNAPPY_BLOCKED. Alias the two.
- codec_ = THdfsCompression::SNAPPY_BLOCKED;
- }
- }
-
- if (codec_ != THdfsCompression::NONE) {
- mem_pool_.reset(new MemPool(parent_->mem_tracker()));
- RETURN_IF_ERROR(Codec::CreateCompressor(
- mem_pool_.get(), true, codec_, &compressor_));
- flush_size_ = COMPRESSED_BUFFERED_SIZE;
- } else {
- flush_size_ = HDFS_FLUSH_WRITE_SIZE;
- }
parent_->mem_tracker()->Consume(flush_size_);
return Status::OK();
}
void HdfsTextTableWriter::Close() {
parent_->mem_tracker()->Release(flush_size_);
- if (mem_pool_.get() != NULL) mem_pool_->FreeAll();
}
-uint64_t HdfsTextTableWriter::default_block_size() const {
- return compressor_.get() == NULL ? 0 : COMPRESSED_BLOCK_SIZE;
-}
+uint64_t HdfsTextTableWriter::default_block_size() const { return 0; }
-string HdfsTextTableWriter::file_extension() const {
- if (compressor_.get() == NULL) return "";
- return compressor_->file_extension();
-}
+string HdfsTextTableWriter::file_extension() const { return ""; }
Status HdfsTextTableWriter::AppendRows(
RowBatch* batch, const vector<int32_t>& row_group_indices, bool* new_file) {
@@ -152,12 +119,7 @@ Status HdfsTextTableWriter::AppendRows(
}
*new_file = false;
- if (rowbatch_stringstream_.tellp() >= flush_size_) {
- RETURN_IF_ERROR(Flush());
-
- // If compressed, start a new file (compressed data is not splittable).
- *new_file = compressor_.get() != NULL;
- }
+ if (rowbatch_stringstream_.tellp() >= flush_size_) RETURN_IF_ERROR(Flush());
return Status::OK();
}
@@ -178,22 +140,9 @@ Status HdfsTextTableWriter::InitNewFile() {
Status HdfsTextTableWriter::Flush() {
string rowbatch_string = rowbatch_stringstream_.str();
rowbatch_stringstream_.str(string());
- const uint8_t* uncompressed_data =
+ const uint8_t* data =
reinterpret_cast<const uint8_t*>(rowbatch_string.data());
- int64_t uncompressed_len = rowbatch_string.size();
- const uint8_t* data = uncompressed_data;
- int64_t len = uncompressed_len;
-
- if (compressor_.get() != NULL) {
- SCOPED_TIMER(parent_->compress_timer());
- uint8_t* compressed_data;
- int64_t compressed_len;
- RETURN_IF_ERROR(compressor_->ProcessBlock(false,
- uncompressed_len, uncompressed_data,
- &compressed_len, &compressed_data));
- data = compressed_data;
- len = compressed_len;
- }
+ int64_t len = rowbatch_string.size();
{
SCOPED_TIMER(parent_->hdfs_write_timer());
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/exec/hdfs-text-table-writer.h
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-text-table-writer.h b/be/src/exec/hdfs-text-table-writer.h
index 589ed23..e2f6135 100644
--- a/be/src/exec/hdfs-text-table-writer.h
+++ b/be/src/exec/hdfs-text-table-writer.h
@@ -87,15 +87,6 @@ class HdfsTextTableWriter : public HdfsTableWriter {
/// Stringstream to buffer output. The stream is cleared between HDFS
/// Write calls to allow for the internal buffers to be reused.
std::stringstream rowbatch_stringstream_;
-
- /// Compression codec.
- THdfsCompression::type codec_;
-
- /// Compressor if compression is enabled.
- boost::scoped_ptr<Codec> compressor_;
-
- /// Memory pool to use with compressor_.
- boost::scoped_ptr<MemPool> mem_pool_;
};
}
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/service/query-options-test.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-options-test.cc b/be/src/service/query-options-test.cc
index e5bc48d..b9bda60 100644
--- a/be/src/service/query-options-test.cc
+++ b/be/src/service/query-options-test.cc
@@ -208,8 +208,6 @@ TEST(QueryOptions, SetEnumOptions) {
TParquetFallbackSchemaResolution, (POSITION, NAME)), true);
TestEnumCase(options, CASE(parquet_array_resolution, TParquetArrayResolution,
(THREE_LEVEL, TWO_LEVEL, TWO_LEVEL_THEN_THREE_LEVEL)), true);
- TestEnumCase(options, CASE(seq_compression_mode, THdfsSeqCompressionMode,
- (BLOCK, RECORD)), false);
TestEnumCase(options, CASE(compression_codec, THdfsCompression,
(NONE, GZIP, BZIP2, DEFAULT, SNAPPY, SNAPPY_BLOCKED)), false);
#undef CASE
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/service/query-options.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-options.cc b/be/src/service/query-options.cc
index 2e3415f..1063fef 100644
--- a/be/src/service/query-options.cc
+++ b/be/src/service/query-options.cc
@@ -226,25 +226,9 @@ Status impala::SetQueryOption(const string& key, const string& value,
case TImpalaQueryOptions::NUM_SCANNER_THREADS:
query_options->__set_num_scanner_threads(atoi(value.c_str()));
break;
- case TImpalaQueryOptions::ALLOW_UNSUPPORTED_FORMATS:
- query_options->__set_allow_unsupported_formats(
- iequals(value, "true") || iequals(value, "1"));
- break;
case TImpalaQueryOptions::DEBUG_ACTION:
query_options->__set_debug_action(value.c_str());
break;
- case TImpalaQueryOptions::SEQ_COMPRESSION_MODE: {
- if (iequals(value, "block")) {
- query_options->__set_seq_compression_mode(THdfsSeqCompressionMode::BLOCK);
- } else if (iequals(value, "record")) {
- query_options->__set_seq_compression_mode(THdfsSeqCompressionMode::RECORD);
- } else {
- stringstream ss;
- ss << "Invalid sequence file compression mode: " << value;
- return Status(ss.str());
- }
- break;
- }
case TImpalaQueryOptions::COMPRESSION_CODEC: {
if (iequals(value, "none")) {
query_options->__set_compression_codec(THdfsCompression::NONE);
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/be/src/service/query-options.h
----------------------------------------------------------------------
diff --git a/be/src/service/query-options.h b/be/src/service/query-options.h
index fce042c..01f6e74 100644
--- a/be/src/service/query-options.h
+++ b/be/src/service/query-options.h
@@ -44,8 +44,7 @@ typedef std::unordered_map<string, beeswax::TQueryOptionLevel::type>
TImpalaQueryOptions::ALLOW_ERASURE_CODED_FILES + 1);\
REMOVED_QUERY_OPT_FN(abort_on_default_limit_exceeded, ABORT_ON_DEFAULT_LIMIT_EXCEEDED)\
QUERY_OPT_FN(abort_on_error, ABORT_ON_ERROR, TQueryOptionLevel::REGULAR)\
- QUERY_OPT_FN(allow_unsupported_formats, ALLOW_UNSUPPORTED_FORMATS,\
- TQueryOptionLevel::DEPRECATED)\
+ REMOVED_QUERY_OPT_FN(allow_unsupported_formats, ALLOW_UNSUPPORTED_FORMATS)\
QUERY_OPT_FN(batch_size, BATCH_SIZE, TQueryOptionLevel::DEVELOPMENT)\
QUERY_OPT_FN(debug_action, DEBUG_ACTION, TQueryOptionLevel::DEVELOPMENT)\
REMOVED_QUERY_OPT_FN(default_order_by_limit, DEFAULT_ORDER_BY_LIMIT)\
@@ -74,7 +73,7 @@ typedef std::unordered_map<string, beeswax::TQueryOptionLevel::type>
QUERY_OPT_FN(buffer_pool_limit, BUFFER_POOL_LIMIT, TQueryOptionLevel::ADVANCED)\
QUERY_OPT_FN(appx_count_distinct, APPX_COUNT_DISTINCT, TQueryOptionLevel::ADVANCED)\
QUERY_OPT_FN(disable_unsafe_spills, DISABLE_UNSAFE_SPILLS, TQueryOptionLevel::REGULAR)\
- QUERY_OPT_FN(seq_compression_mode, SEQ_COMPRESSION_MODE, TQueryOptionLevel::REGULAR)\
+ REMOVED_QUERY_OPT_FN(seq_compression_mode, SEQ_COMPRESSION_MODE)\
QUERY_OPT_FN(exec_single_node_rows_threshold, EXEC_SINGLE_NODE_ROWS_THRESHOLD,\
TQueryOptionLevel::ADVANCED)\
QUERY_OPT_FN(optimize_partition_key_scans, OPTIMIZE_PARTITION_KEY_SCANS,\
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/common/thrift/ImpalaInternalService.thrift
----------------------------------------------------------------------
diff --git a/common/thrift/ImpalaInternalService.thrift b/common/thrift/ImpalaInternalService.thrift
index 6780138..120aebc 100644
--- a/common/thrift/ImpalaInternalService.thrift
+++ b/common/thrift/ImpalaInternalService.thrift
@@ -101,7 +101,6 @@ struct TQueryOptions {
5: optional i32 num_nodes = NUM_NODES_ALL
6: optional i64 max_scan_range_length = 0
7: optional i32 num_scanner_threads = 0
- 9: optional bool allow_unsupported_formats = 0
11: optional string debug_action = ""
12: optional i64 mem_limit = 0
14: optional CatalogObjects.THdfsCompression compression_codec
@@ -133,11 +132,6 @@ struct TQueryOptions {
// has no plan hints, and at least one table is missing relevant stats.
29: optional bool disable_unsafe_spills = 0
- // Mode for compression; RECORD, or BLOCK
- // This field only applies for certain file types and is ignored
- // by all other file types.
- 30: optional CatalogObjects.THdfsSeqCompressionMode seq_compression_mode
-
// If the number of rows that are processed for a single query is below the
// threshold, it will be executed on the coordinator only with codegen disabled
31: optional i32 exec_single_node_rows_threshold = 100
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/common/thrift/ImpalaService.thrift
----------------------------------------------------------------------
diff --git a/common/thrift/ImpalaService.thrift b/common/thrift/ImpalaService.thrift
index 529af04..665144f 100644
--- a/common/thrift/ImpalaService.thrift
+++ b/common/thrift/ImpalaService.thrift
@@ -72,8 +72,7 @@ enum TImpalaQueryOptions {
// Number of scanner threads.
NUM_SCANNER_THREADS,
- // If true, Impala will try to execute on file formats that are not fully supported yet
- ALLOW_UNSUPPORTED_FORMATS,
+ ALLOW_UNSUPPORTED_FORMATS, // Removed
DEFAULT_ORDER_BY_LIMIT, // Removed
@@ -110,8 +109,7 @@ enum TImpalaQueryOptions {
// Leave blank to use default.
COMPRESSION_CODEC,
- // Mode for compressing sequence files; either BLOCK, RECORD, or DEFAULT
- SEQ_COMPRESSION_MODE,
+ SEQ_COMPRESSION_MODE, // Removed
// HBase scan query option. If set and > 0, HBASE_CACHING is the value for
// "hbase.client.Scan.setCaching()" when querying HBase table. Otherwise, use backend
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
----------------------------------------------------------------------
diff --git a/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java b/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
index 54ad57f..b671a1e 100644
--- a/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
+++ b/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
@@ -390,7 +390,6 @@ public class PlannerTestBase extends FrontendTestBase {
protected TQueryOptions defaultQueryOptions() {
TQueryOptions options = new TQueryOptions();
options.setExplain_level(TExplainLevel.STANDARD);
- options.setAllow_unsupported_formats(true);
options.setExec_single_node_rows_threshold(0);
return options;
}
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/testdata/bad_avro_snap/README
----------------------------------------------------------------------
diff --git a/testdata/bad_avro_snap/README b/testdata/bad_avro_snap/README
index 6271967..71eb398 100644
--- a/testdata/bad_avro_snap/README
+++ b/testdata/bad_avro_snap/README
@@ -1,6 +1,6 @@
String Data
-----------
-Created by modifying Impala's HdfsAvroTableWriter.
+Created by modifying Impala's HdfsAvroTableWriter (now removed).
These files' schemas have a single nullable string column 's'.
@@ -14,7 +14,7 @@ truncated_string.avro: contains one value, which is missing the last byte.
Float Data
----------
-Created by modifying Impala's HdfsAvroTableWriter.
+Created by modifying Impala's HdfsAvroTableWriter (now removed).
These files' schemas have a single nullable float column 'c1'.
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/testdata/workloads/functional-query/queries/QueryTest/avro-writer.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/avro-writer.test b/testdata/workloads/functional-query/queries/QueryTest/avro-writer.test
deleted file mode 100644
index 6dc0899..0000000
--- a/testdata/workloads/functional-query/queries/QueryTest/avro-writer.test
+++ /dev/null
@@ -1,43 +0,0 @@
-====
----- QUERY
-drop table if exists __avro_write;
-====
----- QUERY
-SET COMPRESSION_CODEC=NONE;
-create table __avro_write (i int, s string, d double)
-stored as AVRO
-TBLPROPERTIES ('avro.schema.literal'='{
- "name": "my_record",
- "type": "record",
- "fields": [
- {"name":"i", "type":["int", "null"]},
- {"name":"s", "type":["string", "null"]},
- {"name":"d", "type":["double", "null"]}]}');
-====
----- QUERY
-SET COMPRESSION_CODEC=NONE;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __avro_write select 0, "a", 1.1;
-====
----- QUERY
-SET COMPRESSION_CODEC=SNAPPY;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __avro_write select 1, "b", 2.2;
-====
----- QUERY
-select * from __avro_write;
----- RESULTS
-0,'a',1.1
-1,'b',2.2
----- TYPES
-INT,STRING,DOUBLE
-====
----- QUERY
-SET ALLOW_UNSUPPORTED_FORMATS=0;
-insert into __avro_write select 1, "b", 2.2;
----- CATCH
-Writing to table format AVRO is not supported. Use query option ALLOW_UNSUPPORTED_FORMATS
-====
----- QUERY
-drop table __avro_write;
-====
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test b/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
deleted file mode 100644
index 7e2363f..0000000
--- a/testdata/workloads/functional-query/queries/QueryTest/seq-writer.test
+++ /dev/null
@@ -1,308 +0,0 @@
-====
----- QUERY
-SET COMPRESSION_CODEC=NONE;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-create table __seq_write (i int, s string, d double)
-stored as SEQUENCEFILE;
-====
----- QUERY
-SET COMPRESSION_CODEC=NONE;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write select 0, "a", 1.1;
-====
----- QUERY
-SET COMPRESSION_CODEC=DEFAULT;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write values (1, "b", 2.2);
-====
----- QUERY
-SET COMPRESSION_CODEC=SNAPPY;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write values (2, "c", 3.3);
-====
----- QUERY
-SET COMPRESSION_CODEC=SNAPPY_BLOCKED;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write values (3, "d", 4.4);
-====
----- QUERY
-SET COMPRESSION_CODEC=GZIP;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write values (4, "e", 5.5);
-====
----- QUERY
-SET COMPRESSION_CODEC=NONE;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write select 5, "a", 1.1;
-====
----- QUERY
-SET COMPRESSION_CODEC=DEFAULT;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write values (6, "b", 2.2);
-====
----- QUERY
-SET COMPRESSION_CODEC=SNAPPY;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write values (7, "c", 3.3);
-====
----- QUERY
-SET COMPRESSION_CODEC=SNAPPY_BLOCKED;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write values (8, "d", 4.4);
-====
----- QUERY
-SET COMPRESSION_CODEC=GZIP;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __seq_write values (9, "e", 5.5);
-====
----- QUERY
-SET ALLOW_UNSUPPORTED_FORMATS=0;
-insert into __seq_write values (4, "e", 5.5);
----- CATCH
-Writing to table format SEQUENCE_FILE is not supported. Use query option
-====
----- QUERY
-select * from __seq_write;
----- RESULTS
-0,'a',1.1
-1,'b',2.2
-2,'c',3.3
-3,'d',4.4
-4,'e',5.5
-5,'a',1.1
-6,'b',2.2
-7,'c',3.3
-8,'d',4.4
-9,'e',5.5
----- TYPES
-INT,STRING,DOUBLE
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with NONE+RECORD and then read
-# it back
-SET COMPRESSION_CODEC=NONE;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_none_rec like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_none_rec partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_none_rec;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with DEFAULT+RECORD and then
-# read it back
-SET COMPRESSION_CODEC=DEFAULT;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_def_rec like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_def_rec partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_def_rec;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with SNAPPY_BLOCKED+RECORD and
-# then read it back
-SET COMPRESSION_CODEC=SNAPPY_BLOCKED;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_snapb_rec like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_snapb_rec partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_snapb_rec;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with SNAPPY+RECORD and then read
-# it back
-SET COMPRESSION_CODEC=SNAPPY;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_snap_rec like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_snap_rec partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_snap_rec;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with GZIP+RECORD and then read
-# it back
-SET COMPRESSION_CODEC=GZIP;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_gzip_rec like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_gzip_rec partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_gzip_rec;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with NONE+BLOCK and then read it
-# back
-SET COMPRESSION_CODEC=NONE;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_none_block like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_none_block partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_none_block;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with DEFAULT+BLOCK and then read
-# it back
-SET COMPRESSION_CODEC=DEFAULT;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_def_block like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_def_block partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_def_block;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with SNAPPY_BLOCKED+BLOCK and
-# then read it back
-SET COMPRESSION_CODEC=SNAPPY_BLOCKED;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_snapb_block like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_snapb_block partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_snapb_block;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with SNAPPY+BLOCK and then read
-# it back
-SET COMPRESSION_CODEC=SNAPPY;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_snap_block like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_snap_block partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_snap_block;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-3079: Create a table containing larger seq files with GZIP+BLOCK and then read it
-# back
-SET COMPRESSION_CODEC=GZIP;
-SET SEQ_COMPRESSION_MODE=BLOCK;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table store_sales_seq_gzip_block like tpcds_parquet.store_sales
-stored as SEQUENCEFILE;
-insert into store_sales_seq_gzip_block partition(ss_sold_date_sk)
-select * from tpcds_parquet.store_sales
-where (ss_sold_date_sk between 2451175 and 2451200) or
- (ss_sold_date_sk is null and ss_sold_time_sk > 60000);
-====
----- QUERY
-select count(*) from store_sales_seq_gzip_block;
----- RESULTS
-60091
----- TYPES
-BIGINT
-====
----- QUERY
-# IMPALA-5407: Create a table containing seq files with GZIP+RECORD. If the number of
-# impalad workers is three, three files will be created, two of which are large enough
-# (> 64MB) to force multiple flushes. Make sure that the files have been created
-# successfully.
-SET COMPRESSION_CODEC=GZIP;
-SET SEQ_COMPRESSION_MODE=RECORD;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-create table catalog_sales_seq_gzip_rec like tpcds.catalog_sales stored as SEQUENCEFILE;
-insert into catalog_sales_seq_gzip_rec select * from tpcds.catalog_sales;
-====
----- QUERY
-select count(*) from catalog_sales_seq_gzip_rec;
----- RESULTS
-1441548
----- TYPES
-BIGINT
-====
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/testdata/workloads/functional-query/queries/QueryTest/set.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/set.test b/testdata/workloads/functional-query/queries/QueryTest/set.test
index 5a2c56a..ffb53a1 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/set.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/set.test
@@ -8,7 +8,6 @@ set buffer_pool_limit=7;
set all;
---- RESULTS: VERIFY_IS_SUBSET
'ABORT_ON_ERROR','0','REGULAR'
-'ALLOW_UNSUPPORTED_FORMATS','0','DEPRECATED'
'BATCH_SIZE','0','DEVELOPMENT'
'BUFFER_POOL_LIMIT','','ADVANCED'
'DEBUG_ACTION','','DEVELOPMENT'
@@ -34,7 +33,6 @@ set explain_level=3;
set all;
---- RESULTS: VERIFY_IS_SUBSET
'ABORT_ON_ERROR','0','REGULAR'
-'ALLOW_UNSUPPORTED_FORMATS','0','DEPRECATED'
'BATCH_SIZE','0','DEVELOPMENT'
'BUFFER_POOL_LIMIT','','ADVANCED'
'DEBUG_ACTION','','DEVELOPMENT'
@@ -60,7 +58,6 @@ set explain_level='0';
set all;
---- RESULTS: VERIFY_IS_SUBSET
'ABORT_ON_ERROR','0','REGULAR'
-'ALLOW_UNSUPPORTED_FORMATS','0','DEPRECATED'
'BATCH_SIZE','0','DEVELOPMENT'
'BUFFER_POOL_LIMIT','','ADVANCED'
'DEBUG_ACTION','','DEVELOPMENT'
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/testdata/workloads/functional-query/queries/QueryTest/text-writer.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/text-writer.test b/testdata/workloads/functional-query/queries/QueryTest/text-writer.test
deleted file mode 100644
index 89cd730..0000000
--- a/testdata/workloads/functional-query/queries/QueryTest/text-writer.test
+++ /dev/null
@@ -1,47 +0,0 @@
-====
----- QUERY
-drop table if exists __text_write;
-====
----- QUERY
-create table __text_write (i int, s string, d double);
-====
----- QUERY
-SET COMPRESSION_CODEC=NONE;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __text_write select 0, "a", 1.1;
-====
----- QUERY
-SET COMPRESSION_CODEC=DEFAULT;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __text_write values (1, "b", 2.2);
-====
----- QUERY
-SET COMPRESSION_CODEC=SNAPPY;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __text_write values (2, "c", 3.3);
-====
----- QUERY
-SET COMPRESSION_CODEC=GZIP;
-SET ALLOW_UNSUPPORTED_FORMATS=1;
-insert into __text_write values (3, "d", 4.4);
-====
----- QUERY
-SET COMPRESSION_CODEC=GZIP;
-SET ALLOW_UNSUPPORTED_FORMATS=0;
-insert into __text_write values (3, "d", 4.4);
----- CATCH
-Writing to compressed text table is not supported.
-====
----- QUERY
-select * from __text_write;
----- RESULTS
-0,'a',1.1
-1,'b',2.2
-2,'c',3.3
-3,'d',4.4
----- TYPES
-INT,STRING,DOUBLE
-====
----- QUERY
-drop table __text_write;
-====
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/testdata/workloads/functional-query/queries/QueryTest/unsupported-writers.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/unsupported-writers.test b/testdata/workloads/functional-query/queries/QueryTest/unsupported-writers.test
new file mode 100644
index 0000000..68f355f
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/unsupported-writers.test
@@ -0,0 +1,77 @@
+====
+---- QUERY
+create table __text_write (i int, s string, d double);
+====
+---- QUERY
+SET COMPRESSION_CODEC=NONE;
+insert into __text_write select 0, "a", 1.1;
+====
+---- QUERY
+SET COMPRESSION_CODEC=GZIP;
+insert into __text_write values (3, "d", 4.4);
+---- CATCH
+Writing to compressed text table is not supported.
+====
+---- QUERY
+select * from __text_write;
+---- RESULTS
+0,'a',1.1
+---- TYPES
+INT,STRING,DOUBLE
+====
+---- QUERY
+create table __avro_write (i int, s string, d double)
+stored as AVRO
+TBLPROPERTIES ('avro.schema.literal'='{
+ "name": "my_record",
+ "type": "record",
+ "fields": [
+ {"name":"i", "type":["int", "null"]},
+ {"name":"s", "type":["string", "null"]},
+ {"name":"d", "type":["double", "null"]}]}');
+====
+---- QUERY
+insert into __avro_write select 1, "b", 2.2;
+---- CATCH
+Writing to table format AVRO is not supported.
+====
+---- QUERY
+create table __seq_write (i int, s string, d double)
+stored as SEQUENCEFILE;
+====
+---- QUERY
+insert into __seq_write values (4, "e", 5.5);
+---- CATCH
+Writing to table format SEQUENCE_FILE is not supported.
+====
+---- QUERY
+# Test writing to mixed format table containing partitions in both supported and
+# unsupported formats where writing to the partition with supported format should succeed.
+# Create a table containing both text(supported) and avro(unsupported) partitions.
+create table __mixed_format_write (id int) partitioned by (part int);
+====
+---- QUERY
+insert into __mixed_format_write partition(part=2000) values(1);
+====
+---- QUERY
+insert into __mixed_format_write partition(part=2001) values(2);
+====
+---- QUERY
+alter table __mixed_format_write partition (part=2001) set fileformat AVRO;
+====
+---- QUERY
+insert into __mixed_format_write partition(part=2000) values(3);
+====
+---- QUERY
+insert into __mixed_format_write partition(part=2001) values(4);
+---- CATCH
+Writing to table format AVRO is not supported.
+====
+---- QUERY
+select id, part from __mixed_format_write where part = 2000;
+---- RESULTS
+1,2000
+3,2000
+---- TYPES
+INT,INT
+====
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/tests/common/test_dimensions.py
----------------------------------------------------------------------
diff --git a/tests/common/test_dimensions.py b/tests/common/test_dimensions.py
index 785cfa9..0460ea7 100644
--- a/tests/common/test_dimensions.py
+++ b/tests/common/test_dimensions.py
@@ -108,19 +108,6 @@ def create_parquet_dimension(workload):
return ImpalaTestDimension('table_format',
TableFormatInfo.create_from_string(dataset, 'parquet/none'))
-# Available Exec Options:
-#01: abort_on_error (bool)
-#02 max_errors (i32)
-#03: disable_codegen (bool)
-#04: batch_size (i32)
-#05: return_as_ascii (bool)
-#06: num_nodes (i32)
-#07: max_scan_range_length (i64)
-#08: num_scanner_threads (i32)
-#09: max_io_buffers (i32)
-#10: allow_unsupported_formats (bool)
-#11: partition_agg (bool)
-
# Common sets of values for the exec option vectors
ALL_BATCH_SIZES = [0]
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/tests/hs2/test_hs2.py
----------------------------------------------------------------------
diff --git a/tests/hs2/test_hs2.py b/tests/hs2/test_hs2.py
index cd861e9..795f45c 100644
--- a/tests/hs2/test_hs2.py
+++ b/tests/hs2/test_hs2.py
@@ -89,11 +89,10 @@ class TestHS2(HS2TestSuite):
# Should be unchanged
assert vals2["SYNC_DDL"] == "0"
- # Verify that 'DEVELOPMENT' and 'DEPRECATED' options are not returned.
assert "MAX_ERRORS" in vals2
assert levels["MAX_ERRORS"] == "ADVANCED"
+ # Verify that 'DEVELOPMENT' options are not returned.
assert "DEBUG_ACTION" not in vals2
- assert "ALLOW_UNSUPPORTED_FORMATS" not in vals2
# Removed options should not be returned.
assert "MAX_IO_BUFFERS" not in vals2
@@ -101,7 +100,8 @@ class TestHS2(HS2TestSuite):
@needs_session()
def test_session_option_levels_via_set_all(self):
"""
- Tests the level of session options returned by a SET ALL query.
+ Tests the level of session options returned by a SET ALL query except DEPRECATED as we
+ currently do not have any of those left.
"""
vals, levels = self.get_session_options("SET ALL")
@@ -109,12 +109,10 @@ class TestHS2(HS2TestSuite):
assert "SYNC_DDL" in vals
assert "MAX_ERRORS" in vals
assert "DEBUG_ACTION" in vals
- assert "ALLOW_UNSUPPORTED_FORMATS" in vals
assert levels["COMPRESSION_CODEC"] == "REGULAR"
assert levels["SYNC_DDL"] == "REGULAR"
assert levels["MAX_ERRORS"] == "ADVANCED"
assert levels["DEBUG_ACTION"] == "DEVELOPMENT"
- assert levels["ALLOW_UNSUPPORTED_FORMATS"] == "DEPRECATED"
# Removed options should not be returned.
assert "MAX_IO_BUFFERS" not in vals
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/tests/metadata/test_partition_metadata.py
----------------------------------------------------------------------
diff --git a/tests/metadata/test_partition_metadata.py b/tests/metadata/test_partition_metadata.py
index 1d77aa5..d23e3f0 100644
--- a/tests/metadata/test_partition_metadata.py
+++ b/tests/metadata/test_partition_metadata.py
@@ -45,10 +45,7 @@ class TestPartitionMetadata(ImpalaTestSuite):
# compression codecs.
cls.ImpalaTestMatrix.add_constraint(lambda v:
(v.get_value('table_format').file_format in ('text', 'parquet') and
- v.get_value('table_format').compression_codec == 'none') or
- (v.get_value('table_format').file_format in ('seq', 'avro') and
- v.get_value('table_format').compression_codec == 'snap' and
- v.get_value('table_format').compression_type == 'block'))
+ v.get_value('table_format').compression_codec == 'none'))
@SkipIfLocal.hdfs_client # TODO: this dependency might not exist anymore
def test_multiple_partitions_same_location(self, vector, unique_database):
@@ -70,9 +67,6 @@ class TestPartitionMetadata(ImpalaTestSuite):
self.client.execute("alter table %s add partition (j=2) location '%s/p'"
% (FQ_TBL_NAME, TBL_LOCATION))
- # Allow unsupported avro and sequence file writer.
- self.client.execute("set allow_unsupported_formats=true")
-
# Insert some data. This will only update partition j=1 (IMPALA-1480).
self.client.execute("insert into table %s partition(j=1) select 1" % FQ_TBL_NAME)
# Refresh to update file metadata of both partitions
@@ -80,31 +74,19 @@ class TestPartitionMetadata(ImpalaTestSuite):
# The data will be read twice because each partition points to the same location.
data = self.execute_scalar("select sum(i), sum(j) from %s" % FQ_TBL_NAME)
- if file_format == 'avro':
- # Avro writer is broken and produces nulls. Only check partition column.
- assert data.split('\t')[1] == '3'
- else:
- assert data.split('\t') == ['2', '3']
+ assert data.split('\t') == ['2', '3']
self.client.execute("insert into %s partition(j) select 1, 1" % FQ_TBL_NAME)
self.client.execute("insert into %s partition(j) select 1, 2" % FQ_TBL_NAME)
self.client.execute("refresh %s" % FQ_TBL_NAME)
data = self.execute_scalar("select sum(i), sum(j) from %s" % FQ_TBL_NAME)
- if file_format == 'avro':
- # Avro writer is broken and produces nulls. Only check partition column.
- assert data.split('\t')[1] == '9'
- else:
- assert data.split('\t') == ['6', '9']
+ assert data.split('\t') == ['6', '9']
# Force all scan ranges to be on the same node. It should produce the same
# result as above. See IMPALA-5412.
self.client.execute("set num_nodes=1")
data = self.execute_scalar("select sum(i), sum(j) from %s" % FQ_TBL_NAME)
- if file_format == 'avro':
- # Avro writer is broken and produces nulls. Only check partition column.
- assert data.split('\t')[1] == '9'
- else:
- assert data.split('\t') == ['6', '9']
+ assert data.split('\t') == ['6', '9']
@SkipIfS3.hive
@SkipIfADLS.hive
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/tests/query_test/test_compressed_formats.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_compressed_formats.py b/tests/query_test/test_compressed_formats.py
index 694cfe9..2896632 100644
--- a/tests/query_test/test_compressed_formats.py
+++ b/tests/query_test/test_compressed_formats.py
@@ -129,71 +129,25 @@ class TestCompressedFormats(ImpalaTestSuite):
finally:
call(["hive", "-e", drop_cmd]);
-class TestTableWriters(ImpalaTestSuite):
+class TestUnsupportedTableWriters(ImpalaTestSuite):
@classmethod
def get_workload(cls):
return 'functional-query'
@classmethod
def add_test_dimensions(cls):
- super(TestTableWriters, cls).add_test_dimensions()
+ super(TestUnsupportedTableWriters, cls).add_test_dimensions()
cls.ImpalaTestMatrix.add_dimension(create_single_exec_option_dimension())
- # This class tests many formats, but doesn't use the contraints
- # Each format is tested within one test file, we constrain to text/none
- # as each test file only needs to be run once.
+ # This class tests different formats, but doesn't use constraints.
+ # The constraint added below is only to make sure that the test file runs once.
cls.ImpalaTestMatrix.add_constraint(lambda v:
(v.get_value('table_format').file_format =='text' and
v.get_value('table_format').compression_codec == 'none'))
- def test_seq_writer(self, vector, unique_database):
- self.run_test_case('QueryTest/seq-writer', vector, unique_database)
-
- @SkipIfS3.hive
- @SkipIfADLS.hive
- @SkipIfIsilon.hive
- @SkipIfLocal.hive
- def test_seq_writer_hive_compatibility(self, vector, unique_database):
- self.client.execute('set ALLOW_UNSUPPORTED_FORMATS=1')
- # Write sequence files with different compression codec/compression mode and then read
- # it back in Impala and Hive.
- # Note that we don't test snappy here as the snappy codec used by Impala does not seem
- # to be fully compatible with the snappy codec used by Hive.
- for comp_codec, comp_mode in [('NONE', 'RECORD'), ('NONE', 'BLOCK'),
- ('DEFAULT', 'RECORD'), ('DEFAULT', 'BLOCK'),
- ('GZIP', 'RECORD'), ('GZIP', 'BLOCK')]:
- table_name = '%s.seq_tbl_%s_%s' % (unique_database, comp_codec, comp_mode)
- self.client.execute('set COMPRESSION_CODEC=%s' % comp_codec)
- self.client.execute('set SEQ_COMPRESSION_MODE=%s' % comp_mode)
- self.client.execute('create table %s like functional.zipcode_incomes stored as '
- 'sequencefile' % table_name)
- # Write sequence file of size greater than 4K
- self.client.execute('insert into %s select * from functional.zipcode_incomes where '
- 'zip >= "5"' % table_name)
- # Write sequence file of size less than 4K
- self.client.execute('insert into %s select * from functional.zipcode_incomes where '
- 'zip="00601"' % table_name)
-
- count_query = 'select count(*) from %s' % table_name
-
- # Read it back in Impala
- output = self.client.execute(count_query)
- assert '16541' == output.get_data()
- # Read it back in Hive
- # Note that username is passed in for the sake of remote cluster tests. The default
- # HDFS user is typically 'hdfs', and this is needed to run a count() operation using
- # hive. For local mini clusters, the usename can be anything. See IMPALA-5413.
- output = self.run_stmt_in_hive(count_query, username='hdfs')
- assert '16541' == output.split('\n')[1]
-
- def test_avro_writer(self, vector):
- self.run_test_case('QueryTest/avro-writer', vector)
-
- def test_text_writer(self, vector):
- # TODO debug this test.
- # This caused by a zlib failure. Suspected cause is too small a buffer
- # passed to zlib for compression; similar to IMPALA-424
- pytest.skip()
- self.run_test_case('QueryTest/text-writer', vector)
+ def test_error_message(self, vector, unique_database):
+ # Tests that an appropriate error message is displayed for unsupported writers like
+ # compressed text, avro and sequence.
+ self.run_test_case('QueryTest/unsupported-writers', vector, unique_database)
@pytest.mark.execute_serially
class TestLargeCompressedFile(ImpalaTestSuite):
http://git-wip-us.apache.org/repos/asf/impala/blob/30e82c63/tests/shell/test_shell_interactive.py
----------------------------------------------------------------------
diff --git a/tests/shell/test_shell_interactive.py b/tests/shell/test_shell_interactive.py
index fe631cf..eac9d27 100755
--- a/tests/shell/test_shell_interactive.py
+++ b/tests/shell/test_shell_interactive.py
@@ -389,11 +389,10 @@ class TestImpalaShellInteractive(object):
assert "APPX_COUNT_DISTINCT" in result.stdout
assert "SUPPORT_START_OVER" in result.stdout
# Development, deprecated and removed options should not be shown.
+ # Note: there are currently no deprecated options
assert "Development Query Options:" not in result.stdout
- assert "DEBUG_ACTION" not in result.stdout
- assert "Deprecated Query Options:" not in result.stdout
- assert "ALLOW_UNSUPPORTED_FORMATS" not in result.stdout
- assert "MAX_IO_BUFFERS" not in result.stdout
+ assert "DEBUG_ACTION" not in result.stdout # Development option.
+ assert "MAX_IO_BUFFERS" not in result.stdout # Removed option.
shell2 = ImpalaShell()
shell2.send_cmd("set all")
@@ -401,7 +400,7 @@ class TestImpalaShellInteractive(object):
assert "Query options (defaults shown in []):" in result.stdout
assert "Advanced Query Options:" in result.stdout
assert "Development Query Options:" in result.stdout
- assert "Deprecated Query Options:" in result.stdout
+ assert "Deprecated Query Options:" not in result.stdout
advanced_part_start_idx = result.stdout.find("Advanced Query Options")
development_part_start_idx = result.stdout.find("Development Query Options")
deprecated_part_start_idx = result.stdout.find("Deprecated Query Options")
@@ -411,7 +410,6 @@ class TestImpalaShellInteractive(object):
assert "APPX_COUNT_DISTINCT" in advanced_part
assert "SUPPORT_START_OVER" in advanced_part
assert "DEBUG_ACTION" in development_part
- assert "ALLOW_UNSUPPORTED_FORMATS" in result.stdout[deprecated_part_start_idx:]
# Removed options should not be shown.
assert "MAX_IO_BUFFERS" not in result.stdout
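The shell and HS2 tests above all assert the same visibility rule for query options: a plain `set` hides DEVELOPMENT and (now empty) DEPRECATED options, `set all` shows DEVELOPMENT ones, and REMOVED options such as `MAX_IO_BUFFERS` never appear. A minimal Python sketch of that rule (names and structure are illustrative, not Impala's actual implementation):

```python
# Hypothetical option table mapping each query option to its level,
# mirroring the (name, level) pairs asserted in the tests above.
OPTIONS = {
    "ABORT_ON_ERROR": "REGULAR",
    "APPX_COUNT_DISTINCT": "ADVANCED",
    "DEBUG_ACTION": "DEVELOPMENT",
    "MAX_IO_BUFFERS": "REMOVED",
}

def visible_options(show_all=False):
    """Options shown by 'set' (show_all=False) or 'set all' (show_all=True).

    REMOVED options are always hidden; plain 'set' additionally hides
    DEVELOPMENT and DEPRECATED options.
    """
    hidden = {"REMOVED"} if show_all else {"REMOVED", "DEVELOPMENT", "DEPRECATED"}
    return {name: lvl for name, lvl in OPTIONS.items() if lvl not in hidden}
```

With `ALLOW_UNSUPPORTED_FORMATS` deleted from the table entirely (rather than marked DEPRECATED), the "Deprecated Query Options:" section in `set all` output disappears, which is exactly what the updated `test_shell_interactive.py` asserts.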
[3/6] impala git commit: [DOCS] Clarification on admission control and DDL statements
Posted by ta...@apache.org.
[DOCS] Clarification on admission control and DDL statements
Removed the confusing example and paragraphs.
Change-Id: I2e3e82bd34e88e7a13de1864aeb97f01023bc715
Reviewed-on: http://gerrit.cloudera.org:8080/10829
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/6f52ce10
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/6f52ce10
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/6f52ce10
Branch: refs/heads/master
Commit: 6f52ce10e302ed9d168731dc11db07aabbfa2e53
Parents: 83448f1
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Tue Jun 26 14:30:38 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue Jul 3 18:41:47 2018 +0000
----------------------------------------------------------------------
docs/topics/impala_admission.xml | 146 ++++++++++++++--------------------
1 file changed, 61 insertions(+), 85 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/6f52ce10/docs/topics/impala_admission.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_admission.xml b/docs/topics/impala_admission.xml
index 5de246b..317fa80 100644
--- a/docs/topics/impala_admission.xml
+++ b/docs/topics/impala_admission.xml
@@ -51,6 +51,11 @@ under the License.
not wait indefinitely, so that you can detect and correct <q>starvation</q> scenarios.
</p>
<p>
+ Queries, DML statements, and some DDL statements, including
+ <codeph>CREATE TABLE AS SELECT</codeph> and <codeph>COMPUTE
+ STATS</codeph> are affected by admission control.
+ </p>
+ <p>
Enable this feature if your cluster is
underutilized at some times and overutilized at others. Overutilization is indicated by performance
bottlenecks and queries being cancelled due to out-of-memory conditions, when those same queries are
@@ -765,38 +770,42 @@ impala.admission-control.pool-queue-timeout-ms.<varname>queue_name</varname></ph
<!-- End Config -->
<concept id="admission_guidelines">
-
- <title>Guidelines for Using Admission Control</title>
- <prolog>
- <metadata>
- <data name="Category" value="Planning"/>
- <data name="Category" value="Guidelines"/>
- <data name="Category" value="Best Practices"/>
- </metadata>
- </prolog>
-
- <conbody>
-
- <p>
- To see how admission control works for particular queries, examine the profile output for the query. This
- information is available through the <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname>
- immediately after running a query in the shell, on the <uicontrol>queries</uicontrol> page of the Impala
- debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log
- level 2). The profile output contains details about the admission decision, such as whether the query was
- queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory
- usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools.
- </p>
-
- <p>
- Remember that the limits imposed by admission control are <q>soft</q> limits.
- The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether
- to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth
- between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run
- concurrently, then throughput could decrease due to queries spilling to disk or contending for resources;
- or queries could be cancelled if they exceed the <codeph>MEM_LIMIT</codeph> setting while running.
- </p>
-
-<!--
+ <title>Guidelines for Using Admission Control</title>
+ <prolog>
+ <metadata>
+ <data name="Category" value="Planning"/>
+ <data name="Category" value="Guidelines"/>
+ <data name="Category" value="Best Practices"/>
+ </metadata>
+ </prolog>
+ <conbody>
+ <p>
+ To see how admission control works for particular queries, examine
+ the profile output for the query. This information is available
+ through the <codeph>PROFILE</codeph> statement in
+ <cmdname>impala-shell</cmdname> immediately after running a query in
+ the shell, on the <uicontrol>queries</uicontrol> page of the Impala
+ debug web UI, or in the Impala log file (basic information at log
+ level 1, more detailed information at log level 2). The profile output
+ contains details about the admission decision, such as whether the
+ query was queued or not and which resource pool it was assigned to. It
+ also includes the estimated and actual memory usage for the query, so
+ you can fine-tune the configuration for the memory limits of the
+ resource pools.
+ </p>
+ <p>
+ Remember that the limits imposed by admission control are
+ <q>soft</q> limits. The decentralized nature of this mechanism means
+ that each Impala node makes its own decisions about whether to allow
+ queries to run immediately or to queue them. These decisions rely on
+ information passed back and forth between nodes by the statestore
+ service. If a sudden surge in requests causes more queries than
+ anticipated to run concurrently, then throughput could decrease due to
+ queries spilling to disk or contending for resources; or queries could
+ be cancelled if they exceed the <codeph>MEM_LIMIT</codeph> setting
+ while running.
+ </p>
+ <!--
<p>
If you have trouble getting a query to run because its estimated memory usage is too high, you can override
the estimate by setting the <codeph>MEM_LIMIT</codeph> query option in <cmdname>impala-shell</cmdname>,
@@ -806,58 +815,25 @@ impala.admission-control.pool-queue-timeout-ms.<varname>queue_name</varname></ph
pre-allocated by the query.
</p>
-->
-
- <p>
- In <cmdname>impala-shell</cmdname>, you can also specify which resource pool to direct queries to by
- setting the <codeph>REQUEST_POOL</codeph> query option.
- </p>
-
- <p>
- The statements affected by the admission control feature are primarily queries, but also include statements
- that write data such as <codeph>INSERT</codeph> and <codeph>CREATE TABLE AS SELECT</codeph>. Most write
- operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial
- memory due to buffering intermediate data before writing out each Parquet data block. See
- <xref href="impala_parquet.xml#parquet_etl"/> for instructions about inserting data efficiently into
- Parquet tables.
- </p>
-
- <p>
- Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query
- is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session
- are also queued so that they are processed in the correct order:
- </p>
-
-<codeblock>-- This query could be queued to avoid out-of-memory at times of heavy load.
-select * from huge_table join enormous_table using (id);
--- If so, this subsequent statement in the same session is also queued
--- until the previous statement completes.
-drop table huge_table;
-</codeblock>
-
- <p>
- If you set up different resource pools for different users and groups, consider reusing any classifications
- you developed for use with Sentry security. See <xref href="impala_authorization.xml#authorization"/> for details.
- </p>
-
- <p>
- For details about all the Fair Scheduler configuration settings, see
- <xref keyref="FairScheduler">Fair Scheduler Configuration</xref>, in particular the tags such as <codeph><queue></codeph> and
- <codeph><aclSubmitApps></codeph> to map users and groups to particular resource pools (queues).
- </p>
-
-<!-- Wait a sec. We say admission control doesn't use RESERVATION_REQUEST_TIMEOUT at all.
- What's the real story here? Matt did refer to some timeout option that was
- available through the shell but not the DB-centric APIs.
-<p>
- Because you cannot override query options such as
- <codeph>RESERVATION_REQUEST_TIMEOUT</codeph>
- in a JDBC or ODBC application, consider configuring timeout periods
- on the application side to cancel queries that take
- too long due to being queued during times of high load.
-</p>
--->
- </conbody>
- </concept>
+ <p>
+ In <cmdname>impala-shell</cmdname>, you can also specify which
+ resource pool to direct queries to by setting the
+ <codeph>REQUEST_POOL</codeph> query option.
+ </p>
+ <p>
+ If you set up different resource pools for different users and
+ groups, consider reusing any classifications you developed for use
+ with Sentry security. See <xref
+ href="impala_authorization.xml#authorization"/> for details.
+ </p>
+ <p>
+ For details about all the Fair Scheduler configuration settings, see
+ <xref keyref="FairScheduler">Fair Scheduler Configuration</xref>, in
+ particular the tags such as <codeph><queue></codeph> and
+ <codeph><aclSubmitApps></codeph> to map users and groups to
+ particular resource pools (queues).
+ </p>
+ </conbody>
+ </concept>
</concept>
</concept>
-
[2/6] impala git commit: IMPALA-7237: handle hex digits in ParseSmaps()
Posted by ta...@apache.org.
IMPALA-7237: handle hex digits in ParseSmaps()
Testing:
Manual. Added some temporary logging to print out which branch it took
with each line and confirmed it took the right branch for a line
starting with 'f'.
Change-Id: I3dad846dafb25b414bee1858eb63f3eda31d59ac
Reviewed-on: http://gerrit.cloudera.org:8080/10853
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/83448f1c
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/83448f1c
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/83448f1c
Branch: refs/heads/master
Commit: 83448f1c41d3f73ce0c0174c1725c78af3afd0d0
Parents: d03a2d6
Author: Tim Armstrong <ta...@cloudera.com>
Authored: Mon Jul 2 17:08:46 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Tue Jul 3 18:29:31 2018 +0000
----------------------------------------------------------------------
be/src/util/mem-info.cc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/83448f1c/be/src/util/mem-info.cc
----------------------------------------------------------------------
diff --git a/be/src/util/mem-info.cc b/be/src/util/mem-info.cc
index 9ffc98d..10ce09f 100644
--- a/be/src/util/mem-info.cc
+++ b/be/src/util/mem-info.cc
@@ -115,9 +115,10 @@ MappedMemInfo MemInfo::ParseSmaps() {
string line;
getline(smaps, line);
if (line.empty()) continue;
- if (isdigit(line[0])) {
+ if (isdigit(line[0]) || (line[0] >= 'a' && line[0] <= 'f')) {
// Line is the start of a new mapping, of form:
// 561ceff9c000-561ceffa1000 rw-p 00000000 00:00 0
+ // We distinguish this case by checking for lower-case hex digits.
++result.num_maps;
continue;
}
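The one-line fix above matters because `/proc/<pid>/smaps` interleaves mapping-start lines, which begin with a lowercase hexadecimal address, with attribute lines such as `Rss: 4 kB`. Checking `isdigit()` alone misclassifies mappings whose address starts with `a`–`f`. A small Python sketch of the corrected classification (illustrative only; the real code is the C++ in `mem-info.cc`):

```python
def is_mapping_start(line):
    """True if an smaps line starts a new mapping, e.g.
    '561ceff9c000-561ceffa1000 rw-p 00000000 00:00 0'.

    Mapping-start lines begin with a lowercase hex digit (0-9 or a-f);
    attribute lines like 'Rss:' begin with an uppercase letter.
    """
    if not line:
        return False
    c = line[0]
    return c.isdigit() or "a" <= c <= "f"
```

Before the fix, a line such as `f7a12000-f7a13000 r-xp ...` took the attribute-line branch, so `num_maps` undercounted mappings in the upper half of the address space.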