Posted to commits@impala.apache.org by jo...@apache.org on 2019/12/10 23:45:46 UTC

[impala] branch master updated (d72fd9a -> c5104d3)

This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from d72fd9a  IMPALA-9209: Fix flakiness in test_end_data_stream_error
     new 4687885  [DOCS] Update impala_proxy.xml with the latest info
     new c5104d3  [DOCS] Copy edits in impala_conversion_functions.xml

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/topics/impala_conversion_functions.xml |  26 +-
 docs/topics/impala_jdbc.xml                 | 424 ++++++++++------------------
 docs/topics/impala_proxy.xml                | 164 ++++++-----
 3 files changed, 249 insertions(+), 365 deletions(-)


[impala] 01/02: [DOCS] Update impala_proxy.xml with the latest info

Posted by jo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 4687885b808c9856e792f2439435dbcf2bedf7d1
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Wed Dec 4 11:52:05 2019 -0800

    [DOCS] Update impala_proxy.xml with the latest info
    
    Change-Id: Ia9d80e21abb385704eea863d221e333441af9a39
    Reviewed-on: http://gerrit.cloudera.org:8080/14857
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Balazs Jeszenszky <je...@gmail.com>
    Reviewed-by: Vincent Tran <vt...@cloudera.com>
    Reviewed-by: Alex Rodoni <ar...@cloudera.com>
---
 docs/topics/impala_jdbc.xml  | 424 +++++++++++++++----------------------------
 docs/topics/impala_proxy.xml | 164 ++++++++++-------
 2 files changed, 240 insertions(+), 348 deletions(-)

diff --git a/docs/topics/impala_jdbc.xml b/docs/topics/impala_jdbc.xml
index 8dc3707..0711f9a 100644
--- a/docs/topics/impala_jdbc.xml
+++ b/docs/topics/impala_jdbc.xml
@@ -19,9 +19,7 @@ under the License.
 -->
 <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
 <concept id="impala_jdbc">
-
   <title id="jdbc">Configuring Impala to Work with JDBC</title>
-
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
@@ -34,181 +32,106 @@ under the License.
       <data name="Category" value="Developers"/>
     </metadata>
   </prolog>
-
   <conbody>
-
-    <p>
-      <indexterm audience="hidden">JDBC</indexterm>
-      Impala supports the standard JDBC interface, allowing access from commercial Business
-      Intelligence tools and custom software written in Java or other programming languages. The
-      JDBC driver allows you to access Impala from a Java program that you write, or a Business
-      Intelligence or similar tool that uses JDBC to communicate with various database products.
-    </p>
-
-    <p>
-      Setting up a JDBC connection to Impala involves the following steps:
-    </p>
-
+    <p> Impala supports the standard JDBC interface, allowing access from
+      commercial Business Intelligence tools and custom software written in Java
+      or other programming languages. The JDBC driver allows you to access
+      Impala from a Java program that you write, or a Business Intelligence or
+      similar tool that uses JDBC to communicate with various database products. </p>
+    <p> Setting up a JDBC connection to Impala involves the following steps: </p>
     <ul>
-      <li>
-        Verifying the communication port where the Impala daemons in your cluster are listening
-        for incoming JDBC requests.
-      </li>
-
-      <li>
-        Installing the JDBC driver on every system that runs the JDBC-enabled application.
-      </li>
-
-      <li>
-        Specifying a connection string for the JDBC application to access one of the servers
-        running the <cmdname>impalad</cmdname> daemon, with the appropriate security settings.
-      </li>
+      <li> Verifying the communication port where the Impala daemons in your
+        cluster are listening for incoming JDBC requests. </li>
+      <li> Installing the JDBC driver on every system that runs the JDBC-enabled
+        application. </li>
+      <li> Specifying a connection string for the JDBC application to access one
+        of the servers running the <cmdname>impalad</cmdname> daemon, with the
+        appropriate security settings. </li>
     </ul>
-
     <p outputclass="toc inpage"/>
-
   </conbody>
-
   <concept id="jdbc_port">
-
     <title>Configuring the JDBC Port</title>
-
     <conbody>
-
-      <p>
-        The following are the default ports that Impala server accepts JDBC connections through:
-        <simpletable frame="all"
+      <p> The following are the default ports that Impala server accepts JDBC
+        connections through: <simpletable frame="all"
           relcolwidth="1.0* 1.03* 2.38*" id="simpletable_tr2_gnt_43b">
-
           <strow>
-
             <stentry><b>Protocol</b>
-
             </stentry>
-
             <stentry><b>Default Port</b>
-
             </stentry>
-
             <stentry><b>Flag to Specify an Alternate Port</b>
-
             </stentry>
-
           </strow>
-
           <strow>
-
             <stentry>HTTP</stentry>
-
             <stentry>28000</stentry>
-
             <stentry><codeph>&#8209;&#8209;hs2_http_port</codeph>
-
             </stentry>
-
           </strow>
-
           <strow>
-
             <stentry>Binary TCP</stentry>
-
             <stentry>21050</stentry>
-
             <stentry><codeph>&#8209;&#8209;hs2_port</codeph>
-
             </stentry>
-
           </strow>
-
         </simpletable>
       </p>
-
-      <p>
-        Make sure the port for the protocol you are using is available for communication with
-        clients, for example, that it is not blocked by firewall software.
-      </p>
-
-      <p>
-        If your JDBC client software connects to a different port, specify that alternative port
-        number with the flag in the above table when starting the <codeph>impalad</codeph>.
-      </p>
-
+      <p> Make sure the port for the protocol you are using is available for
+        communication with clients, for example, that it is not blocked by
+        firewall software. </p>
+      <p> If your JDBC client software connects to a different port, specify
+        that alternative port number with the flag in the above table when
+        starting the <codeph>impalad</codeph>. </p>
     </conbody>
-
   </concept>
-
   <concept id="jdbc_driver_choice">
-
     <title>Choosing the JDBC Driver</title>
-
     <prolog>
       <metadata>
         <data name="Category" value="Planning"/>
       </metadata>
     </prolog>
-
     <conbody>
-
-      <p>
-        In Impala 2.0 and later, you can use the Hive 0.13 JDBC driver. If you are already using
-        JDBC applications with an earlier Impala release, you should update your JDBC driver,
-        because the Hive 0.12 driver that was formerly the only choice is not compatible with
-        Impala 2.0 and later.
-      </p>
-
-      <p>
-        The Hive JDBC driver provides a substantial speed increase for JDBC applications with
-        Impala 2.0 and higher, for queries that return large result sets.
-      </p>
-
+      <p> In Impala 2.0 and later, you can use the Hive 0.13 or higher JDBC
+        driver. If you are already using JDBC applications with an earlier
+        Impala release, you should update your JDBC driver, because the Hive
+        0.12 driver that was formerly the only choice is not compatible with
+        Impala 2.0 and later. </p>
+      <p> The Hive JDBC driver provides a substantial speed increase for JDBC
+        applications with Impala 2.0 and higher, for queries that return large
+        result sets. </p>
     </conbody>
-
   </concept>
-
   <concept id="jdbc_setup">
-
     <title>Enabling Impala JDBC Support on Client Systems</title>
-
     <prolog>
       <metadata>
         <data name="Category" value="Installing"/>
       </metadata>
     </prolog>
-
     <conbody>
-
       <section id="install_hive_driver">
-
         <title>Using the Hive JDBC Driver</title>
-
-        <p>
-          You install the Hive JDBC driver (<codeph>hive-jdbc</codeph> package) through the
-          Linux package manager, on hosts within the cluster. The driver consists of several
-          Java JAR files. The same driver can be used by Impala and Hive.
-        </p>
-
-        <p>
-          To get the JAR files, install the Hive JDBC driver on each host in the cluster that
-          will run JDBC applications.
-<!-- TODO: Find a URL to point to for instructions and downloads -->
-        </p>
-
-        <note>
-          The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance
-          improvements for Impala queries that return large result sets. Impala 2.0 and later
-          are compatible with the Hive 0.13 driver. If you already have an older JDBC driver
-          installed, and are running Impala 2.0 or higher, consider upgrading to the latest Hive
-          JDBC driver for best performance with JDBC applications.
-        </note>
-
-        <p>
-          If you are using JDBC-enabled applications on hosts outside the cluster, you cannot
-          use the the same install procedure on the hosts. Install the JDBC driver on at least
-          one cluster host using the preceding procedure. Then download the JAR files to each
-          client machine that will use JDBC with Impala:
-        </p>
-
-<codeblock>commons-logging-X.X.X.jar
+        <p> You install the Hive JDBC driver (<codeph>hive-jdbc</codeph>
+          package) through the Linux package manager, on hosts within the
+          cluster. The driver consists of several JAR files. The same driver can
+          be used by Impala and Hive. </p>
+        <p> To get the JAR files, install the Hive JDBC driver on each host in
+          the cluster that will run JDBC applications.  </p>
+        <note> The latest JDBC driver, corresponding to Hive 0.13, provides
+          substantial performance improvements for Impala queries that return
+          large result sets. Impala 2.0 and later are compatible with the Hive
+          0.13 driver. If you already have an older JDBC driver installed, and
+          are running Impala 2.0 or higher, consider upgrading to the latest
+          Hive JDBC driver for best performance with JDBC applications. </note>
+        <p> If you are using JDBC-enabled applications on hosts outside the
+          cluster, you cannot use the same install procedure on the hosts.
+          Install the JDBC driver on at least one cluster host using the
+          preceding procedure. Then download the JAR files to each client
+          machine that will use JDBC with Impala: </p>
+        <codeblock>commons-logging-X.X.X.jar
   hadoop-common.jar
   hive-common-X.XX.X.jar
   hive-jdbc-X.XX.X.jar
@@ -222,185 +145,136 @@ under the License.
   slf4j-api-X.X.X.jar
   slf4j-logXjXX-X.X.X.jar
   </codeblock>
-
         <p>
-          <b>To enable JDBC support for Impala on the system where you run the JDBC
-          application:</b>
+          <b>To enable JDBC support for Impala on the system where you run the
+            JDBC application:</b>
         </p>
-
         <ol>
-          <li>
-            Download the JAR files listed above to each client machine.
-            <note>
-              For Maven users, see <xref keyref="Impala-JDBC-Example">this sample github
-              page</xref> for an example of the dependencies you could add to a
-              <codeph>pom</codeph> file instead of downloading the individual JARs.
-            </note>
+          <li> Download the JAR files listed above to each client machine.
+              <note> For Maven users, see <xref keyref="Impala-JDBC-Example"
+                >this sample github page</xref> for an example of the
+              dependencies you could add to a <codeph>pom</codeph> file instead
+              of downloading the individual JARs. </note>
           </li>
-
-          <li>
-            Store the JAR files in a location of your choosing, ideally a directory already
-            referenced in your <codeph>CLASSPATH</codeph> setting. For example:
-            <ul>
-              <li>
-                On Linux, you might use a location such as <codeph>/opt/jars/</codeph>.
-              </li>
-
-              <li>
-                On Windows, you might use a subdirectory underneath <filepath>C:\Program
-                Files</filepath>.
-              </li>
+          <li> Store the JAR files in a location of your choosing, ideally a
+            directory already referenced in your <codeph>CLASSPATH</codeph>
+            setting. For example: <ul>
+              <li> On Linux, you might use a location such as
+                  <codeph>/opt/jars/</codeph>. </li>
+              <li> On Windows, you might use a subdirectory underneath
+                  <filepath>C:\Program Files</filepath>. </li>
             </ul>
           </li>
-
-          <li>
-            To successfully load the Impala JDBC driver, client programs must be able to locate
-            the associated JAR files. This often means setting the <codeph>CLASSPATH</codeph>
-            for the client process to include the JARs. Consult the documentation for your JDBC
-            client for more details on how to install new JDBC drivers, but some examples of how
-            to set <codeph>CLASSPATH</codeph> variables include:
-            <ul>
-              <li>
-                On Linux, if you extracted the JARs to <codeph>/opt/jars/</codeph>, you might
-                issue the following command to prepend the JAR files path to an existing
-                classpath:
-<codeblock>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</codeblock>
+          <li> To successfully load the Impala JDBC driver, client programs must
+            be able to locate the associated JAR files. This often means setting
+            the <codeph>CLASSPATH</codeph> for the client process to include the
+            JARs. Consult the documentation for your JDBC client for more
+            details on how to install new JDBC drivers, but some examples of how
+            to set <codeph>CLASSPATH</codeph> variables include: <ul>
+              <li> On Linux, if you extracted the JARs to
+                  <codeph>/opt/jars/</codeph>, you might issue the following
+                command to prepend the JAR files path to an existing classpath:
+                <codeblock>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</codeblock>
               </li>
-
-              <li>
-                On Windows, use the <b>System Properties</b> control panel item to modify the
-                <b>Environment Variables</b> for your system. Modify the environment variables
-                to include the path to which you extracted the files.
-                <note>
-                  If the existing <codeph>CLASSPATH</codeph> on your client machine refers to
-                  some older version of the Hive JARs, ensure that the new JARs are the first
-                  ones listed. Either put the new JAR files earlier in the listings, or delete
-                  the other references to Hive JAR files.
-                </note>
+              <li> On Windows, use the <b>System Properties</b> control panel
+                item to modify the <b>Environment Variables</b> for your system.
+                Modify the environment variables to include the path to which
+                you extracted the files. <note> If the existing
+                    <codeph>CLASSPATH</codeph> on your client machine refers to
+                  some older version of the Hive JARs, ensure that the new JARs
+                  are the first ones listed. Either put the new JAR files
+                  earlier in the listings, or delete the other references to
+                  Hive JAR files. </note>
               </li>
             </ul>
           </li>
         </ol>
-
       </section>
-
     </conbody>
-
   </concept>
-
   <concept id="jdbc_connect">
-
     <title>Establishing JDBC Connections</title>
-
     <conbody>
-
-      <p>
-        The JDBC driver class depends on which driver you select.
-      </p>
-
+      <p> The JDBC driver class depends on which driver you select. </p>
       <note conref="../shared/impala_common.xml#common/proxy_jdbc_caveat"/>
-
       <section id="class_hive_driver">
-
         <title>Using the Hive JDBC Driver</title>
-
-        <p>
-          For example, with the Hive JDBC driver, the class name is
-          <codeph>org.apache.hive.jdbc.HiveDriver</codeph>. Once you have configured Impala to
-          work with JDBC, you can establish connections between the two. To do so for a cluster
-          that does not use Kerberos authentication, use a connection string of the form
-          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>.
-<!--
+        <p> For example, with the Hive JDBC driver, the class name is
+            <codeph>org.apache.hive.jdbc.HiveDriver</codeph>. Once you have
+          configured Impala to work with JDBC, you can establish connections
+          between the two. To do so for a cluster that does not use Kerberos
+          authentication, use a connection string of the form
+              <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;auth=noSasl</codeph>.
+          <!--
         Include the <codeph>auth=noSasl</codeph> argument
         only when connecting to a non-Kerberos cluster; if Kerberos is enabled, omit the <codeph>auth</codeph> argument.
 -->
-          For example, you might use:
-        </p>
-
-<codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock>
-
-        <p>
-          To connect to an instance of Impala that requires Kerberos authentication, use a
-          connection string of the form
-          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>.
-          The principal must be the same user principal you used when starting Impala. For
-          example, you might use:
+          For example, you might use: </p>
+        <codeblock>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</codeblock>
+        <p> To connect to an instance of Impala that requires Kerberos
+          authentication, use a connection string of the form
+              <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/;principal=<varname>principal_name</varname></codeph>.
+          The principal must be the same user principal you used when starting
+          Impala. For example, you might use: </p>
+        <codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</codeblock>
+        <p> To connect to an instance of Impala that requires LDAP
+          authentication, use a connection string of the form
+              <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>.
+          For example, you might use: </p>
+        <codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock>
+        <p> To connect to an instance of Impala over HTTP, specify the HTTP
+          port, 28000 by default, and <codeph>transportMode=http</codeph> in the
+          connection string. For example:
+          <codeblock>jdbc:hive2://myhost.example.com:28000/;transportMode=http</codeblock>
         </p>
-
-<codeblock>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</codeblock>
-
-        <p>
-          To connect to an instance of Impala that requires LDAP authentication, use a
-          connection string of the form
-          <codeph>jdbc:hive2://<varname>host</varname>:<varname>port</varname>/<varname>db_name</varname>;user=<varname>ldap_userid</varname>;password=<varname>ldap_password</varname></codeph>.
-          For example, you might use:
-        </p>
-
-<codeblock>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</codeblock>
-
-        <p>
-          To connect to an instance of Impala over HTTP, specify the HTTP port, 28000 by
-          default, and <codeph>transportMode=http</codeph> in the connection string. For
-          example:
-<codeblock>jdbc:hive2://myhost.example.com:28000/;transportMode=http</codeblock>
-        </p>
-
         <note>
-          <p conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"/>
+          <p
+            conref="../shared/impala_common.xml#common/hive_jdbc_ssl_kerberos_caveat"
+          />
         </note>
-
       </section>
-
     </conbody>
-
   </concept>
-
   <concept rev="2.3.0" id="jdbc_odbc_notes">
-
-    <title>Notes about JDBC and ODBC Interaction with Impala SQL Features</title>
-
+    <title>Notes about JDBC and ODBC Interaction with Impala SQL
+      Features</title>
     <conbody>
-
-      <p>
-        Most Impala SQL features work equivalently through the <cmdname>impala-shell</cmdname>
-        interpreter of the JDBC or ODBC APIs. The following are some exceptions to keep in mind
-        when switching between the interactive shell and applications using the APIs:
-      </p>
-
+      <p> Most Impala SQL features work equivalently through the
+          <cmdname>impala-shell</cmdname> interpreter or the JDBC or ODBC APIs.
+        The following are some exceptions to keep in mind when switching between
+        the interactive shell and applications using the APIs: </p>
       <ul>
         <li>
           <p conref="../shared/impala_common.xml#common/complex_types_blurb"/>
           <ul>
             <li>
-              <p>
-                Queries involving the complex types (<codeph>ARRAY</codeph>,
-                <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) require notation that might
-                not be available in all levels of JDBC and ODBC drivers. If you have trouble
-                querying such a table due to the driver level or inability to edit the queries
-                used by the application, you can create a view that exposes a <q>flattened</q>
-                version of the complex columns and point the application at the view. See
-                <xref href="impala_complex_types.xml#complex_types"/> for details.
+              <p> Queries involving the complex types (<codeph>ARRAY</codeph>,
+                  <codeph>STRUCT</codeph>, and <codeph>MAP</codeph>) require
+                notation that might not be available in all levels of JDBC and
+                ODBC drivers. If you have trouble querying such a table due to
+                the driver level or inability to edit the queries used by the
+                application, you can create a view that exposes a
+                  <q>flattened</q> version of the complex columns and point the
+                application at the view. See <xref
+                  href="impala_complex_types.xml#complex_types"/> for details.
               </p>
             </li>
-
             <li>
-              <p>
-                The complex types available in <keyword keyref="impala23_full"/> and higher are
-                supported by the JDBC <codeph>getColumns()</codeph> API. Both
-                <codeph>MAP</codeph> and <codeph>ARRAY</codeph> are reported as the JDBC SQL
-                Type <codeph>ARRAY</codeph>, because this is the closest matching Java SQL type.
-                This behavior is consistent with Hive. <codeph>STRUCT</codeph> types are
-                reported as the JDBC SQL Type <codeph>STRUCT</codeph>.
-              </p>
-
-              <p>
-                To be consistent with Hive's behavior, the TYPE_NAME field is populated with the
-                primitive type name for scalar types, and with the full <codeph>toSql()</codeph>
-                for complex types. The resulting type names are somewhat inconsistent, because
-                nested types are printed differently than top-level types. For example, the
-                following list shows how <codeph>toSQL()</codeph> for Impala types are
-                translated to <codeph>TYPE_NAME</codeph> values:
-<codeblock><![CDATA[DECIMAL(10,10)         becomes  DECIMAL
+              <p> The complex types available in <keyword keyref="impala23_full"
+                /> and higher are supported by the JDBC
+                  <codeph>getColumns()</codeph> API. Both <codeph>MAP</codeph>
+                and <codeph>ARRAY</codeph> are reported as the JDBC SQL Type
+                  <codeph>ARRAY</codeph>, because this is the closest matching
+                Java SQL type. This behavior is consistent with Hive.
+                  <codeph>STRUCT</codeph> types are reported as the JDBC SQL
+                Type <codeph>STRUCT</codeph>. </p>
+              <p> To be consistent with Hive's behavior, the TYPE_NAME field is
+                populated with the primitive type name for scalar types, and
+                with the full <codeph>toSql()</codeph> for complex types. The
+                resulting type names are somewhat inconsistent, because nested
+                types are printed differently than top-level types. For example,
+                the following list shows how <codeph>toSql()</codeph> for Impala
+                types are translated to <codeph>TYPE_NAME</codeph> values: <codeblock><![CDATA[DECIMAL(10,10)         becomes  DECIMAL
 CHAR(10)               becomes  CHAR
 VARCHAR(10)            becomes  VARCHAR
 ARRAY<DECIMAL(10,10)>  becomes  ARRAY<DECIMAL(10,10)>
@@ -413,27 +287,17 @@ ARRAY<VARCHAR(10)>     becomes  ARRAY<VARCHAR(10)>
           </ul>
         </li>
       </ul>
-
     </conbody>
-
   </concept>
-
   <concept id="jdbc_kudu">
-
     <title>Kudu Considerations for DML Statements</title>
-
     <conbody>
-
-      <p>
-        Currently, Impala <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or other DML
-        statements issued through the JDBC interface against a Kudu table do not return JDBC
-        error codes for conditions such as duplicate primary key columns. Therefore, for
-        applications that issue a high volume of DML statements, prefer to use the Kudu Java API
-        directly rather than a JDBC application.
-      </p>
-
+      <p> Currently, Impala <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>, or
+        other DML statements issued through the JDBC interface against a Kudu
+        table do not return JDBC error codes for conditions such as duplicate
+        primary key columns. Therefore, for applications that issue a high
+        volume of DML statements, prefer to use the Kudu Java API directly
+        rather than a JDBC application. </p>
     </conbody>
-
   </concept>
-
 </concept>
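As a quick cross-check of the connection-string forms the updated impala_jdbc.xml topic documents, the following sketch (not part of the commit; hostnames, ports, and credentials are placeholder examples) builds each URL variant. A real client would pass the result to DriverManager.getConnection() with the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) on the classpath:

```java
// Sketch of the Impala JDBC URL forms described in impala_jdbc.xml.
// All host names, database names, and credentials are hypothetical.
public class ImpalaJdbcUrls {

    // Non-Kerberos cluster: jdbc:hive2://host:port/;auth=noSasl
    static String noSasl(String host, int port) {
        return String.format("jdbc:hive2://%s:%d/;auth=noSasl", host, port);
    }

    // Kerberos cluster: jdbc:hive2://host:port/;principal=principal_name
    static String kerberos(String host, int port, String principal) {
        return String.format("jdbc:hive2://%s:%d/;principal=%s", host, port, principal);
    }

    // LDAP authentication: jdbc:hive2://host:port/db;user=u;password=p
    static String ldap(String host, int port, String db, String user, String pw) {
        return String.format("jdbc:hive2://%s:%d/%s;user=%s;password=%s",
                host, port, db, user, pw);
    }

    // HTTP transport on the hs2_http_port (28000 by default)
    static String http(String host, int httpPort) {
        return String.format("jdbc:hive2://%s:%d/;transportMode=http", host, httpPort);
    }

    public static void main(String[] args) {
        System.out.println(noSasl("myhost.example.com", 21050));
        System.out.println(kerberos("myhost.example.com", 21050,
                "impala/myhost.example.com@H2.EXAMPLE.COM"));
        System.out.println(ldap("myhost.example.com", 21050, "test_db", "fred", "xyz123"));
        System.out.println(http("myhost.example.com", 28000));
    }
}
```

Note the port split the topic describes: 21050 (`--hs2_port`) for binary TCP connections, 28000 (`--hs2_http_port`) for HTTP transport.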
diff --git a/docs/topics/impala_proxy.xml b/docs/topics/impala_proxy.xml
index 453c485..ca67885 100644
--- a/docs/topics/impala_proxy.xml
+++ b/docs/topics/impala_proxy.xml
@@ -48,9 +48,7 @@ under the License.
     </p>
 
     <p>
-      Currently, the Impala statestore mechanism does not include such proxying and
-      load-balancing features. Set up a software package of your choice to perform these
-      functions.
+      Set up a software package of your choice to perform these functions.
     </p>
 
     <note>
@@ -107,9 +105,7 @@ under the License.
         <li>
           Select and download the load-balancing proxy software or other load-balancing hardware
           appliance. It should only need to be installed and configured on a single host,
-          typically on an edge node. Pick a host other than the DataNodes where
-          <cmdname>impalad</cmdname> is running, because the intention is to protect against the
-          possibility of one or more of these DataNodes becoming unavailable.
+          typically on an edge node.
         </li>
 
         <li>
@@ -117,13 +113,15 @@ under the License.
           particular:
           <ul>
             <li>
-              Set up a port that the load balancer will listen on to relay Impala requests back
-              and forth.
+              To relay Impala requests back and forth, set up a port that the load balancer will
+              listen on.
             </li>
 
             <li>
-              See <xref href="#proxy_balancing" format="dita"/> for load balancing algorithm
-              options.
+              Select a load balancing algorithm. See
+              <xref
+                href="#proxy_balancing" format="dita"/> for load balancing
+              algorithm options.
             </li>
 
             <li>
@@ -136,7 +134,7 @@ under the License.
 
         <li>
           If you are using Hue or JDBC-based applications, you typically set up load balancing
-          for both ports 21000 and 21050, because these client applications connect through port
+          for both ports 21000 and 21050 because these client applications connect through port
           21050 while the <cmdname>impala-shell</cmdname> command connects through port 21000.
           See <xref href="impala_ports.xml#ports"/> for when to use port 21000, 21050, or
           another value depending on what type of connections you are load balancing.
@@ -149,8 +147,8 @@ under the License.
 
         <li>
           For any scripts, jobs, or configuration settings for applications that formerly
-          connected to a specific DataNode to run Impala SQL statements, change the connection
-          information (such as the <codeph>-i</codeph> option in
+          connected to a specific <cmdname>impalad</cmdname> to run Impala SQL statements,
+          change the connection information (such as the <codeph>-i</codeph> option in
           <cmdname>impala-shell</cmdname>) to point to the load balancer instead.
         </li>
       </ol>
@@ -231,10 +229,8 @@ under the License.
           </dt>
 
           <dd>
-            <p>
-              Distributes connections to all coordinator nodes. Typically not recommended for
-              Impala.
-            </p>
+            Distributes connections to all coordinator nodes. Typically not recommended for
+            Impala.
           </dd>
 
         </dlentry>
@@ -267,8 +263,7 @@ under the License.
 
       <p>
         In a cluster using Kerberos, applications check host credentials to verify that the host
-        they are connecting to is the same one that is actually processing the request, to
-        prevent man-in-the-middle attacks.
+        they are connecting to is the same one that is actually processing the request.
       </p>
 
       <p>
@@ -278,13 +273,12 @@ under the License.
       </p>
 
       <p>
-        In <keyword keyref="impala212_full">Impala 2.12</keyword> and higher, if you enable a
-        proxy server in a Kerberized cluster, users have an option to connect to Impala daemons
-        directly from <cmdname>impala-shell</cmdname> using the <codeph>-b</codeph> /
-        <codeph>--kerberos_host_fqdn</codeph> option when you start
-        <cmdname>impala-shell</cmdname>. This option can be used for testing or troubleshooting
-        purposes, but not recommended for live production environments as it defeats the purpose
-        of a load balancer/proxy.
+        In <keyword keyref="impala212_full">Impala 2.12</keyword> and higher versions, when you
+        enable a proxy server in a Kerberized cluster, users have an option to connect to Impala
+        daemons directly from <cmdname>impala-shell</cmdname> using the <codeph>-b</codeph> /
+        <codeph>--kerberos_host_fqdn</codeph> <cmdname>impala-shell</cmdname> flag. This option
+        can be used for testing or troubleshooting purposes, but is not recommended for
+        live production environments as it defeats the purpose of a load balancer/proxy.
       </p>
 
       <p>
@@ -305,8 +299,7 @@ impala-shell -i impalad-1.mydomain.com -k -b loadbalancer-1.mydomain.com
       </p>
 
       <p>
-        To clarify that the load-balancing proxy server is legitimate, perform these extra
-        Kerberos setup steps:
+        To validate the load-balancing proxy server, perform these extra Kerberos setup steps:
       </p>
 
       <ol>
@@ -321,26 +314,29 @@ impala-shell -i impalad-1.mydomain.com -k -b loadbalancer-1.mydomain.com
           Choose the host you will use for the proxy server. Based on the Kerberos setup
           procedure, it should already have an entry
           <codeph>impala/<varname>proxy_host</varname>@<varname>realm</varname></codeph> in its
-          keytab. If not, go back over the initial Kerberos configuration steps for the keytab
-          on each host running the <cmdname>impalad</cmdname> daemon.
+          <filepath>keytab</filepath>. If not, go back over the initial Kerberos configuration
+          steps for the <filepath>keytab</filepath> on each host running the
+          <cmdname>impalad</cmdname> daemon.
         </li>
 
         <li>
-          Copy the keytab file from the proxy host to all other hosts in the cluster that run
-          the <cmdname>impalad</cmdname> daemon. (For optimal performance,
-          <cmdname>impalad</cmdname> should be running on all DataNodes in the cluster.) Put the
-          keytab file in a secure location on each of these other hosts.
+          Copy the <filepath>keytab</filepath> file from the proxy host to all other hosts in
+          the cluster that run the <cmdname>impalad</cmdname> daemon. Put the
+          <filepath>keytab</filepath> file in a secure location on each of these other hosts.
         </li>
 
         <li>
           Add an entry
           <codeph>impala/<varname>actual_hostname</varname>@<varname>realm</varname></codeph> to
-          the keytab on each host running the <cmdname>impalad</cmdname> daemon.
+          the <filepath>keytab</filepath> on each host running the <cmdname>impalad</cmdname>
+          daemon.
         </li>
 
         <li>
-          For each impalad node, merge the existing keytab with the proxy’s keytab using
-          <cmdname>ktutil</cmdname>, producing a new keytab file. For example:
+          For each <cmdname>impalad</cmdname> node, merge the existing
+          <filepath>keytab</filepath> with the proxy’s <filepath>keytab</filepath> using
+          <cmdname>ktutil</cmdname>, producing a new <filepath>keytab</filepath> file. For
+          example:
 <codeblock>$ ktutil
   ktutil: read_kt proxy.keytab
   ktutil: read_kt impala.keytab
@@ -349,44 +345,39 @@ impala-shell -i impalad-1.mydomain.com -k -b loadbalancer-1.mydomain.com
         </li>
 
         <li>
-          To verify that the keytabs are merged, run the command:
+          To verify that the <filepath>keytabs</filepath> are merged, run the command:
 <codeblock>
 klist -k <varname>keytabfile</varname>
 </codeblock>
-          which lists the credentials for both <codeph>principal</codeph> and
+          The command lists the credentials for both <codeph>principal</codeph> and
           <codeph>be_principal</codeph> on all nodes.
         </li>
 
         <li>
-          Make sure that the <codeph>impala</codeph> user has permission to read this merged
-          keytab file.
+          Make sure that the <codeph>impala</codeph> user has permission to read this merged
+          <filepath>keytab</filepath> file.
         </li>
 
         <li>
-          Change the following configuration settings for each host in the cluster that
-          participates in the load balancing:
-          <ul>
-            <li>
-              In the <cmdname>impalad</cmdname> option definition, add:
+          For each coordinator <codeph>impalad</codeph> host in the cluster that participates
+          in load balancing, add the following configuration options so that it can receive
+          client connections coming through the load balancer proxy server:
 <codeblock>
---principal=impala/<i>proxy_host@realm</i>
-  --be_principal=impala/<i>actual_host@realm</i>
-  --keytab_file=<i>path_to_merged_keytab</i>
+--principal=impala/<varname>proxy_host@realm</varname>
+  --be_principal=impala/<varname>actual_host@realm</varname>
+  --keytab_file=<varname>path_to_merged_keytab</varname>
 </codeblock>
-              <note>
-                Every host has different <codeph>--be_principal</codeph> because the actual
-                hostname is different on each host. Specify the fully qualified domain name
-                (FQDN) for the proxy host, not the IP address. Use the exact FQDN as returned by
-                a reverse DNS lookup for the associated IP address.
-              </note>
-            </li>
+          <p>
+            The <codeph>--principal</codeph> setting prevents a client from connecting to a
+            coordinator <codeph>impalad</codeph> using a principal other than the one specified.
+          </p>
 
-            <li>
-              Modify the startup options. See
-              <xref href="impala_config_options.xml#config_options"/> for the procedure to
-              modify the startup options.
-            </li>
-          </ul>
+          <note>
+            Every host has a different <codeph>--be_principal</codeph> because the actual host
+            name is different on each host. Specify the fully qualified domain name (FQDN) for
+            the proxy host, not the IP address. Use the exact FQDN as returned by a reverse DNS
+            lookup for the associated IP address.
+          </note>
         </li>
 
         <li>
@@ -396,6 +387,40 @@ klist -k <varname>keytabfile</varname>
         </li>
       </ol>
 
+      <section id="section_fjz_mfn_yjb">
+
+        <title>Client Connection to Proxy Server in Kerberized Clusters</title>
+
+        <p>
+          When a client connects to Impala, the service principal specified by the client must
+          match the <codeph>--principal</codeph> setting of the Impala proxy server, and the
+          client must connect to the proxy server port.
+        </p>
+
+        <p>
+          In <filepath>hue.ini</filepath>, set the following to configure Hue to
+          automatically connect to the proxy server:
+        </p>
+
+<codeblock>[impala]
+server_host=<varname>proxy_host</varname>
+impala_principal=impala/<varname>proxy_host</varname></codeblock>
+
+        <p>
+          The following are the JDBC connection string formats when connecting through the load
+          balancer with the load balancer's host name in the principal:
+        </p>
+
+<codeblock>jdbc:hive2://<varname>proxy_host</varname>:<varname>load_balancer_port</varname>/;principal=impala/_HOST@<varname>realm</varname>
+jdbc:hive2://<varname>proxy_host</varname>:<varname>load_balancer_port</varname>/;principal=impala/<varname>proxy_host</varname>@<varname>realm</varname></codeblock>
+
+        <p>
+          When starting <cmdname>impala-shell</cmdname>, specify the load balancer host as the
+          service principal host via the <codeph>-b</codeph> or
+          <codeph>--kerberos_host_fqdn</codeph> flag.
+        </p>
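As a hypothetical illustration only (the helper name, parameters, and defaults below are assumptions, not part of Impala or its drivers), the two JDBC connection string forms shown above can be assembled like this:

```python
# Hypothetical helper: sketches how the two JDBC connection string
# forms shown above are put together. Not part of Impala itself.
def impala_jdbc_url(proxy_host, port, realm, host_placeholder=True):
    # With host_placeholder=True, the principal uses _HOST, which the
    # JDBC driver substitutes with the host name from the URL at
    # connect time; otherwise the proxy host is spelled out explicitly.
    host_part = "_HOST" if host_placeholder else proxy_host
    principal = "impala/{0}@{1}".format(host_part, realm)
    return "jdbc:hive2://{0}:{1}/;principal={2}".format(proxy_host, port, principal)

print(impala_jdbc_url("proxy.example.com", 21051, "EXAMPLE.COM"))
print(impala_jdbc_url("proxy.example.com", 21051, "EXAMPLE.COM", host_placeholder=False))
```

Either form should name the load balancer host, not an individual impalad host, so that the client's service ticket matches the proxy's principal.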
+
+      </section>
+
     </conbody>
 
   </concept>
@@ -512,8 +537,9 @@ klist -k <varname>keytabfile</varname>
       <ul>
         <li>
           <p>
-            Install the load balancer: <codeph>yum install haproxy</codeph>
+            Install the load balancer:
           </p>
+<codeblock>yum install haproxy</codeblock>
         </li>
 
         <li>
@@ -604,7 +630,8 @@ listen stats :25002
     stats enable
     stats auth <varname>username</varname>:<varname>password</varname>
 
-# This is the setup for Impala. Impala client connect to load_balancer_host:25003.
+# Setup for Impala.
+# Impala clients connect to load_balancer_host:25003.
 # HAProxy will balance connections among the list of servers listed below.
 # The list of Impalad is listening at port 21000 for beeswax (impala-shell) or original ODBC driver.
 # For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
@@ -621,12 +648,13 @@ listen impala :25003
 # Setup for Hue or other JDBC-enabled applications.
 # In particular, Hue requires sticky sessions.
 # The application connects to load_balancer_host:21051, and HAProxy balances
-# connections to the associated hosts, where Impala listens for JDBC
-# requests on port 21050.
+# connections to the associated hosts, where Impala listens for
+# JDBC requests at port 21050.
 listen impalajdbc :21051
     mode tcp
     option tcplog
     balance source
+
     server <varname>symbolic_name_5</varname> impala-host-1.example.com:21050 check
     server <varname>symbolic_name_6</varname> impala-host-2.example.com:21050 check
     server <varname>symbolic_name_7</varname> impala-host-3.example.com:21050 check
@@ -635,8 +663,8 @@ listen impalajdbc :21051
 
       <note type="important">
         Hue requires the <codeph>check</codeph> option at end of each line in the above file to
-        ensure HAProxy can detect any unreachable Impalad server, and failover can be
-        successful. Without the TCP check, you may hit an error when the
+        ensure that HAProxy can detect any unreachable <cmdname>impalad</cmdname> server and
+        that failover succeeds. Without the TCP check, you may hit an error when the
         <cmdname>impalad</cmdname> daemon to which Hue tries to connect is down.
       </note>
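HAProxy's `check` keyword performs a periodic layer-4 (TCP connect) probe of each backend. A minimal sketch of the same liveness test follows; the function name and timeout are assumptions for illustration, not HAProxy internals:

```python
import socket

def impalad_reachable(host, port=21050, timeout=2.0):
    # Mimics HAProxy's TCP "check": succeed only if a TCP connection
    # to the impalad port can actually be established.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A backend that fails this probe is marked down and removed from rotation, which is what lets Hue fail over cleanly instead of hitting a dead impalad.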
 


[impala] 02/02: [DOCS] Copy edits in impala_conversion_functions.xml

Posted by jo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit c5104d3d5b3478b26ec8ee6468b671ca29fbc3df
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Tue Dec 10 14:09:38 2019 -0800

    [DOCS] Copy edits in impala_conversion_functions.xml
    
    Change-Id: I32b6d146f0a78abdeb28cb103edcab847fe5b9da
    Reviewed-on: http://gerrit.cloudera.org:8080/14876
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Alex Rodoni <ar...@cloudera.com>
---
 docs/topics/impala_conversion_functions.xml | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/docs/topics/impala_conversion_functions.xml b/docs/topics/impala_conversion_functions.xml
index 57cdc17..8d161f5 100644
--- a/docs/topics/impala_conversion_functions.xml
+++ b/docs/topics/impala_conversion_functions.xml
@@ -397,10 +397,8 @@ under the License.
                       Month number
                     </entry>
                     <entry>
-                      <p>
-                        In date/time to string conversions, a 1-digit month is prefixed with a
-                        zero.
-                      </p>
+                      <p> In date/time to string conversions, 1-digit months are
+                        prefixed with a zero. </p>
                     </entry>
                   </row>
                   <row>
@@ -424,9 +422,8 @@ under the License.
                       In string to date/time conversions:
 
                       <ul>
-                        <li>
-                          Converts textual month names to 2-digit month numbers.
-                        </li>
+                        <li> Converts a textual month name to a 2-digit month
+                          number. </li>
 
                         <li>
                           The input strings are expected without trailing spaces, e.g.
@@ -504,15 +501,12 @@ under the License.
                       Week of year (1-53)
                     </entry>
                     <entry>
-                      <p>
-                        Not supported in a string to date/time conversions.
-                      </p>
+                      <p> Not supported in string to date/time conversions. </p>
 
 
 
-                      <p>
-                        1st week begins on January 1st and ends on January 7th.
-                      </p>
+                      <p> The 1st week begins on January 1st and ends on January
+                        7th. </p>
                     </entry>
                   </row>
                   <row>
@@ -558,10 +552,8 @@ under the License.
                       Day of month (1-31)
                     </entry>
                     <entry>
-                      <p>
-                        In date/time to string conversions, one digit day is prefixed with a
-                        zero.
-                      </p>
+                      <p> In date/time to string conversions, 1-digit days are
+                        prefixed with a zero. </p>
                     </entry>
                   </row>
                   <row>