Posted to commits@kyuubi.apache.org by ch...@apache.org on 2023/04/03 10:52:57 UTC

[kyuubi] branch branch-1.7 updated: [KYUUBI #4655] [DOCS] Enrich docs for Kyuubi Hive JDBC driver

This is an automated email from the ASF dual-hosted git repository.

chengpan pushed a commit to branch branch-1.7
in repository https://gitbox.apache.org/repos/asf/kyuubi.git


The following commit(s) were added to refs/heads/branch-1.7 by this push:
     new 45c34059c [KYUUBI #4655] [DOCS] Enrich docs for Kyuubi Hive JDBC driver
45c34059c is described below

commit 45c34059c8e27d47adfa54fd950653c9cf31fa55
Author: Cheng Pan <ch...@apache.org>
AuthorDate: Mon Apr 3 18:51:27 2023 +0800

    [KYUUBI #4655] [DOCS] Enrich docs for Kyuubi Hive JDBC driver
    
    Update outdated wording in the Kyuubi Hive JDBC driver docs, and add more details about Kerberos authentication.
    
    - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
    
    - [x] Add screenshots for manual tests if appropriate
    
    <img width="1400" alt="image" src="https://user-images.githubusercontent.com/26535726/229476374-d662c3b2-c1bc-44e9-a717-92f401586feb.png">
    
    - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request
    
    Closes #4655 from pan3793/docs-v2.
    
    Closes #4655
    
    9d2cb4875 [Cheng Pan] Update docs/quick_start/quick_start_with_jdbc.md
    00af58e27 [Cheng Pan] address comments
    48bf21664 [Cheng Pan] Update docs/quick_start/quick_start_with_jupyter.md
    054e2bea0 [Cheng Pan] nit
    a0a80b818 [Cheng Pan] nit
    41ff97de3 [Cheng Pan] [DOCS] Enrich docs for Kyuubi Hive JDBC Driver
    
    Authored-by: Cheng Pan <ch...@apache.org>
    Signed-off-by: Cheng Pan <ch...@apache.org>
    (cherry picked from commit a947dcb792b17f3fa40f03ec03397b6670b0b32a)
    Signed-off-by: Cheng Pan <ch...@apache.org>
---
 README.md                                    |   3 +-
 docs/appendix/terminology.md                 |   4 +-
 docs/client/jdbc/hive_jdbc.md                |  14 ++--
 docs/client/jdbc/kyuubi_jdbc.rst             | 115 ++++++++++++++++++++-------
 docs/extensions/server/authentication.rst    |   4 +-
 docs/quick_start/quick_start_with_helm.md    |   2 +-
 docs/quick_start/quick_start_with_jdbc.md    | 114 +++++++++++++-------------
 docs/quick_start/quick_start_with_jupyter.md |   2 +-
 8 files changed, 159 insertions(+), 99 deletions(-)

diff --git a/README.md b/README.md
index 16fc794ee..a69bfb590 100644
--- a/README.md
+++ b/README.md
@@ -70,8 +70,7 @@ HiveServer2 can identify and authenticate a caller, and then if the caller also
 
 Kyuubi extends the use of STS in a multi-tenant model based on a unified interface and relies on the concept of multi-tenancy to interact with cluster managers to finally gain the ability of resources sharing/isolation and data security. The loosely coupled architecture of the Kyuubi server and engine dramatically improves the client concurrency and service stability of the service itself.
 
-
-#### DataLake/LakeHouse Support
+#### DataLake/Lakehouse Support
 
 The vision of Kyuubi is to unify the portal and become an easy-to-use data lake management platform. Different kinds of workloads, such as ETL processing and BI analytics, can be supported by one platform, using one copy of data, with one SQL interface.
 
diff --git a/docs/appendix/terminology.md b/docs/appendix/terminology.md
index b81fa25fe..b349d77c7 100644
--- a/docs/appendix/terminology.md
+++ b/docs/appendix/terminology.md
@@ -129,9 +129,9 @@ As an enterprise service, SLA commitment is essential. Deploying Kyuubi in High
 </em>
 </p>
 
-## DataLake & LakeHouse
+## DataLake & Lakehouse
 
-Kyuubi unifies DataLake & LakeHouse access in the simplest pure SQL way, meanwhile it's also the securest way with authentication and SQL standard authorization.
+Kyuubi unifies DataLake & Lakehouse access in the simplest pure SQL way; it is also the most secure way, with authentication and SQL standard authorization.
 
 ### Apache Iceberg
 
diff --git a/docs/client/jdbc/hive_jdbc.md b/docs/client/jdbc/hive_jdbc.md
index 42d2f7b5a..00498dfaa 100644
--- a/docs/client/jdbc/hive_jdbc.md
+++ b/docs/client/jdbc/hive_jdbc.md
@@ -19,14 +19,18 @@
 
 ## Instructions
 
-Kyuubi does not provide its own JDBC Driver so far,
-as it is fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI) tools to query,
-analyze and visualize data though Spark SQL engines.
+Kyuubi is fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI)
+tools to query, analyze and visualize data through Spark SQL engines.
+
+It's recommended to use the [Kyuubi JDBC driver](./kyuubi_jdbc.html) for new applications.
 
 ## Install Hive JDBC
 
 For programing, the easiest way to get `hive-jdbc` is from [the maven central](https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc). For example,
 
+The following sections demonstrate how to use Hive JDBC driver 2.3.8 to connect to Kyuubi Server; in fact, any version
+up to and including 3.1.x should work.
+
 - **maven**
 
 ```xml
@@ -76,7 +80,3 @@ jdbc:hive2://<host>:<port>/<dbName>;<sessionVars>?<kyuubiConfs>#<[spark|hive]Var
 jdbc:hive2://localhost:10009/default;hive.server2.proxy.user=proxy_user?kyuubi.engine.share.level=CONNECTION;spark.ui.enabled=false#var_x=y
 ```
 
-## Unsupported Hive Features
-
-- Connect to HiveServer2 using HTTP transport. ```transportMode=http```
-
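A minimal Java sketch of the Hive JDBC URL format shown above: session variables follow `;`, Kyuubi configurations follow `?`, and variables follow `#`, each section being `;`-separated `key=value` pairs. It assumes `hive-jdbc` is on the classpath and registers itself through the JDBC 4 service loader (otherwise call `Class.forName("org.apache.hive.jdbc.HiveDriver")` first), and that a Kyuubi Server listens on `localhost:10009`. The class `HiveJdbcUrlDemo` and its `buildUrl` helper are hypothetical names for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HiveJdbcUrlDemo {

  // Joins key=value pairs with ';', keeping the map's iteration order.
  private static String join(Map<String, String> props) {
    StringJoiner joiner = new StringJoiner(";");
    props.forEach((k, v) -> joiner.add(k + "=" + v));
    return joiner.toString();
  }

  // Assembles jdbc:hive2://<host>:<port>/<dbName>;<sessionVars>?<kyuubiConfs>#<vars>,
  // omitting every section whose map is empty.
  static String buildUrl(String host, int port, String dbName,
                         Map<String, String> sessionVars,
                         Map<String, String> kyuubiConfs,
                         Map<String, String> vars) {
    StringBuilder url = new StringBuilder("jdbc:hive2://")
        .append(host).append(':').append(port).append('/').append(dbName);
    if (!sessionVars.isEmpty()) url.append(';').append(join(sessionVars));
    if (!kyuubiConfs.isEmpty()) url.append('?').append(join(kyuubiConfs));
    if (!vars.isEmpty()) url.append('#').append(join(vars));
    return url.toString();
  }

  public static void main(String[] args) throws SQLException {
    Map<String, String> sessionVars = new LinkedHashMap<>();
    sessionVars.put("hive.server2.proxy.user", "proxy_user");
    Map<String, String> kyuubiConfs = new LinkedHashMap<>();
    kyuubiConfs.put("kyuubi.engine.share.level", "CONNECTION");
    kyuubiConfs.put("spark.ui.enabled", "false");
    Map<String, String> vars = new LinkedHashMap<>();
    vars.put("var_x", "y");

    String url = buildUrl("localhost", 10009, "default", sessionVars, kyuubiConfs, vars);
    // Connect and list databases; all resources are closed automatically.
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("show databases")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
```

With the values in `main`, `buildUrl` reproduces the example URL from the Hive JDBC docs hunk above. `LinkedHashMap` is used so the properties appear in a predictable order.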
diff --git a/docs/client/jdbc/kyuubi_jdbc.rst b/docs/client/jdbc/kyuubi_jdbc.rst
index fdc40d599..305200d0d 100644
--- a/docs/client/jdbc/kyuubi_jdbc.rst
+++ b/docs/client/jdbc/kyuubi_jdbc.rst
@@ -17,14 +17,14 @@ Kyuubi Hive JDBC Driver
 =======================
 
 .. versionadded:: 1.4.0
-   Since 1.4.0, kyuubi community maintains a forked hive jdbc driver module and provides both shaded and non-shaded packages.
+   The Kyuubi community maintains a forked Hive JDBC driver module and provides both shaded and non-shaded packages.
 
-This packages aims to support some missing functionalities of the original hive jdbc.
-For kyuubi engines that support multiple catalogs, it provides meta APIs for better support.
-The behaviors of the original hive jdbc have remained.
+This package aims to support functionality missing from the original Hive JDBC driver.
+For Kyuubi engines that support multiple catalogs, it provides metadata APIs with better catalog support.
+Otherwise, the behavior of the original Hive JDBC driver is preserved.
 
-To access a Hive data warehouse or new lakehouse formats, such as Apache Iceberg/Hudi, delta lake using the kyuubi jdbc driver for Apache kyuubi, you need to configure
-the following:
+To access a Hive data warehouse or new Lakehouse formats, such as Apache Iceberg, Apache Hudi and Delta Lake, using the Kyuubi JDBC driver
+for Apache Kyuubi, you need to configure the following:
 
 - The list of driver library files - :ref:`referencing-libraries`.
 - The Driver or DataSource class - :ref:`registering_class`.
@@ -46,28 +46,28 @@ In the code, specify the artifact `kyuubi-hive-jdbc-shaded` from `Maven Central`
 Maven
 ^^^^^
 
-.. code-block:: xml
+.. parsed-literal::
 
    <dependency>
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
-       <version>1.5.2-incubating</version>
+       <version>\ |release|\</version>
    </dependency>
 
-Sbt
+sbt
 ^^^
 
-.. code-block:: sbt
+.. parsed-literal::
 
-   libraryDependencies += "org.apache.kyuubi" % "kyuubi-hive-jdbc-shaded" % "1.5.2-incubating"
+   libraryDependencies += "org.apache.kyuubi" % "kyuubi-hive-jdbc-shaded" % "\ |release|\"
 
 
 Gradle
 ^^^^^^
 
-.. code-block:: gradle
+.. parsed-literal::
 
-   implementation group: 'org.apache.kyuubi', name: 'kyuubi-hive-jdbc-shaded', version: '1.5.2-incubating'
+   implementation group: 'org.apache.kyuubi', name: 'kyuubi-hive-jdbc-shaded', version: '\ |release|\'
 
 Using the Driver in a JDBC Application
 **************************************
@@ -92,11 +92,9 @@ connection for JDBC:
 
 .. code-block:: java
 
-   private static Connection connectViaDM() throws Exception
-   {
-      Connection connection = null;
-      connection = DriverManager.getConnection(CONNECTION_URL);
-      return connection;
+   private static Connection newKyuubiConnection() throws Exception {
+     Connection connection = DriverManager.getConnection(CONNECTION_URL);
+     return connection;
    }
 
 .. _building_url:
@@ -112,12 +110,13 @@ accessing. The following is the format of the connection URL for the Kyuubi Hive
 
 .. code-block:: jdbc
 
-   jdbc:subprotocol://host:port/schema;<clientProperties;><[#|?]sessionProperties>
+   jdbc:subprotocol://host:port[/catalog]/[schema];<clientProperties;><[#|?]sessionProperties>
 
 - subprotocol: kyuubi or hive2
 - host: DNS or IP address of the kyuubi server
 - port: The number of the TCP port that the server uses to listen for client requests
-- dbName: Optional database name to set the current database to run the query against, use `default` if absent.
+- catalog: Optional catalog name to set the current catalog to run the query against.
+- schema: Optional database name to set the current database to run the query against; `default` is used if absent.
 - clientProperties: Optional `semicolon(;)` separated `key=value` parameters identified and affect the client behavior locally. e.g., user=foo;password=bar.
 - sessionProperties: Optional `semicolon(;)` separated `key=value` parameters used to configure the session, operation or background engines.
   For instance, `kyuubi.engine.share.level=CONNECTION` determines the background engine instance is used only by the current connection. `spark.ui.enabled=false` disables the Spark UI of the engine.
@@ -127,7 +126,7 @@ accessing. The following is the format of the connection URL for the Kyuubi Hive
    - Properties are case-sensitive
    - Do not duplicate properties in the connection URL
 
-Connection URL over Http
+Connection URL over HTTP
 ************************
 
 .. versionadded:: 1.6.0
@@ -145,16 +144,78 @@ Connection URL over Service Discovery
 
    jdbc:subprotocol://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi
 
-- zookeeper quorum is the corresponding zookeeper cluster configured by `kyuubi.ha.zookeeper.quorum` at the server side.
-- zooKeeperNamespace is  the corresponding namespace configured by `kyuubi.ha.zookeeper.namespace` at the server side.
+- zookeeper quorum is the ZooKeeper cluster configured by `kyuubi.ha.addresses` on the server side.
+- zooKeeperNamespace is the namespace configured by `kyuubi.ha.namespace` on the server side.
 
-Authentication
---------------
+Kerberos Authentication
+-----------------------
+Since 1.6.0, the Kyuubi JDBC driver implements Kerberos authentication on top of the JAAS framework instead of `Hadoop UserGroupInformation`_,
+so it does not require Hadoop dependencies to connect to a kerberized Kyuubi Server.
 
+The Kyuubi JDBC driver supports several approaches to connecting to a kerberized Kyuubi Server. First, follow
+the `krb5.conf instruction`_ to set up ``krb5.conf`` properly.
 
-DataTypes
----------
+Authentication by Principal and Keytab
+**************************************
+
+.. versionadded:: 1.6.0
+
+.. tip::
+
+   This is the simplest approach to Kerberos authentication, with minimal setup requirements.
+
+To use a principal and keytab for Kerberos authentication, simply configure them in the JDBC URL.
+
+.. code-block::
+
+   jdbc:kyuubi://host:port/schema;clientKeytab=<clientKeytab>;clientPrincipal=<clientPrincipal>;serverPrincipal=<serverPrincipal>
+
+- clientKeytab: path to the Kerberos ``keytab`` file for client authentication
+- clientPrincipal: Kerberos ``principal`` for client authentication
+- serverPrincipal: Kerberos ``principal`` configured by `kyuubi.kinit.principal` on the server side. ``serverPrincipal`` is available
+  since 1.7.0; for earlier versions, use ``principal`` instead.
+
+Authentication by Principal and TGT Cache
+*****************************************
+
+Another typical approach is to run `kinit` first to generate the TGT cache; the application then
+performs Kerberos authentication through the TGT cache.
+
+.. code-block::
+
+   jdbc:kyuubi://host:port/schema;serverPrincipal=<serverPrincipal>
+
+Authentication by `Hadoop UserGroupInformation`_ ``doAs`` (programming only)
+****************************************************************************
+
+.. tip::
+
+  This approach allows projects that already use `Hadoop UserGroupInformation`_ for Kerberos authentication to easily
+  connect to a kerberized Kyuubi Server. It does not work in versions 1.6.0 through 1.7.0, and is fixed in 1.7.1.
+
+.. code-block:: java
+
+  String jdbcUrl = "jdbc:kyuubi://host:port/schema;serverPrincipal=<serverPrincipal>";
+  UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(clientPrincipal, clientKeytab);
+  ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
+    try (Connection conn = DriverManager.getConnection(jdbcUrl)) { /* ... */ }
+    return null;
+  });
+
+Authentication by Subject (programming only)
+********************************************
+
+.. code-block:: java
+
+   String jdbcUrl = "jdbc:kyuubi://host:port/schema;serverPrincipal=<serverPrincipal>;kerberosAuthType=fromSubject";
+   Subject kerberizedSubject = ...;
+   Subject.doAs(kerberizedSubject, (PrivilegedExceptionAction<Void>) () -> {
+     try (Connection conn = DriverManager.getConnection(jdbcUrl)) { /* ... */ }
+     return null;
+   });
 
 .. _Maven Central: https://mvnrepository.com/artifact/org.apache.kyuubi/kyuubi-hive-jdbc-shaded
 .. _JDBC Applications: ../bi_tools/index.html
 .. _java.sql.DriverManager: https://docs.oracle.com/javase/8/docs/api/java/sql/DriverManager.html
+.. _Hadoop UserGroupInformation: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/security/UserGroupInformation.html
+.. _krb5.conf instruction: https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/KerberosReq.html
\ No newline at end of file
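The `kerberosAuthType=fromSubject` snippet in the Kyuubi JDBC docs above leaves the construction of `kerberizedSubject` open. One way to obtain such a Subject without Hadoop is a plain JAAS login with the JDK's `Krb5LoginModule` from a keytab. The following is a minimal sketch under these assumptions: a working `krb5.conf`, the Kyuubi JDBC driver on the classpath, a Kyuubi Server on `localhost:10009`, and a hypothetical in-code JAAS entry named `KyuubiJdbcClient` (the class and helper names are also hypothetical).

```java
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;
import javax.security.auth.login.LoginContext;

public class KyuubiSubjectLoginDemo {

  // Hypothetical JAAS entry name; it only has to match between the
  // Configuration below and the LoginContext that uses it.
  static final String ENTRY_NAME = "KyuubiJdbcClient";

  // In-memory JAAS configuration: log in from a keytab with the JDK's
  // Krb5LoginModule, without prompting and without an external jaas.conf.
  static Configuration keytabConfig(String principal, String keytabPath) {
    Map<String, String> options = new HashMap<>();
    options.put("useKeyTab", "true");
    options.put("keyTab", keytabPath);
    options.put("principal", principal);
    options.put("storeKey", "true");
    options.put("doNotPrompt", "true");
    AppConfigurationEntry entry = new AppConfigurationEntry(
        "com.sun.security.auth.module.Krb5LoginModule",
        AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
        options);
    return new Configuration() {
      @Override
      public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        return ENTRY_NAME.equals(name) ? new AppConfigurationEntry[] {entry} : null;
      }
    };
  }

  public static void main(String[] args) throws Exception {
    String clientPrincipal = args[0]; // Kerberos principal of the client
    String clientKeytab = args[1];    // Keytab file location
    String serverPrincipal = args[2]; // Kerberos principal used by Kyuubi Server
    String jdbcUrl = "jdbc:kyuubi://localhost:10009/default;serverPrincipal="
        + serverPrincipal + ";kerberosAuthType=fromSubject";

    // Kerberos login; on success the Subject holds the client's credentials.
    LoginContext login = new LoginContext(
        ENTRY_NAME, new Subject(), null, keytabConfig(clientPrincipal, clientKeytab));
    login.login();
    try {
      // Open the connection inside doAs so the driver can pick up the Subject.
      Subject.doAs(login.getSubject(), (PrivilegedExceptionAction<Void>) () -> {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
          while (rs.next()) {
            System.out.println(rs.getString(1));
          }
        }
        return null;
      });
    } finally {
      login.logout();
    }
  }
}
```

Keeping the JAAS configuration in code avoids a separate `jaas.conf` file and the `java.security.auth.login.config` system property; an application that already manages its own JAAS file could pass `null` as the last `LoginContext` argument to use the default configuration instead.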
diff --git a/docs/extensions/server/authentication.rst b/docs/extensions/server/authentication.rst
index ab238040c..7a83b07c2 100644
--- a/docs/extensions/server/authentication.rst
+++ b/docs/extensions/server/authentication.rst
@@ -49,12 +49,12 @@ To create custom Authenticator class derived from the above interface, we need t
 
 - Referencing the library
 
-.. code-block:: xml
+.. parsed-literal::
 
    <dependency>
       <groupId>org.apache.kyuubi</groupId>
       <artifactId>kyuubi-common_2.12</artifactId>
-      <version>1.5.2-incubating</version>
+      <version>\ |release|\</version>
       <scope>provided</scope>
    </dependency>
 
diff --git a/docs/quick_start/quick_start_with_helm.md b/docs/quick_start/quick_start_with_helm.md
index a2de54445..0733a4de7 100644
--- a/docs/quick_start/quick_start_with_helm.md
+++ b/docs/quick_start/quick_start_with_helm.md
@@ -15,7 +15,7 @@
 - limitations under the License.
 -->
 
-# Getting Started With Kyuubi on Kubernetes
+# Getting Started with Helm
 
 ## Running Kyuubi with Helm
 
diff --git a/docs/quick_start/quick_start_with_jdbc.md b/docs/quick_start/quick_start_with_jdbc.md
index c22cc1b65..c40958191 100644
--- a/docs/quick_start/quick_start_with_jdbc.md
+++ b/docs/quick_start/quick_start_with_jdbc.md
@@ -15,82 +15,82 @@
 - limitations under the License.
 -->
 
-# Getting Started With Hive JDBC
+# Getting Started with Hive JDBC
 
-## How to install JDBC driver
+## How to get the Kyuubi JDBC driver
 
-Kyuubi JDBC driver is fully compatible with the 2.3.* version of hive JDBC driver, so we reuse hive JDBC driver to connect to Kyuubi server.
+The Kyuubi Thrift API is fully compatible with HiveServer2, so technically any Hive JDBC driver can be used to connect to
+Kyuubi Server. However, it's recommended to use the [Kyuubi Hive JDBC driver](../client/jdbc/kyuubi_jdbc), which is forked from the
+Hive 3.1.x JDBC driver and aims to support functionality missing from the original Hive JDBC driver.
 
-Add repository to your maven configuration file which may reside in `$MAVEN_HOME/conf/settings.xml`.
+The driver is available from Maven Central:
 
 ```xml
-<repositories>
-  <repository>
-    <id>central maven repo</id>
-    <name>central maven repo https</name>
-    <url>https://repo.maven.apache.org/maven2</url>
-  </repository>
-</repositories>
-```
-
-You can add below dependency to your `pom.xml` file in your application.
-
-```xml
-<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
-<dependency>
-    <groupId>org.apache.hive</groupId>
-    <artifactId>hive-jdbc</artifactId>
-    <version>2.3.7</version>
-</dependency>
 <dependency>
-    <groupId>org.apache.hadoop</groupId>
-    <artifactId>hadoop-common</artifactId>
-    <!-- keep consistent with the build hadoop version -->
-    <version>2.7.4</version>
+    <groupId>org.apache.kyuubi</groupId>
+    <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
+    <version>1.7.0</version>
 </dependency>
 ```
 
-## Use JDBC driver with kerberos
+## Connect to non-kerberized Kyuubi Server
 
 The below java code is using a keytab file to login and connect to Kyuubi server by JDBC.
 
 ```java
 package org.apache.kyuubi.examples;
   
-import java.io.IOException;
-import java.security.PrivilegedExceptionAction;
 import java.sql.*;
 
-import org.apache.hadoop.security.UserGroupInformation;
- 
-public class JDBCTest {
- 
-    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
-    private static String kyuubiJdbcUrl = "jdbc:hive2://localhost:10009/default;";
- 
-    public static void main(String[] args) throws ClassNotFoundException, SQLException {
-        String principal = args[0]; // kerberos principal
-        String keytab = args[1]; // keytab file location
-        Configuration configuration = new Configuration();
-        configuration.set(HADOOP_SECURITY_AUTHENTICATION, "kerberos");
-        UserGroupInformation.setConfiguration(configuration);
-        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
- 
-        Class.forName(driverName);
-        Connection conn = ugi.doAs(new PrivilegedExceptionAction<Connection>(){
-            public Connection run() throws SQLException {
-                return DriverManager.getConnection(kyuubiJdbcUrl);
-            }
-        });
-        Statement st = conn.createStatement();
-        ResultSet res = st.executeQuery("show databases");
-        while (res.next()) {
-            System.out.println(res.getString(1));
+public class KyuubiJDBC {
+
+  private static String driverName = "org.apache.kyuubi.jdbc.KyuubiHiveDriver";
+  private static String kyuubiJdbcUrl = "jdbc:kyuubi://localhost:10009/default;";
+
+  public static void main(String[] args) throws SQLException {
+    try (Connection conn = DriverManager.getConnection(kyuubiJdbcUrl)) {
+      try (Statement stmt = conn.createStatement()) {
+        try (ResultSet rs = stmt.executeQuery("show databases")) {
+          while (rs.next()) {
+            System.out.println(rs.getString(1));
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+## Connect to Kerberized Kyuubi Server
+
+The following Java code uses a keytab file to log in and connect to Kyuubi Server by JDBC.
+
+```java
+package org.apache.kyuubi.examples;
+
+import java.sql.*;
+
+public class KyuubiJDBCDemo {
+
+  private static String driverName = "org.apache.kyuubi.jdbc.KyuubiHiveDriver";
+  private static String kyuubiJdbcUrlTemplate = "jdbc:kyuubi://localhost:10009/default;" +
+          "clientPrincipal=%s;clientKeytab=%s;serverPrincipal=%s";
+
+  public static void main(String[] args) throws SQLException {
+    String clientPrincipal = args[0]; // Kerberos principal
+    String clientKeytab = args[1];    // Keytab file location
+    String serverPrincipal = args[2]; // Kerberos principal used by Kyuubi Server
+    String kyuubiJdbcUrl = String.format(kyuubiJdbcUrlTemplate, clientPrincipal, clientKeytab, serverPrincipal);
+    try (Connection conn = DriverManager.getConnection(kyuubiJdbcUrl)) {
+      try (Statement stmt = conn.createStatement()) {
+        try (ResultSet rs = stmt.executeQuery("show databases")) {
+          while (rs.next()) {
+            System.out.println(rs.getString(1));
+          }
         }
-        res.close();
-        st.close();
-        conn.close();
+      }
     }
+  }
 }
 ```
 
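The TGT cache approach from the Kyuubi JDBC driver docs needs only `serverPrincipal` in the URL once `kinit` has populated the ticket cache, so neither `clientPrincipal` nor `clientKeytab` appears. A minimal sketch, assuming `kinit` has already been run for the client principal, the Kyuubi JDBC driver is on the classpath, and a kerberized Kyuubi Server listens on `localhost:10009`; `KyuubiTgtCacheDemo` and `tgtCacheUrl` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class KyuubiTgtCacheDemo {

  // Hypothetical helper: with a TGT already in the ticket cache, only the
  // server principal has to be passed in the JDBC URL.
  static String tgtCacheUrl(String host, int port, String schema, String serverPrincipal) {
    return "jdbc:kyuubi://" + host + ":" + port + "/" + schema
        + ";serverPrincipal=" + serverPrincipal;
  }

  public static void main(String[] args) throws SQLException {
    String serverPrincipal = args[0]; // Kerberos principal used by Kyuubi Server
    String url = tgtCacheUrl("localhost", 10009, "default", serverPrincipal);
    // The driver authenticates with the credentials found in the TGT cache.
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("show databases")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
```

Compared with the keytab example above, this keeps credentials out of the application entirely, at the cost of having to renew the ticket (for example by re-running `kinit`) before it expires.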
diff --git a/docs/quick_start/quick_start_with_jupyter.md b/docs/quick_start/quick_start_with_jupyter.md
index 44b3faa57..608da9284 100644
--- a/docs/quick_start/quick_start_with_jupyter.md
+++ b/docs/quick_start/quick_start_with_jupyter.md
@@ -15,5 +15,5 @@
 - limitations under the License.
 -->
 
-# Getting Started With Hive Jupyter Lap
+# Getting Started with Jupyter Lab