You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@zeppelin.apache.org by zj...@apache.org on 2021/08/24 12:20:49 UTC

[zeppelin] branch branch-0.10 updated: [ZEPPELIN-5492] Update doc for jdbc interpreter

This is an automated email from the ASF dual-hosted git repository.

zjffdu pushed a commit to branch branch-0.10
in repository https://gitbox.apache.org/repos/asf/zeppelin.git


The following commit(s) were added to refs/heads/branch-0.10 by this push:
     new c275f15  [ZEPPELIN-5492] Update doc for jdbc interpreter
c275f15 is described below

commit c275f150609f6dd763d8f1de5c6241d6790bc09c
Author: Jeff Zhang <zj...@apache.org>
AuthorDate: Mon Aug 23 10:12:04 2021 +0800

    [ZEPPELIN-5492] Update doc for jdbc interpreter
    
    ### What is this PR for?
    
    Update jdbc interpreter doc:
    * Update the hive section
    * Add running sql continuously
    * Add presto/trino
    * Add impala
    
    ### What type of PR is it?
    [ Improvement ]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-5492
    
    ### How should this be tested?
    * CI pass
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Does the licenses files need update? No
    * Is there breaking changes for older versions? No
    * Does this needs documentation? No
    
    Author: Jeff Zhang <zj...@apache.org>
    
    Closes #4206 from zjffdu/ZEPPELIN-5492 and squashes the following commits:
    
    55e7e058e8 [Jeff Zhang] [ZEPPELIN-5492] Update doc for jdbc interpreter
    
    (cherry picked from commit 9a18f63d466cc0b7529e30ff2d8069e805fdc1fa)
    Signed-off-by: Jeff Zhang <zj...@apache.org>
---
 .../themes/zeppelin/img/docs-img/jdbc_refresh.gif  | Bin 0 -> 370620 bytes
 docs/interpreter/jdbc.md                           | 245 +++++++++++++++++----
 docs/quickstart/sql_with_zeppelin.md               |   5 +-
 3 files changed, 205 insertions(+), 45 deletions(-)

diff --git a/docs/assets/themes/zeppelin/img/docs-img/jdbc_refresh.gif b/docs/assets/themes/zeppelin/img/docs-img/jdbc_refresh.gif
new file mode 100644
index 0000000..e891ceb
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/jdbc_refresh.gif differ
diff --git a/docs/interpreter/jdbc.md b/docs/interpreter/jdbc.md
index d556160..081e64f 100644
--- a/docs/interpreter/jdbc.md
+++ b/docs/interpreter/jdbc.md
@@ -1,7 +1,7 @@
 ---
 layout: page
 title: "Generic JDBC Interpreter for Apache Zeppelin"
-description: "Generic JDBC Interpreter lets you create a JDBC connection to any data source. You can use Postgres, MySql, MariaDB, Redshift, Apache Hive, Apache Phoenix, Apache Drill and Apache Tajo using JDBC interpreter."
+description: "Generic JDBC Interpreter lets you create a JDBC connection to any data source. You can use Postgres, MySql, MariaDB, Redshift, Apache Hive, Presto/Trino, Impala, Apache Phoenix, Apache Drill and Apache Tajo using JDBC interpreter."
 group: interpreter
 ---
 <!--
@@ -57,6 +57,14 @@ By now, it has been tested with:
       <a href="https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC" target="_blank">JDBC Driver</a>
     </li>
     <li style="padding-bottom: 5px; list-style: circle">
+      <a href="https://trino.io/" target="_blank">Presto/Trino</a> - 
+      <a href="https://trino.io/docs/current/installation/jdbc.html" target="_blank">JDBC Driver</a>
+    </li>
+    <li style="padding-bottom: 5px; list-style: circle">
+      <a href="https://impala.apache.org/" target="_blank">Impala</a> - 
+      <a href="https://impala.apache.org/docs/build/html/topics/impala_jdbc.html" target="_blank">JDBC Driver</a>
+    </li>
+    <li style="padding-bottom: 5px; list-style: circle">
       <a href="https://phoenix.apache.org/" target="_blank">Apache Phoenix</a> itself is a JDBC driver
     </li>
     <li style="padding-bottom: 5px; list-style: circle">
@@ -154,7 +162,7 @@ The last step is **Dependency Setting**. Since Zeppelin only includes `PostgreSQ
 
 <img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/edit_dependencies.png" width="600px" />
 
-That's it. You can find more JDBC connection setting examples([Mysql](#mysql), [MariaDB](#mariadb), [Redshift](#redshift), [Apache Hive](#apache-hive), [Apache Phoenix](#apache-phoenix), and [Apache Tajo](#apache-tajo)) in [this section](#examples).
+That's it. You can find more JDBC connection setting examples([Mysql](#mysql), [MariaDB](#mariadb), [Redshift](#redshift), [Apache Hive](#apache-hive), [Presto/Trino](#prestotrino), [Impala](#impala), [Apache Phoenix](#apache-phoenix), and [Apache Tajo](#apache-tajo)) in [this section](#examples).
 
 ## JDBC Interpreter Datasource Pool Configuration
 The Jdbc interpreter uses the connection pool technology, and supports users to do some personal configuration of the connection pool. For example, we can configure `default.validationQuery='select 1'` and `default.testOnBorrow=true` in the Interpreter configuration to avoid the "Invalid SessionHandle" runtime error caused by Session timeout when connecting to HiveServer2 through JDBC interpreter.
@@ -280,23 +288,13 @@ For example, if a connection needs a schema parameter, it would have to add the
   </tr>
 </table>
 
-## Binding JDBC interpter to notebook
-To bind the interpreters created in the interpreter setting page, click the gear icon at the top-right corner.
-
-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/click_interpreter_binding_button.png" width="600px" />
-
-Select(blue) or deselect(white) the interpreter buttons depending on your use cases.
-If you need to use more than one interpreter in the notebook, activate several buttons.
-Don't forget to click `Save` button, or you will face `Interpreter *** is not found` error.
-
-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/jdbc_interpreter_binding.png" width="550px" />
-
 ## How to use
 ### Run the paragraph with JDBC interpreter
 To test whether your databases and Zeppelin are successfully connected or not, type `%jdbc_interpreter_name`(e.g. `%mysql`) at the top of the paragraph and run `show databases`.
 
 ```sql
 %jdbc_interpreter_name
+
 show databases
 ```
 If the paragraph is `FINISHED` without any errors, a new paragraph will be automatically added after the previous one with `%jdbc_interpreter_name`.
@@ -310,6 +308,7 @@ So you don't need to type this prefix in every paragraphs' header.
 You can write multiple sql statements in one paragraph, just separate them with semi-colon. e.g
 
 ```sql
+%jdbc_interpreter_name
 
 USE zeppelin_demo;
 
@@ -317,16 +316,42 @@ CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20),
        species VARCHAR(20), sex CHAR(1), birth DATE, death DATE);
 ```
 
-### Apply Zeppelin Dynamic Forms
+### SQL Comment
 
-You can leverage [Zeppelin Dynamic Form](../usage/dynamic_form/intro.html) inside your queries. You can use both the `text input` and `select form` parametrization features.
+2 kinds of SQL comments are supported:
+
+* Single line comment start with `--`
+* Multiple line comment around with `/*  ... */`
 
 ```sql
+
 %jdbc_interpreter_name
-SELECT name, country, performer
-FROM demo.performers
-WHERE name='${performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}'
+
+-- single line comment
+show tables;
+/* multiple 
+   line 
+   comment
+ */
+select * from test_1;
+
 ```
+
+
+### Apply Zeppelin Dynamic Forms
+
+You can leverage [Zeppelin Dynamic Form](../usage/dynamic_form/intro.html) inside your queries. You can use both the `text input` and `select form` parametrization features.
+
+### Run SQL Continuously
+
+By default, sql statements in one paragraph are executed only once. But you can run it continuously by specifying local property `refreshInterval` (unit: milli-seconds),
+So that the sql statements are executed every interval of `refreshInterval` milli-seconds. This is useful when your data in database is updated continuously by external system,
+and you can build dynamic dashboard in Zeppelin via this approach.
+
+e.g. Here we query the mysql which is updated continuously by other external system.
+
+<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/jdbc_refresh.gif" width="800px" />
+
 ### Usage *precode*
 You can set *precode* for each data source. Code runs once while opening the connection.
 
@@ -358,24 +383,27 @@ An example settings of interpreter for the two data sources, each of which has i
     <td>default.precode</td>
     <td>set search_path='test_path'</td>
   </tr>
+</table>
+
+<table class="table-configuration">
   <tr>
-    <td>mysql.driver</td>
+    <td>default.driver</td>
     <td>com.mysql.jdbc.Driver</td>
   </tr>
   <tr>
-    <td>mysql.password</td>
+    <td>default.password</td>
     <td>1</td>
   </tr>
   <tr>
-    <td>mysql.url</td>
+    <td>default.url</td>
     <td>jdbc:mysql://localhost:3306/</td>
   </tr>
   <tr>
-    <td>mysql.user</td>
+    <td>default.user</td>
     <td>root</td>
   </tr>
   <tr>
-    <td>mysql.precode</td>
+    <td>default.precode</td>
     <td>set @v=12</td>
   </tr>
 </table>
@@ -385,16 +413,18 @@ Test of execution *precode* for each data source.
 
 ```sql
 %jdbc
+
 show search_path
 ```
-Returns value of `search_path` which is set in the *default.precode*.
+Returns value of `search_path` which is set in the default jdbc (use postgresql) interpreter's *default.precode*.
 
 
 ```sql
 %mysql
+
 select @v
 ```
-Returns value of `v` which is set in the *mysql.precode*.
+Returns value of `v` which is set in the mysql interpreter's *default.precode*.
 
 
 ## Examples
@@ -578,9 +608,18 @@ Here are some examples you can refer to. Including the below connectors, you can
 
 ### Apache Hive
 
-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/hive_setting.png" width="600px" />
+Zeppelin just connect to `hiveserver2` to run hive sql via hive jdbc. There are 2 cases of connecting with Hive:
+
+* Connect to Hive without KERBEROS
+* Connect to Hive with KERBEROS
+
+Each case requires different settings.
+
+##### Connect to Hive without KERBEROS
+
+In this scenario, you need to make the following settings at least. By default, hive job run as user of `default.user`.
+Refer [impersonation](../interpreter/jdbc.html#impersonation) if you want hive job run as the Zeppelin login user when authentication is enabled.
 
-##### Properties
 <table class="table-configuration">
   <tr>
     <th>Name</th>
@@ -598,46 +637,81 @@ Here are some examples you can refer to. Including the below connectors, you can
     <td>default.user</td>
     <td>hive_user</td>
   </tr>
+</table>
+
+
+<table class="table-configuration">
   <tr>
-    <td>default.password</td>
-    <td>hive_password</td>
+    <th>Artifact</th>
+    <th>Excludes</th>
   </tr>
   <tr>
-    <td>default.proxy.user.property</td>
-    <td>Example value: hive.server2.proxy.user</td>
+    <td>org.apache.hive:hive-jdbc:2.3.4</td>
+    <td></td>
+  </tr>
+</table>
+
+
+##### Connect to Hive with KERBEROS
+
+In this scenario, you need to make the following settings at least. By default, hive job run as user of client principal (`zeppelin.jdbc.principal`).
+Refer [impersonation](../interpreter/jdbc.html#impersonation) if you want hive job run as the Zeppelin login user when authentication is enabled.
+
+<table class="table-configuration">
+  <tr>
+    <th>Name</th>
+    <th>Value</th>
+  </tr>
+  <tr>
+    <td>default.driver</td>
+    <td>org.apache.hive.jdbc.HiveDriver</td>
+  </tr>
+  <tr>
+    <td>default.url</td>
+    <td>jdbc:hive2://emr-header-1:10000/default;principal={hive_server2_principal}</td>
+  </tr>
+  <tr>
+    <td>zeppelin.jdbc.auth.type</td>
+    <td>KERBEROS</td>
+  </tr>
+  <tr>
+    <td>zeppelin.jdbc.keytab.location</td>
+    <td>keytab of client</td>
+  </tr>
+  <tr>
+    <td>zeppelin.jdbc.principal</td>
+    <td>principal of client</td>
   </tr>
 </table>
 
-[Apache Hive 1 JDBC Driver Docs](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC)
-[Apache Hive 2 JDBC Driver Docs](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC)
 
-##### Dependencies
 <table class="table-configuration">
   <tr>
     <th>Artifact</th>
     <th>Excludes</th>
   </tr>
   <tr>
-    <td>org.apache.hive:hive-jdbc:0.14.0</td>
+    <td>org.apache.hive:hive-jdbc:2.3.4</td>
     <td></td>
   </tr>
   <tr>
-    <td>org.apache.hadoop:hadoop-common:2.6.0</td>
+    <td>org.apache.hive:hive-exec:2.3.4</td>
     <td></td>
   </tr>
 </table>
 
+
 [Maven Repository : org.apache.hive:hive-jdbc](https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc)
 
 ##### Impersonation
+
 When Zeppelin server is running with authentication enabled, then the interpreter can utilize Hive's user proxy feature 
 i.e. send extra parameter for creating and running a session ("hive.server2.proxy.user=": "${loggedInUser}"). 
 This is particularly useful when multiple users are sharing a notebook.
 
 To enable this set following:
 
-  - `zeppelin.jdbc.auth.type` as `SIMPLE` or `KERBEROS` (if required) in the interpreter setting.
-  - `${prefix}.proxy.user.property` as `hive.server2.proxy.user`
+  - `default.proxy.user.property` as `hive.server2.proxy.user`
   
 See [User Impersonation in interpreter](../usage/interpreter/user_impersonation.html) for more information.
 
@@ -652,10 +726,6 @@ See [User Impersonation in interpreter](../usage/interpreter/user_impersonation.
     <td>org.apache.hive.jdbc.HiveDriver</td>
   </tr>
   <tr>
-    <td>hive.password</td>
-    <td></td>
-  </tr>
-  <tr>
     <td>hive.url</td>
     <td>jdbc:hive2://hive-server-host:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2</td>
   </tr>
@@ -669,6 +739,91 @@ See [User Impersonation in interpreter](../usage/interpreter/user_impersonation.
   </tr>
 </table>
 
+### Presto/Trino
+
+Properties
+
+<table class="table-configuration">
+  <tr>
+    <th>Name</th>
+    <th>Value</th>
+  </tr>
+  <tr>
+    <td>default.driver</td>
+    <td>io.prestosql.jdbc.PrestoDriver</td>
+  </tr>
+  <tr>
+    <td>default.url</td>
+    <td>jdbc:presto://presto-server:9090/hive</td>
+  </tr>
+  <tr>
+    <td>default.user</td>
+    <td>presto_user</td>
+  </tr>
+</table>
+
+[Trino JDBC Driver Docs](https://trino.io/docs/current/installation/jdbc.html) <br/>
+[Presto JDBC Driver Docs](https://prestodb.io/docs/current/installation/jdbc.html)
+
+Dependencies
+
+<table class="table-configuration">
+  <tr>
+    <th>Artifact</th>
+    <th>Excludes</th>
+  </tr>
+  <tr>
+    <td>io.prestosql:presto-jdbc:350</td>
+    <td></td>
+  </tr>
+</table>
+
+### Impala
+
+Properties
+
+<table class="table-configuration">
+  <tr>
+    <th>Name</th>
+    <th>Value</th>
+  </tr>
+  <tr>
+    <td>default.driver</td>
+    <td>org.apache.hive.jdbc.HiveDriver</td>
+  </tr>
+  <tr>
+    <td>default.url</td>
+    <td>jdbc:hive2://emr-header-1.cluster-47080:21050/;auth=noSasl</td>
+  </tr>
+</table>
+
+Dependencies
+
+<table class="table-configuration">
+  <tr>
+    <th>Artifact</th>
+    <th>Excludes</th>
+  </tr>
+  <tr>
+    <td>org.apache.hive:hive-jdbc:2.3.4</td>
+    <td></td>
+  </tr>
+</table>
+
+[Impala JDBC Driver Docs](https://impala.apache.org/docs/build/html/topics/impala_jdbc.html)
+
+Dependencies
+
+<table class="table-configuration">
+  <tr>
+    <th>Artifact</th>
+    <th>Excludes</th>
+  </tr>
+  <tr>
+    <td>io.prestosql:presto-jdbc:350</td>
+    <td></td>
+  </tr>
+</table>
 
 ### Apache Phoenix
 
@@ -813,17 +968,18 @@ Before Adding one of the below dependencies, check the Phoenix version first.
 The JDBC interpreter also supports interpolation of `ZeppelinContext` objects into the paragraph text.
 The following example shows one use of this facility:
 
-####In Scala cell:
+#### In Scala cell:
 
 ```scala
 z.put("country_code", "KR")
     // ...
 ```
 
-####In later JDBC cell:
+#### In later JDBC cell:
 
 ```sql
 %jdbc_interpreter_name
+
 select * from patents_list where 
 priority_country = '{country_code}' and filing_date like '2015-%'
 ```
@@ -834,4 +990,5 @@ More details of this feature can be found in the Spark interpreter documentation
 [Zeppelin-Context](../usage/other_features/zeppelin_context.html)
 
 ## Bug reporting
+
 If you find a bug using JDBC interpreter, please create a [JIRA](https://issues.apache.org/jira/browse/ZEPPELIN) ticket.
diff --git a/docs/quickstart/sql_with_zeppelin.md b/docs/quickstart/sql_with_zeppelin.md
index e007f20..da709d5 100644
--- a/docs/quickstart/sql_with_zeppelin.md
+++ b/docs/quickstart/sql_with_zeppelin.md
@@ -33,6 +33,8 @@ The following guides explain how to use Apache Zeppelin that enables you to writ
   * [MariaDB](../interpreter/jdbc.html#mariadb)
   * [AWS Redshift](../interpreter/jdbc.html#redshift) 
   * [Apache Hive](../interpreter/jdbc.html#apache-hive)
+  * [Presto/Trino](../interpreter/jdbc.html#prestotrino)
+  * [Impala](../interpreter/jdbc.html#impala)
   * [Apache Phoenix](../interpreter/jdbc.html#apache-phoenix) 
   * [Apache Drill](../interpreter/jdbc.html#apache-drill)
   * [Apache Tajo](../interpreter/jdbc.html#apache-tajo)
@@ -43,7 +45,8 @@ The following guides explain how to use Apache Zeppelin that enables you to writ
 - can create query result including **UI widgets** using [Dynamic Form](../usage/dynamic_form/intro.html)
 
     ```sql
-    %sql 
+    %sql
+    
     select age, count(1) value 
     from bank 
     where age < ${maxAge=30}