You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@zeppelin.apache.org by zj...@apache.org on 2021/08/24 12:20:49 UTC
[zeppelin] branch branch-0.10 updated: [ZEPPELIN-5492] Update doc
for jdbc interpreter
This is an automated email from the ASF dual-hosted git repository.
zjffdu pushed a commit to branch branch-0.10
in repository https://gitbox.apache.org/repos/asf/zeppelin.git
The following commit(s) were added to refs/heads/branch-0.10 by this push:
new c275f15 [ZEPPELIN-5492] Update doc for jdbc interpreter
c275f15 is described below
commit c275f150609f6dd763d8f1de5c6241d6790bc09c
Author: Jeff Zhang <zj...@apache.org>
AuthorDate: Mon Aug 23 10:12:04 2021 +0800
[ZEPPELIN-5492] Update doc for jdbc interpreter
### What is this PR for?
Update jdbc interpreter doc:
* Update the hive section
* Add running sql continuously
* Add presto/trino
* Add impala
### What type of PR is it?
[ Improvement ]
### Todos
* [ ] - Task
### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-5492
### How should this be tested?
* CI pass
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No
Author: Jeff Zhang <zj...@apache.org>
Closes #4206 from zjffdu/ZEPPELIN-5492 and squashes the following commits:
55e7e058e8 [Jeff Zhang] [ZEPPELIN-5492] Update doc for jdbc interpreter
(cherry picked from commit 9a18f63d466cc0b7529e30ff2d8069e805fdc1fa)
Signed-off-by: Jeff Zhang <zj...@apache.org>
---
.../themes/zeppelin/img/docs-img/jdbc_refresh.gif | Bin 0 -> 370620 bytes
docs/interpreter/jdbc.md | 245 +++++++++++++++++----
docs/quickstart/sql_with_zeppelin.md | 5 +-
3 files changed, 205 insertions(+), 45 deletions(-)
diff --git a/docs/assets/themes/zeppelin/img/docs-img/jdbc_refresh.gif b/docs/assets/themes/zeppelin/img/docs-img/jdbc_refresh.gif
new file mode 100644
index 0000000..e891ceb
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/jdbc_refresh.gif differ
diff --git a/docs/interpreter/jdbc.md b/docs/interpreter/jdbc.md
index d556160..081e64f 100644
--- a/docs/interpreter/jdbc.md
+++ b/docs/interpreter/jdbc.md
@@ -1,7 +1,7 @@
---
layout: page
title: "Generic JDBC Interpreter for Apache Zeppelin"
-description: "Generic JDBC Interpreter lets you create a JDBC connection to any data source. You can use Postgres, MySql, MariaDB, Redshift, Apache Hive, Apache Phoenix, Apache Drill and Apache Tajo using JDBC interpreter."
+description: "Generic JDBC Interpreter lets you create a JDBC connection to any data source. You can use Postgres, MySql, MariaDB, Redshift, Apache Hive, Presto/Trino, Impala, Apache Phoenix, Apache Drill and Apache Tajo using JDBC interpreter."
group: interpreter
---
<!--
@@ -57,6 +57,14 @@ By now, it has been tested with:
<a href="https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC" target="_blank">JDBC Driver</a>
</li>
<li style="padding-bottom: 5px; list-style: circle">
+ <a href="https://trino.io/" target="_blank">Presto/Trino</a> -
+ <a href="https://trino.io/docs/current/installation/jdbc.html" target="_blank">JDBC Driver</a>
+ </li>
+ <li style="padding-bottom: 5px; list-style: circle">
+ <a href="https://impala.apache.org/" target="_blank">Impala</a> -
+ <a href="https://impala.apache.org/docs/build/html/topics/impala_jdbc.html" target="_blank">JDBC Driver</a>
+ </li>
+ <li style="padding-bottom: 5px; list-style: circle">
<a href="https://phoenix.apache.org/" target="_blank">Apache Phoenix</a> itself is a JDBC driver
</li>
<li style="padding-bottom: 5px; list-style: circle">
@@ -154,7 +162,7 @@ The last step is **Dependency Setting**. Since Zeppelin only includes `PostgreSQ
<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/edit_dependencies.png" width="600px" />
-That's it. You can find more JDBC connection setting examples([Mysql](#mysql), [MariaDB](#mariadb), [Redshift](#redshift), [Apache Hive](#apache-hive), [Apache Phoenix](#apache-phoenix), and [Apache Tajo](#apache-tajo)) in [this section](#examples).
+That's it. You can find more JDBC connection setting examples([Mysql](#mysql), [MariaDB](#mariadb), [Redshift](#redshift), [Apache Hive](#apache-hive), [Presto/Trino](#prestotrino), [Impala](#impala), [Apache Phoenix](#apache-phoenix), and [Apache Tajo](#apache-tajo)) in [this section](#examples).
## JDBC Interpreter Datasource Pool Configuration
The Jdbc interpreter uses the connection pool technology, and supports users to do some personal configuration of the connection pool. For example, we can configure `default.validationQuery='select 1'` and `default.testOnBorrow=true` in the Interpreter configuration to avoid the "Invalid SessionHandle" runtime error caused by Session timeout when connecting to HiveServer2 through JDBC interpreter.
@@ -280,23 +288,13 @@ For example, if a connection needs a schema parameter, it would have to add the
</tr>
</table>
-## Binding JDBC interpter to notebook
-To bind the interpreters created in the interpreter setting page, click the gear icon at the top-right corner.
-
-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/click_interpreter_binding_button.png" width="600px" />
-
-Select(blue) or deselect(white) the interpreter buttons depending on your use cases.
-If you need to use more than one interpreter in the notebook, activate several buttons.
-Don't forget to click `Save` button, or you will face `Interpreter *** is not found` error.
-
-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/jdbc_interpreter_binding.png" width="550px" />
-
## How to use
### Run the paragraph with JDBC interpreter
To test whether your databases and Zeppelin are successfully connected or not, type `%jdbc_interpreter_name`(e.g. `%mysql`) at the top of the paragraph and run `show databases`.
```sql
%jdbc_interpreter_name
+
show databases
```
If the paragraph is `FINISHED` without any errors, a new paragraph will be automatically added after the previous one with `%jdbc_interpreter_name`.
@@ -310,6 +308,7 @@ So you don't need to type this prefix in every paragraphs' header.
You can write multiple sql statements in one paragraph, just separate them with semi-colon. e.g
```sql
+%jdbc_interpreter_name
USE zeppelin_demo;
@@ -317,16 +316,42 @@ CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20),
species VARCHAR(20), sex CHAR(1), birth DATE, death DATE);
```
-### Apply Zeppelin Dynamic Forms
+### SQL Comment
-You can leverage [Zeppelin Dynamic Form](../usage/dynamic_form/intro.html) inside your queries. You can use both the `text input` and `select form` parametrization features.
+2 kinds of SQL comments are supported:
+
+* Single line comment start with `--`
+* Multiple line comment around with `/* ... */`
```sql
+
%jdbc_interpreter_name
-SELECT name, country, performer
-FROM demo.performers
-WHERE name='${performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}'
+
+-- single line comment
+show tables;
+/* multiple
+ line
+ comment
+ */
+select * from test_1;
+
```
+
+
+### Apply Zeppelin Dynamic Forms
+
+You can leverage [Zeppelin Dynamic Form](../usage/dynamic_form/intro.html) inside your queries. You can use both the `text input` and `select form` parametrization features.
+
+### Run SQL Continuously
+
+By default, sql statements in one paragraph are executed only once. But you can run it continuously by specifying local property `refreshInterval` (unit: milli-seconds),
+So that the sql statements are executed every interval of `refreshInterval` milli-seconds. This is useful when your data in database is updated continuously by external system,
+and you can build dynamic dashboard in Zeppelin via this approach.
+
+e.g. Here we query the mysql which is updated continuously by other external system.
+
+<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/jdbc_refresh.gif" width="800px" />
+
### Usage *precode*
You can set *precode* for each data source. Code runs once while opening the connection.
@@ -358,24 +383,27 @@ An example settings of interpreter for the two data sources, each of which has i
<td>default.precode</td>
<td>set search_path='test_path'</td>
</tr>
+</table>
+
+<table class="table-configuration">
<tr>
- <td>mysql.driver</td>
+ <td>default.driver</td>
<td>com.mysql.jdbc.Driver</td>
</tr>
<tr>
- <td>mysql.password</td>
+ <td>default.password</td>
<td>1</td>
</tr>
<tr>
- <td>mysql.url</td>
+ <td>default.url</td>
<td>jdbc:mysql://localhost:3306/</td>
</tr>
<tr>
- <td>mysql.user</td>
+ <td>default.user</td>
<td>root</td>
</tr>
<tr>
- <td>mysql.precode</td>
+ <td>default.precode</td>
<td>set @v=12</td>
</tr>
</table>
@@ -385,16 +413,18 @@ Test of execution *precode* for each data source.
```sql
%jdbc
+
show search_path
```
-Returns value of `search_path` which is set in the *default.precode*.
+Returns value of `search_path` which is set in the default jdbc (use postgresql) interpreter's *default.precode*.
```sql
%mysql
+
select @v
```
-Returns value of `v` which is set in the *mysql.precode*.
+Returns value of `v` which is set in the mysql interpreter's *default.precode*.
## Examples
@@ -578,9 +608,18 @@ Here are some examples you can refer to. Including the below connectors, you can
### Apache Hive
-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/hive_setting.png" width="600px" />
+Zeppelin just connect to `hiveserver2` to run hive sql via hive jdbc. There are 2 cases of connecting with Hive:
+
+* Connect to Hive without KERBEROS
+* Connect to Hive with KERBEROS
+
+Each case requires different settings.
+
+##### Connect to Hive without KERBEROS
+
+In this scenario, you need to make the following settings at least. By default, hive job run as user of `default.user`.
+Refer [impersonation](../interpreter/jdbc.html#impersonation) if you want hive job run as the Zeppelin login user when authentication is enabled.
-##### Properties
<table class="table-configuration">
<tr>
<th>Name</th>
@@ -598,46 +637,81 @@ Here are some examples you can refer to. Including the below connectors, you can
<td>default.user</td>
<td>hive_user</td>
</tr>
+</table>
+
+
+<table class="table-configuration">
<tr>
- <td>default.password</td>
- <td>hive_password</td>
+ <th>Artifact</th>
+ <th>Excludes</th>
</tr>
<tr>
- <td>default.proxy.user.property</td>
- <td>Example value: hive.server2.proxy.user</td>
+ <td>org.apache.hive:hive-jdbc:2.3.4</td>
+ <td></td>
+ </tr>
+</table>
+
+
+##### Connect to Hive with KERBEROS
+
+In this scenario, you need to make the following settings at least. By default, hive job run as user of client principal (`zeppelin.jdbc.principal`).
+Refer [impersonation](../interpreter/jdbc.html#impersonation) if you want hive job run as the Zeppelin login user when authentication is enabled.
+
+<table class="table-configuration">
+ <tr>
+ <th>Name</th>
+ <th>Value</th>
+ </tr>
+ <tr>
+ <td>default.driver</td>
+ <td>org.apache.hive.jdbc.HiveDriver</td>
+ </tr>
+ <tr>
+ <td>default.url</td>
+ <td>jdbc:hive2://emr-header-1:10000/default;principal={hive_server2_principal}</td>
+ </tr>
+ <tr>
+ <td>zeppelin.jdbc.auth.type</td>
+ <td>KERBEROS</td>
+ </tr>
+ <tr>
+ <td>zeppelin.jdbc.keytab.location</td>
+ <td>keytab of client</td>
+ </tr>
+ <tr>
+ <td>zeppelin.jdbc.principal</td>
+ <td>principal of client</td>
</tr>
</table>
-[Apache Hive 1 JDBC Driver Docs](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC)
-[Apache Hive 2 JDBC Driver Docs](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC)
-##### Dependencies
<table class="table-configuration">
<tr>
<th>Artifact</th>
<th>Excludes</th>
</tr>
<tr>
- <td>org.apache.hive:hive-jdbc:0.14.0</td>
+ <td>org.apache.hive:hive-jdbc:2.3.4</td>
<td></td>
</tr>
<tr>
- <td>org.apache.hadoop:hadoop-common:2.6.0</td>
+ <td>org.apache.hive:hive-exec:2.3.4</td>
<td></td>
</tr>
</table>
+
[Maven Repository : org.apache.hive:hive-jdbc](https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc)
##### Impersonation
+
When Zeppelin server is running with authentication enabled, then the interpreter can utilize Hive's user proxy feature
i.e. send extra parameter for creating and running a session ("hive.server2.proxy.user=": "${loggedInUser}").
This is particularly useful when multiple users are sharing a notebook.
To enable this set following:
- - `zeppelin.jdbc.auth.type` as `SIMPLE` or `KERBEROS` (if required) in the interpreter setting.
- - `${prefix}.proxy.user.property` as `hive.server2.proxy.user`
+ - `default.proxy.user.property` as `hive.server2.proxy.user`
See [User Impersonation in interpreter](../usage/interpreter/user_impersonation.html) for more information.
@@ -652,10 +726,6 @@ See [User Impersonation in interpreter](../usage/interpreter/user_impersonation.
<td>org.apache.hive.jdbc.HiveDriver</td>
</tr>
<tr>
- <td>hive.password</td>
- <td></td>
- </tr>
- <tr>
<td>hive.url</td>
<td>jdbc:hive2://hive-server-host:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2</td>
</tr>
@@ -669,6 +739,91 @@ See [User Impersonation in interpreter](../usage/interpreter/user_impersonation.
</tr>
</table>
+### Presto/Trino
+
+Properties
+
+<table class="table-configuration">
+ <tr>
+ <th>Name</th>
+ <th>Value</th>
+ </tr>
+ <tr>
+ <td>default.driver</td>
+ <td>io.prestosql.jdbc.PrestoDriver</td>
+ </tr>
+ <tr>
+ <td>default.url</td>
+ <td>jdbc:presto://presto-server:9090/hive</td>
+ </tr>
+ <tr>
+ <td>default.user</td>
+ <td>presto_user</td>
+ </tr>
+</table>
+
+[Trino JDBC Driver Docs](https://trino.io/docs/current/installation/jdbc.html) <br/>
+[Presto JDBC Driver Docs](https://prestodb.io/docs/current/installation/jdbc.html)
+
+Dependencies
+
+<table class="table-configuration">
+ <tr>
+ <th>Artifact</th>
+ <th>Excludes</th>
+ </tr>
+ <tr>
+ <td>io.prestosql:presto-jdbc:350</td>
+ <td></td>
+ </tr>
+</table>
+
+### Impala
+
+Properties
+
+<table class="table-configuration">
+ <tr>
+ <th>Name</th>
+ <th>Value</th>
+ </tr>
+ <tr>
+ <td>default.driver</td>
+ <td>org.apache.hive.jdbc.HiveDriver</td>
+ </tr>
+ <tr>
+ <td>default.url</td>
+ <td>jdbc:hive2://emr-header-1.cluster-47080:21050/;auth=noSasl</td>
+ </tr>
+</table>
+
+Dependencies
+
+<table class="table-configuration">
+ <tr>
+ <th>Artifact</th>
+ <th>Excludes</th>
+ </tr>
+ <tr>
+ <td>org.apache.hive:hive-jdbc:2.3.4</td>
+ <td></td>
+ </tr>
+</table>
+
+[Impala JDBC Driver Docs](https://impala.apache.org/docs/build/html/topics/impala_jdbc.html)
+
+Dependencies
+
+<table class="table-configuration">
+ <tr>
+ <th>Artifact</th>
+ <th>Excludes</th>
+ </tr>
+ <tr>
+ <td>io.prestosql:presto-jdbc:350</td>
+ <td></td>
+ </tr>
+</table>
### Apache Phoenix
@@ -813,17 +968,18 @@ Before Adding one of the below dependencies, check the Phoenix version first.
The JDBC interpreter also supports interpolation of `ZeppelinContext` objects into the paragraph text.
The following example shows one use of this facility:
-####In Scala cell:
+#### In Scala cell:
```scala
z.put("country_code", "KR")
// ...
```
-####In later JDBC cell:
+#### In later JDBC cell:
```sql
%jdbc_interpreter_name
+
select * from patents_list where
priority_country = '{country_code}' and filing_date like '2015-%'
```
@@ -834,4 +990,5 @@ More details of this feature can be found in the Spark interpreter documentation
[Zeppelin-Context](../usage/other_features/zeppelin_context.html)
## Bug reporting
+
If you find a bug using JDBC interpreter, please create a [JIRA](https://issues.apache.org/jira/browse/ZEPPELIN) ticket.
diff --git a/docs/quickstart/sql_with_zeppelin.md b/docs/quickstart/sql_with_zeppelin.md
index e007f20..da709d5 100644
--- a/docs/quickstart/sql_with_zeppelin.md
+++ b/docs/quickstart/sql_with_zeppelin.md
@@ -33,6 +33,8 @@ The following guides explain how to use Apache Zeppelin that enables you to writ
* [MariaDB](../interpreter/jdbc.html#mariadb)
* [AWS Redshift](../interpreter/jdbc.html#redshift)
* [Apache Hive](../interpreter/jdbc.html#apache-hive)
+ * [Presto/Trino](../interpreter/jdbc.html#prestotrino)
+ * [Impala](../interpreter/jdbc.html#impala)
* [Apache Phoenix](../interpreter/jdbc.html#apache-phoenix)
* [Apache Drill](../interpreter/jdbc.html#apache-drill)
* [Apache Tajo](../interpreter/jdbc.html#apache-tajo)
@@ -43,7 +45,8 @@ The following guides explain how to use Apache Zeppelin that enables you to writ
- can create query result including **UI widgets** using [Dynamic Form](../usage/dynamic_form/intro.html)
```sql
- %sql
+ %sql
+
select age, count(1) value
from bank
where age < ${maxAge=30}