You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@zeppelin.apache.org by fe...@apache.org on 2018/04/28 06:14:58 UTC
zeppelin git commit: ZEPPELIN-3412 Enable query prefix syntax in
bigquery interpreter
Repository: zeppelin
Updated Branches:
refs/heads/master e65f73066 -> 0c3b446a5
ZEPPELIN-3412 Enable query prefix syntax in bigquery interpreter
### What is this PR for?
- interpreter
Enable query prefix syntax in bigquery interpreter.
Bigquery has a convenient query prefix syntax #standardSQL and #legacySQL.
This enable to switch between standardSQL and legacySQL without changing UseLegacySql option.
Currently (v0.8.0 or later), bigquery interpreter definitely runs with UseLegacySql option.
As described in the [docs](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql?hl=en#sql-prefix), when this option is set to true or false, the query prefix cannot be used.
- documentation
Removed some contents from README.md. They are duplicated with interpreter documentation.
### What type of PR is it?
Improvement
### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-3412
### How should this be tested?
- Unit test should pass
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes
Author: iijima_satoshi <ii...@cyberagent.co.jp>
Closes #2929 from iijima-satoshi/bigquery-sql-prefix and squashes the following commits:
e5b1acd [iijima_satoshi] Address comments
f88cc42 [iijima_satoshi] Address comments
809f561 [iijima_satoshi] Address comments
4e2a6b2 [iijima_satoshi] Enable query prefix syntax in bigquery interpreter
Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/0c3b446a
Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/0c3b446a
Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/0c3b446a
Branch: refs/heads/master
Commit: 0c3b446a5304df3f3c0d665d2df926cca2dcddb7
Parents: e65f730
Author: iijima_satoshi <ii...@cyberagent.co.jp>
Authored: Fri Apr 27 16:47:40 2018 +0900
Committer: Felix Cheung <fe...@apache.org>
Committed: Fri Apr 27 23:14:54 2018 -0700
----------------------------------------------------------------------
bigquery/README.md | 85 --------------------
.../zeppelin/bigquery/BigQueryInterpreter.java | 19 ++++-
.../src/main/resources/interpreter-setting.json | 13 +--
.../bigquery/BigQueryInterpreterTest.java | 11 ++-
docs/interpreter/bigquery.md | 5 ++
5 files changed, 35 insertions(+), 98 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/README.md
----------------------------------------------------------------------
diff --git a/bigquery/README.md b/bigquery/README.md
index fc09763..0dff5fe 100644
--- a/bigquery/README.md
+++ b/bigquery/README.md
@@ -1,10 +1,6 @@
# Overview
BigQuery interpreter for Apache Zeppelin
-# Pre requisities
-You can follow the instructions at [Apache Zeppelin on Dataproc](https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/blob/master/apache-zeppelin/README.MD) to bring up Zeppelin on Google dataproc.
-You could also install and bring up Zeppelin on Google compute Engine.
-
# Unit Tests
BigQuery Unit tests are excluded as these tests depend on the BigQuery external service. This is because BigQuery does not have a local mock at this point.
@@ -14,34 +10,6 @@ If you like to run these tests manually, please follow the following steps:
* Copy the project ID that you created and add it to the property "projectId" in `resources/constants.json`
* Run the command mvn <options> -Dbigquery.text.exclude='' test -pl bigquery -am
-
-# Interpreter Configuration
-
-Configure the following properties during Interpreter creation.
-
-<table class="table-configuration">
- <tr>
- <th>Name</th>
- <th>Default Value</th>
- <th>Description</th>
- </tr>
- <tr>
- <td>zeppelin.bigquery.project_id</td>
- <td> </td>
- <td>Google Project Id</td>
- </tr>
- <tr>
- <td>zeppelin.bigquery.wait_time</td>
- <td>5000</td>
- <td>Query Timeout in Milliseconds</td>
- </tr>
- <tr>
- <td>zeppelin.bigquery.max_no_of_rows</td>
- <td>100000</td>
- <td>Max result set size</td>
- </tr>
-</table>
-
# Connection
The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables.
@@ -51,59 +19,6 @@ The Interpreter opens a connection with the BigQuery Service using the supplied
We have used the curated veneer version of the Java APIs versus [Idiomatic Java client] (https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery) to build the interpreter. This is mainly for usability reasons.
-# Enabling the BigQuery Interpreter
-
-In a notebook, to enable the **BigQuery** interpreter, click the **Gear** icon and select **bigquery**.
-
-# Using the BigQuery Interpreter
-
-In a paragraph, use `%bigquery.sql` to select the **BigQuery** interpreter and then input SQL statements against your datasets stored in BigQuery.
-You can use [BigQuery SQL Reference](https://cloud.google.com/bigquery/query-reference) to build your own SQL.
-
-For Example, SQL to query for top 10 departure delays across airports using the flights public dataset
-
-```bash
-%bigquery.sql
-SELECT departure_airport,count(case when departure_delay>0 then 1 else 0 end) as no_of_delays
-FROM [bigquery-samples:airline_ontime_data.flights]
-group by departure_airport
-order by 2 desc
-limit 10
-```
-
-Another Example, SQL to query for most commonly used java packages from the github data hosted in BigQuery
-
-```bash
-%bigquery.sql
-SELECT
- package,
- COUNT(*) count
-FROM (
- SELECT
- REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
- id
- FROM (
- SELECT
- SPLIT(content, '\n') line,
- id
- FROM
- [bigquery-public-data:github_repos.sample_contents]
- WHERE
- content CONTAINS 'import'
- AND sample_path LIKE '%.java'
- HAVING
- LEFT(line, 6)='import' )
- GROUP BY
- package,
- id )
-GROUP BY
- 1
-ORDER BY
- count DESC
-LIMIT
- 40
-```
-
# Sample Screenshot
![Zeppelin BigQuery](https://cloud.githubusercontent.com/assets/10060731/16938817/b9213ea0-4db6-11e6-8c3b-8149a0bdf874.png)
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
----------------------------------------------------------------------
diff --git a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java b/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
index b56f63c..0973fda 100644
--- a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
+++ b/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
@@ -89,7 +89,7 @@ public class BigQueryInterpreter extends Interpreter {
static final String PROJECT_ID = "zeppelin.bigquery.project_id";
static final String WAIT_TIME = "zeppelin.bigquery.wait_time";
static final String MAX_ROWS = "zeppelin.bigquery.max_no_of_rows";
- static final String LEGACY_SQL = "zeppelin.bigquery.use_legacy_sql";
+ static final String SQL_DIALECT = "zeppelin.bigquery.sql_dialect";
private static String jobId = null;
private static String projectId = null;
@@ -226,8 +226,19 @@ public class BigQueryInterpreter extends Interpreter {
String projId = getProperty(PROJECT_ID);
long wTime = Long.parseLong(getProperty(WAIT_TIME));
long maxRows = Long.parseLong(getProperty(MAX_ROWS));
- String legacySql = getProperty(LEGACY_SQL);
- boolean useLegacySql = legacySql == null ? true : Boolean.parseBoolean(legacySql);
+ String sqlDialect = getProperty(SQL_DIALECT, "").toLowerCase();
+ Boolean useLegacySql;
+ switch (sqlDialect) {
+ case "standardsql":
+ useLegacySql = false;
+ break;
+ case "legacysql":
+ useLegacySql = true;
+ break;
+ default:
+ // Enable query prefix like '#standardSQL' if specified
+ useLegacySql = null;
+ }
Iterator<GetQueryResultsResponse> pages;
try {
pages = run(sql, projId, wTime, maxRows, useLegacySql);
@@ -247,7 +258,7 @@ public class BigQueryInterpreter extends Interpreter {
//Function to run the SQL on bigQuery service
public static Iterator<GetQueryResultsResponse> run(final String queryString,
- final String projId, final long wTime, final long maxRows, boolean useLegacySql)
+ final String projId, final long wTime, final long maxRows, Boolean useLegacySql)
throws IOException {
try {
logger.info("Use legacy sql: {}", useLegacySql);
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/main/resources/interpreter-setting.json
----------------------------------------------------------------------
diff --git a/bigquery/src/main/resources/interpreter-setting.json b/bigquery/src/main/resources/interpreter-setting.json
index 8e92ee4..8023bed 100644
--- a/bigquery/src/main/resources/interpreter-setting.json
+++ b/bigquery/src/main/resources/interpreter-setting.json
@@ -22,14 +22,15 @@
"envName": null,
"propertyName": "zeppelin.bigquery.max_no_of_rows",
"defaultValue": "100000",
- "description": "Maximum number of rows to fetch from BigQuery"
+ "description": "Maximum number of rows to fetch from BigQuery",
+ "type": "number"
},
- "zeppelin.bigquery.use_legacy_sql": {
+ "zeppelin.bigquery.sql_dialect": {
"envName": null,
- "propertyName": "zeppelin.bigquery.use_legacy_sql",
- "defaultValue": "true",
- "description": "set true to use legacy sql",
- "type": "checkbox"
+ "propertyName": "zeppelin.bigquery.sql_dialect",
+ "defaultValue": "",
+ "description": "BigQuery SQL dialect (standardSQL or legacySQL). If empty, query prefix like '#standardSQL' can be used.",
+ "type": "string"
}
},
"editor": {
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java
----------------------------------------------------------------------
diff --git a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java b/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java
index 04676ab..9dcd9f8 100644
--- a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java
+++ b/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java
@@ -74,6 +74,7 @@ public class BigQueryInterpreterTest {
p.setProperty("zeppelin.bigquery.project_id", constants.getProjectId());
p.setProperty("zeppelin.bigquery.wait_time", "5000");
p.setProperty("zeppelin.bigquery.max_no_of_rows", "100");
+ p.setProperty("zeppelin.bigquery.sql_dialect", "");
intpGroup = new InterpreterGroup();
@@ -85,7 +86,6 @@ public class BigQueryInterpreterTest {
@Test
public void sqlSuccess() {
InterpreterResult ret = bqInterpreter.interpret(constants.getOne(), context);
-
assertEquals(InterpreterResult.Code.SUCCESS, ret.code());
assertEquals(ret.message().get(0).getType(), InterpreterResult.Type.TABLE);
}
@@ -93,14 +93,19 @@ public class BigQueryInterpreterTest {
@Test
public void badSqlSyntaxFails() {
InterpreterResult ret = bqInterpreter.interpret(constants.getWrong(), context);
-
assertEquals(InterpreterResult.Code.ERROR, ret.code());
}
@Test
+ public void testWithQueryPrefix() {
+ InterpreterResult ret = bqInterpreter.interpret(
+ "#standardSQL\n WITH t AS (select 1) SELECT * FROM t", context);
+ assertEquals(InterpreterResult.Code.SUCCESS, ret.code());
+ }
+
+ @Test
public void testInterpreterOutputData() {
InterpreterResult ret = bqInterpreter.interpret("SELECT 1 AS col1, 2 AS col2", context);
-
String[] lines = ret.message().get(0).getData().split("\\n");
assertEquals(2, lines.length);
assertEquals("col1\tcol2", lines[0]);
http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/docs/interpreter/bigquery.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/bigquery.md b/docs/interpreter/bigquery.md
index 1b90f99..cdac762 100644
--- a/docs/interpreter/bigquery.md
+++ b/docs/interpreter/bigquery.md
@@ -48,6 +48,11 @@ limitations under the License.
<td>100000</td>
<td>Max result set size</td>
</tr>
+ <tr>
+ <td>zeppelin.bigquery.sql_dialect</td>
+ <td></td>
+ <td>BigQuery SQL dialect (standardSQL or legacySQL). If empty, [query prefix](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql#sql-prefix) like '#standardSQL' can be used.</td>
+ </tr>
</table>