You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@zeppelin.apache.org by fe...@apache.org on 2018/04/28 06:14:58 UTC

zeppelin git commit: ZEPPELIN-3412 Enable query prefix syntax in bigquery interpreter

Repository: zeppelin
Updated Branches:
  refs/heads/master e65f73066 -> 0c3b446a5


ZEPPELIN-3412 Enable query prefix syntax in bigquery interpreter

### What is this PR for?
- interpreter
Enable query prefix syntax in bigquery interpreter.
Bigquery has a convenient query prefix syntax #standardSQL and #legacySQL.
This enable to switch between standardSQL and legacySQL without changing UseLegacySql option.
Currently (v0.8.0 or later), bigquery interpreter definitely runs with UseLegacySql option.
As described in the [docs](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql?hl=en#sql-prefix), when this option is set to true or false, the query prefix cannot be used.

- documentation
Removed some contents from README.md. They are duplicated with interpreter documentation.

### What type of PR is it?
Improvement

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-3412

### How should this be tested?
- Unit test should pass

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? Yes

Author: iijima_satoshi <ii...@cyberagent.co.jp>

Closes #2929 from iijima-satoshi/bigquery-sql-prefix and squashes the following commits:

e5b1acd [iijima_satoshi] Address comments
f88cc42 [iijima_satoshi] Address comments
809f561 [iijima_satoshi] Address comments
4e2a6b2 [iijima_satoshi] Enable query prefix syntax in bigquery interpreter


Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/0c3b446a
Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/0c3b446a
Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/0c3b446a

Branch: refs/heads/master
Commit: 0c3b446a5304df3f3c0d665d2df926cca2dcddb7
Parents: e65f730
Author: iijima_satoshi <ii...@cyberagent.co.jp>
Authored: Fri Apr 27 16:47:40 2018 +0900
Committer: Felix Cheung <fe...@apache.org>
Committed: Fri Apr 27 23:14:54 2018 -0700

----------------------------------------------------------------------
 bigquery/README.md                              | 85 --------------------
 .../zeppelin/bigquery/BigQueryInterpreter.java  | 19 ++++-
 .../src/main/resources/interpreter-setting.json | 13 +--
 .../bigquery/BigQueryInterpreterTest.java       | 11 ++-
 docs/interpreter/bigquery.md                    |  5 ++
 5 files changed, 35 insertions(+), 98 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/README.md
----------------------------------------------------------------------
diff --git a/bigquery/README.md b/bigquery/README.md
index fc09763..0dff5fe 100644
--- a/bigquery/README.md
+++ b/bigquery/README.md
@@ -1,10 +1,6 @@
 # Overview
 BigQuery interpreter for Apache Zeppelin
 
-# Pre requisities
-You can follow the instructions at [Apache Zeppelin on Dataproc](https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/blob/master/apache-zeppelin/README.MD) to bring up Zeppelin on Google dataproc. 
-You could also install and bring up Zeppelin on Google compute Engine.
-
 # Unit Tests
 BigQuery Unit tests are excluded as these tests depend on the BigQuery external service. This is because BigQuery does not have a local mock at this point.
 
@@ -14,34 +10,6 @@ If you like to run these tests manually, please follow the following steps:
 * Copy the project ID that you created and add it to the property "projectId" in `resources/constants.json`
 * Run the command mvn <options> -Dbigquery.text.exclude='' test -pl bigquery -am
 
-
-# Interpreter Configuration
-
-Configure the following properties during Interpreter creation.
-
-<table class="table-configuration">
-  <tr>
-    <th>Name</th>
-    <th>Default Value</th>
-    <th>Description</th>
-  </tr>
-  <tr>
-    <td>zeppelin.bigquery.project_id</td>
-    <td>  </td>
-    <td>Google Project Id</td>
-  </tr>
-  <tr>
-    <td>zeppelin.bigquery.wait_time</td>
-    <td>5000</td>
-    <td>Query Timeout in Milliseconds</td>
-  </tr>
-  <tr>
-    <td>zeppelin.bigquery.max_no_of_rows</td>
-    <td>100000</td>
-    <td>Max result set size</td>
-  </tr>
-</table>
-
 # Connection
 The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables.
 
@@ -51,59 +19,6 @@ The Interpreter opens a connection with the BigQuery Service using the supplied
 
 We have used the curated veneer version of the Java APIs versus [Idiomatic Java client] (https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery) to build the interpreter. This is mainly for usability reasons.
 
-# Enabling the BigQuery Interpreter
-
-In a notebook, to enable the **BigQuery** interpreter, click the **Gear** icon and select **bigquery**.
-
-# Using the BigQuery Interpreter
-
-In a paragraph, use `%bigquery.sql` to select the **BigQuery** interpreter and then input SQL statements against your datasets stored in BigQuery.
-You can use [BigQuery SQL Reference](https://cloud.google.com/bigquery/query-reference) to build your own SQL.
-
-For Example, SQL to query for top 10 departure delays across airports using the flights public dataset
-
-```bash
-%bigquery.sql
-SELECT departure_airport,count(case when departure_delay>0 then 1 else 0 end) as no_of_delays
-FROM [bigquery-samples:airline_ontime_data.flights]
-group by departure_airport
-order by 2 desc
-limit 10
-```
-
-Another Example, SQL to query for most commonly used java packages from the github data hosted in BigQuery
-
-```bash
-%bigquery.sql
-SELECT
-  package,
-  COUNT(*) count
-FROM (
-  SELECT
-    REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package,
-    id
-  FROM (
-    SELECT
-      SPLIT(content, '\n') line,
-      id
-    FROM
-      [bigquery-public-data:github_repos.sample_contents]
-    WHERE
-      content CONTAINS 'import'
-      AND sample_path LIKE '%.java'
-    HAVING
-      LEFT(line, 6)='import' )
-  GROUP BY
-    package,
-    id )
-GROUP BY
-  1
-ORDER BY
-  count DESC
-LIMIT
-  40
-```
-
 # Sample Screenshot
 
 ![Zeppelin BigQuery](https://cloud.githubusercontent.com/assets/10060731/16938817/b9213ea0-4db6-11e6-8c3b-8149a0bdf874.png)

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
----------------------------------------------------------------------
diff --git a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java b/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
index b56f63c..0973fda 100644
--- a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
+++ b/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java
@@ -89,7 +89,7 @@ public class BigQueryInterpreter extends Interpreter {
   static final String PROJECT_ID = "zeppelin.bigquery.project_id";
   static final String WAIT_TIME = "zeppelin.bigquery.wait_time";
   static final String MAX_ROWS = "zeppelin.bigquery.max_no_of_rows";
-  static final String LEGACY_SQL = "zeppelin.bigquery.use_legacy_sql";
+  static final String SQL_DIALECT = "zeppelin.bigquery.sql_dialect";
 
   private static String jobId = null;
   private static String projectId = null;
@@ -226,8 +226,19 @@ public class BigQueryInterpreter extends Interpreter {
     String projId = getProperty(PROJECT_ID);
     long wTime = Long.parseLong(getProperty(WAIT_TIME));
     long maxRows = Long.parseLong(getProperty(MAX_ROWS));
-    String legacySql = getProperty(LEGACY_SQL);
-    boolean useLegacySql = legacySql == null ? true : Boolean.parseBoolean(legacySql);
+    String sqlDialect = getProperty(SQL_DIALECT, "").toLowerCase();
+    Boolean useLegacySql;
+    switch (sqlDialect) {
+      case "standardsql":
+        useLegacySql = false;
+        break;
+      case "legacysql":
+        useLegacySql = true;
+        break;
+      default:
+        // Enable query prefix like '#standardSQL' if specified
+        useLegacySql = null;
+    }
     Iterator<GetQueryResultsResponse> pages;
     try {
       pages = run(sql, projId, wTime, maxRows, useLegacySql);
@@ -247,7 +258,7 @@ public class BigQueryInterpreter extends Interpreter {
 
   //Function to run the SQL on bigQuery service
   public static Iterator<GetQueryResultsResponse> run(final String queryString,
-      final String projId, final long wTime, final long maxRows, boolean useLegacySql)
+      final String projId, final long wTime, final long maxRows, Boolean useLegacySql)
           throws IOException {
     try {
       logger.info("Use legacy sql: {}", useLegacySql);

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/main/resources/interpreter-setting.json
----------------------------------------------------------------------
diff --git a/bigquery/src/main/resources/interpreter-setting.json b/bigquery/src/main/resources/interpreter-setting.json
index 8e92ee4..8023bed 100644
--- a/bigquery/src/main/resources/interpreter-setting.json
+++ b/bigquery/src/main/resources/interpreter-setting.json
@@ -22,14 +22,15 @@
         "envName": null,
         "propertyName": "zeppelin.bigquery.max_no_of_rows",
         "defaultValue": "100000",
-        "description": "Maximum number of rows to fetch from BigQuery"
+        "description": "Maximum number of rows to fetch from BigQuery",
+        "type": "number"
       },
-      "zeppelin.bigquery.use_legacy_sql": {
+      "zeppelin.bigquery.sql_dialect": {
         "envName": null,
-        "propertyName": "zeppelin.bigquery.use_legacy_sql",
-        "defaultValue": "true",
-        "description": "set true to use legacy sql",
-        "type": "checkbox"
+        "propertyName": "zeppelin.bigquery.sql_dialect",
+        "defaultValue": "",
+        "description": "BigQuery SQL dialect (standardSQL or legacySQL). If empty, query prefix like '#standardSQL' can be used.",
+        "type": "string"
       }
     },
     "editor": {

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java
----------------------------------------------------------------------
diff --git a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java b/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java
index 04676ab..9dcd9f8 100644
--- a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java
+++ b/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java
@@ -74,6 +74,7 @@ public class BigQueryInterpreterTest {
     p.setProperty("zeppelin.bigquery.project_id", constants.getProjectId());
     p.setProperty("zeppelin.bigquery.wait_time", "5000");
     p.setProperty("zeppelin.bigquery.max_no_of_rows", "100");
+    p.setProperty("zeppelin.bigquery.sql_dialect", "");
 
     intpGroup = new InterpreterGroup();
 
@@ -85,7 +86,6 @@ public class BigQueryInterpreterTest {
   @Test
   public void sqlSuccess() {
     InterpreterResult ret = bqInterpreter.interpret(constants.getOne(), context);
-
     assertEquals(InterpreterResult.Code.SUCCESS, ret.code());
     assertEquals(ret.message().get(0).getType(), InterpreterResult.Type.TABLE);
   }
@@ -93,14 +93,19 @@ public class BigQueryInterpreterTest {
   @Test
   public void badSqlSyntaxFails() {
     InterpreterResult ret = bqInterpreter.interpret(constants.getWrong(), context);
-
     assertEquals(InterpreterResult.Code.ERROR, ret.code());
   }
 
   @Test
+  public void testWithQueryPrefix() {
+    InterpreterResult ret = bqInterpreter.interpret(
+        "#standardSQL\n WITH t AS (select 1) SELECT * FROM t", context);
+    assertEquals(InterpreterResult.Code.SUCCESS, ret.code());
+  }
+
+  @Test
   public void testInterpreterOutputData() {
     InterpreterResult ret = bqInterpreter.interpret("SELECT 1 AS col1, 2 AS col2", context);
-
     String[] lines = ret.message().get(0).getData().split("\\n");
     assertEquals(2, lines.length);
     assertEquals("col1\tcol2", lines[0]);

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/docs/interpreter/bigquery.md
----------------------------------------------------------------------
diff --git a/docs/interpreter/bigquery.md b/docs/interpreter/bigquery.md
index 1b90f99..cdac762 100644
--- a/docs/interpreter/bigquery.md
+++ b/docs/interpreter/bigquery.md
@@ -48,6 +48,11 @@ limitations under the License.
     <td>100000</td>
     <td>Max result set size</td>
   </tr>
+  <tr>
+    <td>zeppelin.bigquery.sql_dialect</td>
+    <td></td>
+    <td>BigQuery SQL dialect (standardSQL or legacySQL). If empty, [query prefix](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql#sql-prefix) like '#standardSQL' can be used.</td>
+  </tr>
 </table>