Posted to commits@beam.apache.org by me...@apache.org on 2018/08/08 23:31:21 UTC

[beam-site] branch mergebot updated (5b736f0 -> f079eb1)

This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a change to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


    from 5b736f0  This closes #522
     add 24a8588  Prepare repository for deployment.
     new 488f57c  Add section for SQL shell
     new f079eb1  This closes #527

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 content/get-started/quickstart-go/index.html |   2 +-
 src/_includes/section-menu/sdks.html         |   1 +
 src/documentation/dsls/sql/shell.md          | 153 +++++++++++++++++++++++++++
 3 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 src/documentation/dsls/sql/shell.md


[beam-site] 02/02: This closes #527

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit f079eb1676c7c5913eab0d387e22d421bc75ee3f
Merge: 24a8588 488f57c
Author: Mergebot <me...@apache.org>
AuthorDate: Wed Aug 8 23:30:45 2018 +0000

    This closes #527

 src/_includes/section-menu/sdks.html |   1 +
 src/documentation/dsls/sql/shell.md  | 153 +++++++++++++++++++++++++++++++++++
 2 files changed, 154 insertions(+)


[beam-site] 01/02: Add section for SQL shell

Posted by me...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 488f57c37465922880242360e87261afa4d32777
Author: Anton Kedin <ke...@google.com>
AuthorDate: Tue Aug 7 15:04:08 2018 -0700

    Add section for SQL shell
---
 src/_includes/section-menu/sdks.html |   1 +
 src/documentation/dsls/sql/shell.md  | 153 +++++++++++++++++++++++++++++++++++
 2 files changed, 154 insertions(+)

diff --git a/src/_includes/section-menu/sdks.html b/src/_includes/section-menu/sdks.html
index ff4806b..0102b4b 100644
--- a/src/_includes/section-menu/sdks.html
+++ b/src/_includes/section-menu/sdks.html
@@ -57,6 +57,7 @@
   <ul class="section-nav-list">
     <li><a href="{{ site.baseurl }}/documentation/dsls/sql/overview/">Overview</a></li>
     <li><a href="{{ site.baseurl }}/documentation/dsls/sql/walkthrough/">Walkthrough</a></li>
+    <li><a href="{{ site.baseurl }}/documentation/dsls/sql/shell/">Shell</a></li>
     <li class="section-nav-item--collapsible">
       <span class="section-nav-list-title">SQL Reference</span>
 
diff --git a/src/documentation/dsls/sql/shell.md b/src/documentation/dsls/sql/shell.md
new file mode 100644
index 0000000..6f9c32c
--- /dev/null
+++ b/src/documentation/dsls/sql/shell.md
@@ -0,0 +1,153 @@
+---
+layout: section
+title: "Beam SQL: Shell"
+section_menu: section-menu/sdks.html
+permalink: /documentation/dsls/sql/shell/
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Beam SQL Shell
+
+## Overview
+
+Starting with version 2.6.0, Beam SQL includes an interactive shell, called the Beam SQL shell. The shell allows you to write pipelines as SQL queries without needing the Java SDK. By default, Beam uses the `DirectRunner` to execute the queries as Beam pipelines.
+
+This page describes how to work with the shell, but does not focus on specific features of Beam SQL. For a more thorough overview of the features used in the examples on this page, see the corresponding sections in the [Beam SQL documentation]({{ site.baseurl }}/documentation/dsls/sql/overview/).
+
+## Quickstart
+
+To use the Beam SQL shell, you must first clone the [Beam SDK repository](https://github.com/apache/beam). Then, from the root of the repository clone, execute the following commands to run the shell:
+
+```
+./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':beam-runners-flink_2.11,:beam-sdks-java-io-kafka' installDist
+
+./sdks/java/extensions/sql/shell/build/install/beam-sdks-java-extensions-sql-shell/bin/beam-sdks-java-extensions-sql-shell
+```
+
+After you run the commands, the SQL shell starts and you can type queries:
+
+```
+Welcome to Beam SQL 2.6.0-SNAPSHOT (based on sqlline version 1.4.0)
+0: BeamSQL> 
+```
+
+_Note: If you haven't built the project before running the Gradle command, the command will take a few minutes as Gradle must build all dependencies first._
+
+The shell converts the queries into Beam pipelines, runs them using `DirectRunner`, and returns the results as tables when the pipelines finish:
+
+```
+0: BeamSQL> SELECT 'foo' AS NAME, 'bar' AS TYPE, 'num' AS NUMBER;
++------+------+--------+
+| NAME | TYPE | NUMBER |
++------+------+--------+
+| foo  | bar  | num    |
++------+------+--------+
+1 row selected (0.826 seconds)
+```
+
+## Declaring Tables
+
+Before reading data from a source or writing data to a destination, you must declare a virtual table using the `CREATE TABLE` statement. For example, if you have a local CSV file `"test-file.csv"` in the current folder, you can create a table with the following statement:
+
+```
+0: BeamSQL> CREATE TABLE csv_file (field1 VARCHAR, field2 INTEGER) TYPE text LOCATION 'test-file.csv';
+
+No rows affected (0.042 seconds)
+```
+
+The `CREATE TABLE` statement registers the CSV file as a table in Beam SQL and specifies the table's schema. This statement does not directly create a persistent physical table; it only describes the source/sink to Beam SQL so that you can use the table in queries that read and write data.
+
+_For more information about `CREATE TABLE` syntax and supported table types, see the [CREATE TABLE reference page]({{ site.baseurl }}/documentation/dsls/sql/create-table/)._
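+
+For example, a `test-file.csv` that matches this schema might look like the following. The exact contents are illustrative; any comma-separated file with one `VARCHAR` column and one `INTEGER` column will do:
+
+```
+baz,1
+foo,2
+bar,3
+bar,4
+foo,5
+```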
+
+## Reading and Writing Data
+
+To read data from the local CSV file that you declared in the previous section, execute the following query:
+
+```
+0: BeamSQL> SELECT field1 AS field FROM csv_file;
++--------+
+| field  |
++--------+
+| baz    |
+| foo    |
+| bar    |
+| bar    |
+| foo    |
++--------+
+```
+
+_For more information about `SELECT` syntax, see the [SELECT reference page]({{ site.baseurl }}/documentation/dsls/sql/select/)._
+
+To write data to the CSV file, use the `INSERT INTO ... SELECT ...` statement:
+
+```
+0: BeamSQL> INSERT INTO csv_file SELECT 'foo', 'bar';
+```
+Read and write behavior depends on the type of the table. For example:
+
+*   The table type `text` is implemented using `TextIO`, so writing to a `text` table can produce multiple numbered files (see the sketch after this list).
+*   The table type `pubsub` is an unbounded source, so reading from a `pubsub` table never completes.
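+
+As a rough illustration of the first point, after an `INSERT` into the `csv_file` table you might find several sharded files next to the declared location rather than a single `test-file.csv`. The file names below are hypothetical; they only show the flavor of TextIO's default `prefix-shard-of-total` naming:
+
+```
+ls
+test-file.csv-00000-of-00003
+test-file.csv-00001-of-00003
+test-file.csv-00002-of-00003
+```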
+
+## Developing with Unbounded Sources
+
+When you want to inspect the data from an unbounded source during development, you must specify the `LIMIT x` clause at the end of the `SELECT` statement to limit the output to `x` records. Otherwise, the pipeline will never finish.
+
+```
+0: BeamSQL> SELECT field1 FROM unbounded_source LIMIT 10;
+```
+
+The example queries shown so far are fast queries that execute locally. These queries are helpful when you are investigating the data and iteratively designing the pipeline. Ideally, you want the queries to finish quickly and return output when complete. 
+
+When you're satisfied with the logic of your SQL statements, you can submit the statements as long-running jobs by dropping the `LIMIT x` clause. Then, the pipeline can potentially run forever if one of the tables represents an unbounded source.
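+
+A minimal sketch of such a long-running statement writes the unbounded source into a previously declared sink table instead of printing rows (here `unbounded_source` is assumed to have columns compatible with `csv_file`):
+
+```
+0: BeamSQL> INSERT INTO csv_file SELECT field1, field2 FROM unbounded_source;
+```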
+
+## Specifying the Runner
+
+By default, Beam uses the `DirectRunner` to run the pipeline on the machine where you're executing the commands. If you want to run the pipeline with a different runner, you must perform two steps:
+
+1.  Make sure the SQL shell includes the desired runner. Add the corresponding project id to the `-Pbeam.sql.shell.bundled` parameter of the Gradle invocation ([source code](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/shell/build.gradle), [project ids](https://github.com/apache/beam/blob/master/settings.gradle)). For example, use the following command to include the Flink runner and KafkaIO:
+
+    ```
+    ./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':beam-runners-flink_2.11,:beam-sdks-java-io-kafka' installDist
+    ```
+
+    _Note: You can bundle multiple runners (using a comma-separated list) or other additional components in the same manner. For example, you can add support for more I/Os._
+
+1.  Then, specify the runner using the `SET` command ([reference page]({{ site.baseurl }}/documentation/dsls/sql/set/)):
+
+    ```
+    0: BeamSQL> SET runner='FlinkRunner';
+    ```
+
+Beam will submit all future `INSERT` statements as pipelines to the specified runner. In this case, the Beam SQL shell does not display the query results. You must manage the submitted jobs through the corresponding runner's UI (for example, using the Flink UI or command line).
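+
+Putting the two steps together, a session might look like the following sketch, reusing the `csv_file` table from the earlier examples:
+
+```
+0: BeamSQL> SET runner='FlinkRunner';
+0: BeamSQL> INSERT INTO csv_file SELECT 'foo', 'bar';
+```
+
+The `INSERT` is submitted to Flink as a pipeline; no rows are printed in the shell, and you monitor the job from the Flink UI instead.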
+
+## Specifying the PipelineOptions
+
+To configure the runner, you must specify `PipelineOptions` by using the `SET` command ([details]({{ site.baseurl }}/documentation/dsls/sql/set/)):
+
+```
+0: BeamSQL> SET projectId='gcpProjectId';
+0: BeamSQL> SET tempLocation='/tmp/tempDir';
+```
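+
+The available options depend on the runner. For example, assuming you selected the Flink runner as shown above, you could point the shell at an existing cluster through the `flinkMaster` option from `FlinkPipelineOptions` (the address below is only a placeholder):
+
+```
+0: BeamSQL> SET flinkMaster='localhost:8081';
+```
+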
+## Packaging the SQL Shell
+
+You can also build a standalone package for the SQL shell using the `distZip` or `distTar` tasks. For example:
+
+```
+./gradlew -p sdks/java/extensions/sql/shell -Pbeam.sql.shell.bundled=':beam-runners-flink_2.11,:beam-sdks-java-io-kafka' distZip
+
+ls ./sdks/java/extensions/sql/shell/build/distributions/
+beam-sdks-java-extensions-sql-shell-2.6.0-SNAPSHOT.tar beam-sdks-java-extensions-sql-shell-2.6.0-SNAPSHOT.zip
+```
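+
+The resulting archive can be copied to another machine, unpacked, and run without the Beam source tree. A sketch of that, assuming the archive layout mirrors the `installDist` output shown in the Quickstart (paths are illustrative):
+
+```
+unzip beam-sdks-java-extensions-sql-shell-2.6.0-SNAPSHOT.zip -d /path/to/install
+/path/to/install/beam-sdks-java-extensions-sql-shell-2.6.0-SNAPSHOT/bin/beam-sdks-java-extensions-sql-shell
+```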
+