Posted to commits@beam.apache.org by me...@apache.org on 2019/03/05 23:16:31 UTC

[beam] branch master updated: [BEAM-6749] Add BigQuery Storage API info to docs

This is an automated email from the ASF dual-hosted git repository.

melap pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 28dc54c  [BEAM-6749] Add BigQuery Storage API info to docs
     new 1449805  Merge pull request #7950: [BEAM-6749] Add BigQuery Storage API info to docs
28dc54c is described below

commit 28dc54c9cc19454260059446ddf8e5e473e666f3
Author: Melissa Pashniak <me...@google.com>
AuthorDate: Tue Feb 26 15:08:04 2019 -0800

    [BEAM-6749] Add BigQuery Storage API info to docs
---
 .../documentation/io/built-in-google-bigquery.md   | 88 +++++++++++++++++++---
 1 file changed, 77 insertions(+), 11 deletions(-)

diff --git a/website/src/documentation/io/built-in-google-bigquery.md b/website/src/documentation/io/built-in-google-bigquery.md
index 72fa326..855b5cf 100644
--- a/website/src/documentation/io/built-in-google-bigquery.md
+++ b/website/src/documentation/io/built-in-google-bigquery.md
@@ -1,6 +1,6 @@
 ---
 layout: section
-title: "Google BigQuery IO"
+title: "Google BigQuery I/O connector"
 section_menu: section-menu/documentation.html
 permalink: /documentation/io/built-in/google-bigquery/
 ---
@@ -20,7 +20,7 @@ limitations under the License.
 
 [Built-in I/O Transforms]({{site.baseurl}}/documentation/io/built-in/)
 
-# Google BigQuery IO
+# Google BigQuery I/O connector
 
 <nav class="language-switcher">
   <strong>Adapt for:</strong>
@@ -166,12 +166,18 @@ schema](#creating-a-table-schema) covers schemas in more detail.
 
 ## Reading from BigQuery
 
-BigQueryIO allows you to read from a BigQuery table, or read the results of
-an arbitrary SQL query string. When you apply a BigQueryIO read transform,
-Beam invokes a [BigQuery export request](https://cloud.google.com/bigquery/docs/exporting-data).
-Beam’s use of this API is subject to BigQuery's [Quota](https://cloud.google.com/bigquery/quota-policy#export)
-and [Pricing](https://cloud.google.com/bigquery/pricing) policies.
+BigQueryIO allows you to read from a BigQuery table, or read the results of an
+arbitrary SQL query string. By default, Beam invokes a [BigQuery export
+request](https://cloud.google.com/bigquery/docs/exporting-data) when you apply a
+BigQueryIO read transform. However, the Beam SDK for Java (version 2.11.0 and
+later) adds support for the beta release of the [BigQuery Storage API](https://cloud.google.com/bigquery/docs/reference/storage/)
+as an [experimental feature](https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/annotations/Experimental.html).
+See [Using the BigQuery Storage API](#storage-api) for more information and a
+list of limitations.
 
+> Beam’s use of BigQuery APIs is subject to BigQuery's
+> [Quota](https://cloud.google.com/bigquery/quota-policy)
+> and [Pricing](https://cloud.google.com/bigquery/pricing) policies.
 
 <!-- Java specific -->
 
@@ -200,7 +206,6 @@ allow you to read from a table, or read fields using a query string.
 Avro `GenericRecord` into your custom type, or use `readTableRows()` to parse
 them into JSON `TableRow` objects.
 
-
 <!-- Python specific -->
 
 {:.language-py}
@@ -262,13 +267,74 @@ in the following example:
 {% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py tag:model_bigqueryio_read_query_std_sql
 %}```
 
+### Using the BigQuery Storage API {#storage-api}
+
+The [BigQuery Storage API](https://cloud.google.com/bigquery/docs/reference/storage/)
+allows you to directly access tables in BigQuery storage. As a result, your
+pipeline can read from BigQuery storage faster than previously possible.
+
+The Beam SDK for Java (version 2.11.0 and later) adds support for the beta
+release of the BigQuery Storage API as an [experimental feature](https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/annotations/Experimental.html).
+Beam's support for the BigQuery Storage API has the following limitations:
+
+* The SDK for Python does not support the BigQuery Storage API.
+* You must read from a table. Reading with a query string is not currently
+  supported.
+* Dynamic work rebalancing is not currently supported. As a result, reads might
+  be less efficient in the presence of stragglers.
+
+Because this is currently a Beam experimental feature, export-based reads are
+recommended for production jobs.
+
+#### Enabling the API
+
+The BigQuery Storage API is distinct from the existing BigQuery API. You must
+[enable the BigQuery Storage API](https://cloud.google.com/bigquery/docs/reference/storage/#enabling_the_api)
+for your Google Cloud Platform project.
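+
+Enabling the API is a one-time project configuration step. One way to do it,
+assuming the Google Cloud SDK (`gcloud`) is installed and authenticated for
+your project, is:
+
+```shell
+# Enable the BigQuery Storage API for the currently configured project.
+gcloud services enable bigquerystorage.googleapis.com
+```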
+
+#### Updating your code
+
+Use the following methods when you read from a table:
+
+* Required: Specify [withMethod(Method.DIRECT_READ)](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.TypedRead.html#withMethod-org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method-) to use the BigQuery Storage API for
+  the read operation.
+* Optional: To use features such as [column projection and column filtering](https://cloud.google.com/bigquery/docs/reference/storage/),
+  you must also specify a [TableReadOptions](https://googleapis.github.io/google-cloud-java/google-api-grpc/apidocs/index.html?com/google/cloud/bigquery/storage/v1beta1/ReadOptions.TableReadOptions.html)
+  proto using the [withReadOptions](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.TypedRead.html#withReadOptions-com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions-) method.
+
+The following code snippet is from the [BigQueryTornadoes
+example](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java).
+When the example's read method option is set to `DIRECT_READ`, the pipeline uses
+the BigQuery Storage API and column projection to read public samples of weather
+data from a BigQuery table.
+
+```java
+   TableReadOptions tableReadOptions =
+       TableReadOptions.newBuilder()
+           .addAllSelectedFields(Lists.newArrayList("month", "tornado"))
+           .build();
+
+   rowsFromBigQuery =
+       p.apply(
+            BigQueryIO.readTableRows()
+               .from(options.getInput())
+               .withMethod(Method.DIRECT_READ)
+               .withReadOptions(tableReadOptions));
+```
+```py
+# The SDK for Python does not support the BigQuery Storage API.
+```
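+
+Column projection means that only the selected fields ever leave BigQuery
+storage, rather than whole rows being read and trimmed afterward. A rough,
+hypothetical Python illustration of the effect (not Beam or BigQuery code;
+the field names and sample rows echo the tornadoes example above):
+
+```python
+def project_columns(rows, selected_fields):
+    """Keep only the selected fields of each row, mimicking column projection."""
+    return [{k: row[k] for k in selected_fields if k in row} for row in rows]
+
+# Two hypothetical weather-sample rows carrying an extra field.
+rows = [
+    {"month": 1, "tornado": True, "station_id": "a1"},
+    {"month": 2, "tornado": False, "station_id": "b2"},
+]
+
+projected = project_columns(rows, ["month", "tornado"])
+print(projected)
+# [{'month': 1, 'tornado': True}, {'month': 2, 'tornado': False}]
+```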
+
 
 ## Writing to BigQuery
 
 BigQueryIO allows you to write to BigQuery tables. If you are using the Beam SDK
-for Java, you can also write different rows to different tables. BigQueryIO
-write transforms use APIs that are subject to BigQuery's [Quota](https://cloud.google.com/bigquery/quota-policy#export)
-and [Pricing](https://cloud.google.com/bigquery/pricing) policies.
+for Java, you can also write different rows to different tables.
+
+> BigQueryIO write transforms use APIs that are subject to BigQuery's
+> [Quota](https://cloud.google.com/bigquery/quota-policy) and
+> [Pricing](https://cloud.google.com/bigquery/pricing) policies.
 
 When you apply a write transform, you must provide the following information
 for the destination table(s):