Posted to commits@beam.apache.org by me...@apache.org on 2019/03/05 23:16:31 UTC
[beam] branch master updated: [BEAM-6749] Add BigQuery Storage API info to docs
This is an automated email from the ASF dual-hosted git repository.
melap pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 28dc54c [BEAM-6749] Add BigQuery Storage API info to docs
new 1449805 Merge pull request #7950: [BEAM-6749] Add BigQuery Storage API info to docs
28dc54c is described below
commit 28dc54c9cc19454260059446ddf8e5e473e666f3
Author: Melissa Pashniak <me...@google.com>
AuthorDate: Tue Feb 26 15:08:04 2019 -0800
[BEAM-6749] Add BigQuery Storage API info to docs
---
.../documentation/io/built-in-google-bigquery.md | 88 +++++++++++++++++++---
1 file changed, 77 insertions(+), 11 deletions(-)
diff --git a/website/src/documentation/io/built-in-google-bigquery.md b/website/src/documentation/io/built-in-google-bigquery.md
index 72fa326..855b5cf 100644
--- a/website/src/documentation/io/built-in-google-bigquery.md
+++ b/website/src/documentation/io/built-in-google-bigquery.md
@@ -1,6 +1,6 @@
---
layout: section
-title: "Google BigQuery IO"
+title: "Google BigQuery I/O connector"
section_menu: section-menu/documentation.html
permalink: /documentation/io/built-in/google-bigquery/
---
@@ -20,7 +20,7 @@ limitations under the License.
[Built-in I/O Transforms]({{site.baseurl}}/documentation/io/built-in/)
-# Google BigQuery IO
+# Google BigQuery I/O connector
<nav class="language-switcher">
<strong>Adapt for:</strong>
@@ -166,12 +166,18 @@ schema](#creating-a-table-schema) covers schemas in more detail.
## Reading from BigQuery
-BigQueryIO allows you to read from a BigQuery table, or read the results of
-an arbitrary SQL query string. When you apply a BigQueryIO read transform,
-Beam invokes a [BigQuery export request](https://cloud.google.com/bigquery/docs/exporting-data).
-Beam’s use of this API is subject to BigQuery's [Quota](https://cloud.google.com/bigquery/quota-policy#export)
-and [Pricing](https://cloud.google.com/bigquery/pricing) policies.
+BigQueryIO allows you to read from a BigQuery table, or read the results of an
+arbitrary SQL query string. By default, Beam invokes a [BigQuery export
+request](https://cloud.google.com/bigquery/docs/exporting-data) when you apply a
+BigQueryIO read transform. However, the Beam SDK for Java (version 2.11.0 and
+later) adds support for the beta release of the [BigQuery Storage API](https://cloud.google.com/bigquery/docs/reference/storage/)
+as an [experimental feature](https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/annotations/Experimental.html).
+See [Using the BigQuery Storage API](#storage-api) for more information and a
+list of limitations.
+> Beam’s use of BigQuery APIs is subject to BigQuery's
+> [Quota](https://cloud.google.com/bigquery/quota-policy)
+> and [Pricing](https://cloud.google.com/bigquery/pricing) policies.
<!-- Java specific -->
@@ -200,7 +206,6 @@ allow you to read from a table, or read fields using a query string.
Avro `GenericRecord` into your custom type, or use `readTableRows()` to parse
them into JSON `TableRow` objects.
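As a rough sketch of the `read(SerializableFunction)` form described above (this is an illustrative fragment, not a complete program: it assumes a `Pipeline` named `p` already exists, the Beam SDK for Java is on the classpath, and the table and field names are placeholders):

```java
// Sketch: parse each Avro GenericRecord into a custom type (here, a Double)
// instead of using readTableRows(). Table and field names are hypothetical.
PCollection<Double> maxTemperatures =
    p.apply(
        BigQueryIO.read(
                (SchemaAndRecord elem) -> (Double) elem.getRecord().get("max_temperature"))
            .from("my-project:samples.weather_stations")
            .withCoder(DoubleCoder.of()));
```

The `withCoder` call is needed because Beam cannot infer a coder for the custom output type of the parse function.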
-
<!-- Python specific -->
{:.language-py}
@@ -262,13 +267,74 @@ in the following example:
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets.py tag:model_bigqueryio_read_query_std_sql
%}```
+### Using the BigQuery Storage API {#storage-api}
+
+The [BigQuery Storage API](https://cloud.google.com/bigquery/docs/reference/storage/)
+allows you to directly access tables in BigQuery storage. As a result, your
+pipeline can read from BigQuery storage faster than previously possible.
+
+The Beam SDK for Java (version 2.11.0 and later) adds support for the beta
+release of the BigQuery Storage API as an [experimental feature](https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/annotations/Experimental.html).
+Beam's support for the BigQuery Storage API has the following limitations:
+
+* The SDK for Python does not support the BigQuery Storage API.
+* You must read from a table. Reading with a query string is not currently
+ supported.
+* Dynamic work re-balancing is not currently supported. As a result, reads might
+ be less efficient in the presence of stragglers.
+
+Because this is currently an experimental feature in Beam, export-based reads
+are recommended for production jobs.
+
+#### Enabling the API
+
+The BigQuery Storage API is distinct from the existing BigQuery API. You must
+[enable the BigQuery Storage API](https://cloud.google.com/bigquery/docs/reference/storage/#enabling_the_api)
+for your Google Cloud Platform project.
+
+#### Updating your code
+
+Use the following methods when you read from a table:
+
+* Required: Specify [withMethod(Method.DIRECT_READ)](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.TypedRead.html#withMethod-org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method-) to use the BigQuery Storage API for
+ the read operation.
+* Optional: To use features such as [column projection and column filtering](https://cloud.google.com/bigquery/docs/reference/storage/),
+ you must also specify a [TableReadOptions](https://googleapis.github.io/google-cloud-java/google-api-grpc/apidocs/index.html?com/google/cloud/bigquery/storage/v1beta1/ReadOptions.TableReadOptions.html)
+ proto using the [withReadOptions](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.TypedRead.html#withReadOptions-com.google.cloud.bigquery.storage.v1beta1.ReadOptions.TableReadOptions-) method.
+
+The following code snippet is from the [BigQueryTornadoes
+example](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java).
+When the example's read method option is set to `DIRECT_READ`, the pipeline uses
+the BigQuery Storage API and column projection to read public samples of weather
+data from a BigQuery table. You can view the [full source code on
+GitHub](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java).
+
+```java
+ TableReadOptions tableReadOptions =
+ TableReadOptions.newBuilder()
+ .addAllSelectedFields(Lists.newArrayList("month", "tornado"))
+ .build();
+
+ rowsFromBigQuery =
+ p.apply(
+ BigQueryIO.readTableRows()
+ .from(options.getInput())
+ .withMethod(Method.DIRECT_READ)
+ .withReadOptions(tableReadOptions));
+```
+```py
+# The SDK for Python does not support the BigQuery Storage API.
+```
+
## Writing to BigQuery
BigQueryIO allows you to write to BigQuery tables. If you are using the Beam SDK
-for Java, you can also write different rows to different tables. BigQueryIO
-write transforms use APIs that are subject to BigQuery's [Quota](https://cloud.google.com/bigquery/quota-policy#export)
-and [Pricing](https://cloud.google.com/bigquery/pricing) policies.
+for Java, you can also write different rows to different tables.
+
+> BigQueryIO write transforms use APIs that are subject to BigQuery's
+> [Quota](https://cloud.google.com/bigquery/quota-policy) and
+> [Pricing](https://cloud.google.com/bigquery/pricing) policies.
When you apply a write transform, you must provide the following information
for the destination table(s):