Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/03/01 00:42:17 UTC

[GitHub] [beam] rezarokni commented on a change in pull request #13644: [BEAM-11544] BQML pattern

rezarokni commented on a change in pull request #13644:
URL: https://github.com/apache/beam/pull/13644#discussion_r584390651



##########
File path: website/www/site/content/en/documentation/patterns/bqml.md
##########
@@ -0,0 +1,98 @@
+---
+title: "BigQuery ML integration"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# BigQuery ML integration
+
+With the samples on this page we will demonstrate how to integrate models exported from [BigQuery ML (BQML)](https://cloud.google.com/bigquery-ml/docs) into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl).
+
+Roughly, the sections below will go through the following steps in more detail:
+
+1. Create and train your BigQuery ML model
+1. Export your BigQuery ML model
+1. Create a transform that uses the brand-new BigQuery ML model
+
+## Create and train your BigQuery ML model
+
+To be able to incorporate your BQML model into an Apache Beam pipeline using tfx_bsl, it has to be in the [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) format. An overview that maps different model types to their export model format for BQML can be found [here](https://cloud.google.com/bigquery-ml/docs/exporting-models#export_model_formats_and_samples).
+
+For the sake of simplicity, we'll be training a (simplified version of the) logistic regression model in the [BQML quickstart guide](https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start), using the publicly available Google Analytics sample dataset. An overview of all models you can create using BQML can be found [here](https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in).
+
+After creating a BigQuery dataset, the next step is to create the model, which is defined entirely in SQL:
+
+```SQL
+CREATE MODEL IF NOT EXISTS `bqml_tutorial.sample_model`
+OPTIONS(model_type='logistic_reg', input_label_cols=["label"]) AS
+SELECT
+  IF(totals.transactions IS NULL, 0, 1) AS label,
+  IFNULL(geoNetwork.country, "") AS country
+FROM
+  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
+WHERE
+  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
+```
+
+The model predicts whether a purchase will be made given the visitor's country, and is trained on data gathered between 2016-08-01 and 2017-06-30.
+
+## Export your BigQuery ML model
+
+In order to incorporate your model in an Apache Beam pipeline, you will need to export it. Prerequisites to do so are [installing the `bq` command-line tool](https://cloud.google.com/bigquery/docs/bq-command-line-tool) and [creating a Google Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) to store your exported model.
+
+Export the model using the following command:
+
+```bash
+bq extract -m bqml_tutorial.sample_model gs://some/gcs/path
+```
+
+## Create an Apache Beam transform that uses your BigQuery ML model
+
+In this section we will construct an Apache Beam pipeline that uses the BigQuery ML model we just created and exported. The model could also be served with Google Cloud AI Platform Prediction; for that approach, refer to the [AI Platform patterns](https://beam.apache.org/documentation/patterns/ai-platform/). Here, we'll illustrate how to use the tfx_bsl library to run predictions locally, on your Apache Beam workers.
+
+First, the model needs to be downloaded to a local directory where you will be developing the rest of your pipeline (e.g. to `serving_dir/sample_model/1`).
+
+Then, you can start developing your pipeline as you normally would. We will use the `RunInference` PTransform from the [tfx_bsl](https://github.com/tensorflow/tfx-bsl) library and point it to the local directory where the model is stored (see the `model_path` variable in the code example). The transform takes elements of type `tf.train.Example` as input and outputs elements of type `tensorflow_serving.apis.prediction_log_pb2.PredictionLog`.
+
+```python
+import apache_beam as beam
+import tensorflow as tf
+from google.protobuf import text_format
+from tfx_bsl.beam import run_inference
+from tfx_bsl.public.beam import RunInference
+from tfx_bsl.public.proto import model_spec_pb2
+
+
+inputs = """
+features {
+    feature { key: "country" value { bytes_list { value: 'Belgium' }}}
+}
+"""
+
+def create_tf_example(text_proto):
+    # Parse a text-format tf.train.Example proto (e.g. the `inputs` string above).
+    return text_format.Parse(text_proto, tf.train.Example())

Review comment:
       Might be worth looking at the TF APIs for this, as they may be more familiar to folks working with TensorFlow.
   
   https://www.tensorflow.org/tutorials/load_data/tfrecord
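
For instance, a minimal sketch (assuming the same single "country" feature as in the snippet above) of building the `tf.train.Example` directly with the `tf.train` API from that tutorial, instead of parsing a text-format proto:

```python
import tensorflow as tf

def create_tf_example(country: bytes) -> tf.train.Example:
    # Build the single-feature Example with the tf.train API, mirroring the
    # text-format proto string used in the snippet above.
    return tf.train.Example(
        features=tf.train.Features(
            feature={
                "country": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[country])),
            }))

example = create_tf_example(b"Belgium")
```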

##########
File path: website/www/site/content/en/documentation/patterns/bqml.md
##########
@@ -0,0 +1,98 @@
+---
+title: "BigQuery ML integration"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# BigQuery ML integration
+
+With the samples on this page we will demonstrate how to integrate models exported from [BigQuery ML (BQML)](https://cloud.google.com/bigquery-ml/docs) into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl).
+
+Roughly, the sections below will go through the following steps in more detail:
+
+1. Create and train your BigQuery ML model
+1. Export your BigQuery ML model
+1. Create a transform that uses the brand-new BigQuery ML model
+
+## Create and train your BigQuery ML model
+
+To be able to incorporate your BQML model into an Apache Beam pipeline using tfx_bsl, it has to be in the [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) format. An overview that maps different model types to their export model format for BQML can be found [here](https://cloud.google.com/bigquery-ml/docs/exporting-models#export_model_formats_and_samples).
+
+For the sake of simplicity, we'll be training a (simplified version of the) logistic regression model in the [BQML quickstart guide](https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start), using the publicly available Google Analytics sample dataset. An overview of all models you can create using BQML can be found [here](https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in).
+
+After creating a BigQuery dataset, the next step is to create the model, which is defined entirely in SQL:
+
+```SQL
+CREATE MODEL IF NOT EXISTS `bqml_tutorial.sample_model`
+OPTIONS(model_type='logistic_reg', input_label_cols=["label"]) AS
+SELECT
+  IF(totals.transactions IS NULL, 0, 1) AS label,
+  IFNULL(geoNetwork.country, "") AS country
+FROM
+  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
+WHERE
+  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'

Review comment:
       Does this table not have time partitioning enabled?
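
The `ga_sessions_*` tables in the Google Analytics sample dataset are date-sharded rather than time-partitioned, which is why the date range above is filtered on `_TABLE_SUFFIX`. For comparison, a sketch of the same filter against a hypothetical ingestion-time partitioned table (the table name is illustrative):

```SQL
-- Hypothetical: if the sessions lived in a single ingestion-time partitioned
-- table, the date range would be restricted on the partition pseudo-column.
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(geoNetwork.country, "") AS country
FROM
  `my-project.my_dataset.ga_sessions_partitioned`
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP('2016-08-01') AND TIMESTAMP('2017-06-30')
```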

##########
File path: website/www/site/content/en/documentation/patterns/bqml.md
##########
@@ -0,0 +1,98 @@
+---
+title: "BigQuery ML integration"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# BigQuery ML integration
+
+With the samples on this page we will demonstrate how to integrate models exported from [BigQuery ML (BQML)](https://cloud.google.com/bigquery-ml/docs) into your Apache Beam pipeline using [TFX Basic Shared Libraries (tfx_bsl)](https://github.com/tensorflow/tfx-bsl).
+
+Roughly, the sections below will go through the following steps in more detail:
+
+1. Create and train your BigQuery ML model
+1. Export your BigQuery ML model
+1. Create a transform that uses the brand-new BigQuery ML model
+
+## Create and train your BigQuery ML model
+
+To be able to incorporate your BQML model into an Apache Beam pipeline using tfx_bsl, it has to be in the [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) format. An overview that maps different model types to their export model format for BQML can be found [here](https://cloud.google.com/bigquery-ml/docs/exporting-models#export_model_formats_and_samples).
+
+For the sake of simplicity, we'll be training a (simplified version of the) logistic regression model in the [BQML quickstart guide](https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start), using the publicly available Google Analytics sample dataset. An overview of all models you can create using BQML can be found [here](https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in).
+
+After creating a BigQuery dataset, the next step is to create the model, which is defined entirely in SQL:
+
+```SQL
+CREATE MODEL IF NOT EXISTS `bqml_tutorial.sample_model`
+OPTIONS(model_type='logistic_reg', input_label_cols=["label"]) AS
+SELECT
+  IF(totals.transactions IS NULL, 0, 1) AS label,
+  IFNULL(geoNetwork.country, "") AS country
+FROM
+  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
+WHERE
+  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
+```
+
+The model predicts whether a purchase will be made given the visitor's country, and is trained on data gathered between 2016-08-01 and 2017-06-30.
+
+## Export your BigQuery ML model
+
+In order to incorporate your model in an Apache Beam pipeline, you will need to export it. Prerequisites to do so are [installing the `bq` command-line tool](https://cloud.google.com/bigquery/docs/bq-command-line-tool) and [creating a Google Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) to store your exported model.
+
+Export the model using the following command:
+
+```bash
+bq extract -m bqml_tutorial.sample_model gs://some/gcs/path
+```
+
+## Create an Apache Beam transform that uses your BigQuery ML model
+
+In this section we will construct an Apache Beam pipeline that uses the BigQuery ML model we just created and exported. The model could also be served with Google Cloud AI Platform Prediction; for that approach, refer to the [AI Platform patterns](https://beam.apache.org/documentation/patterns/ai-platform/). Here, we'll illustrate how to use the tfx_bsl library to run predictions locally, on your Apache Beam workers.
+
+First, the model needs to be downloaded to a local directory where you will be developing the rest of your pipeline (e.g. to `serving_dir/sample_model/1`).
+
+Then, you can start developing your pipeline as you normally would. We will use the `RunInference` PTransform from the [tfx_bsl](https://github.com/tensorflow/tfx-bsl) library and point it to the local directory where the model is stored (see the `model_path` variable in the code example). The transform takes elements of type `tf.train.Example` as input and outputs elements of type `tensorflow_serving.apis.prediction_log_pb2.PredictionLog`.
+
+```python
+import apache_beam as beam
+import tensorflow as tf
+from google.protobuf import text_format
+from tfx_bsl.beam import run_inference
+from tfx_bsl.public.beam import RunInference
+from tfx_bsl.public.proto import model_spec_pb2
+
+
+inputs = """
+features {
+    feature { key: "country" value { bytes_list { value: 'Belgium' }}}
+}
+"""
+
+def create_tf_example(text_proto):
+    # Parse a text-format tf.train.Example proto (e.g. the `inputs` string above).
+    return text_format.Parse(text_proto, tf.train.Example())
+
+model_path = "serving_dir/sample_model/1"
+
+with beam.Pipeline() as p:
+    res = (
+        p
+        | beam.Create([
+            create_tf_example(inputs)
+        ])
+        | RunInference(
+            model_spec_pb2.InferenceSpecType(
+                saved_model_spec=model_spec_pb2.SavedModelSpec(
+                    model_path=model_path,
+                    signature_name=['serving_default']))))
+```

Review comment:
       It would be useful to show how to parse the results coming from RunInference, which is a PredictionLog.
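
For instance, a minimal sketch of a step that could follow `RunInference` to unpack each `PredictionLog` (the output key depends on the exported model's serving signature, so "label_probs" below is only an assumption):

```python
import tensorflow as tf

def parse_prediction_log(prediction_log):
    # For a Predict signature, RunInference populates predict_log; the
    # response carries a map of output name -> TensorProto.
    outputs = prediction_log.predict_log.response.outputs
    # Inspect outputs.keys() for the actual keys exposed by your model;
    # "label_probs" here is only an assumed, illustrative key.
    return tf.make_ndarray(outputs["label_probs"])

# Sketch of how it could slot into the pipeline above:
#     ... | RunInference(...) | beam.Map(parse_prediction_log)
```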




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org