Posted to commits@beam.apache.org by da...@apache.org on 2023/02/24 15:12:59 UTC

[beam] branch master updated: restructure ml overview website page (#25607)

This is an automated email from the ASF dual-hosted git repository.

damccorm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new bddfd86afd2 restructure ml overview website page (#25607)
bddfd86afd2 is described below

commit bddfd86afd26e12fab01ac5fdacc2b387fc598e8
Author: Juta Staes <ju...@ml6.eu>
AuthorDate: Fri Feb 24 16:12:49 2023 +0100

    restructure ml overview website page (#25607)
    
    * restructure ml overview website page
    
    * small edits ML website
---
 .../www/site/content/en/documentation/ml/overview.md | 20 +++++++++++++++-----
 .../partials/section-menu/en/documentation.html      |  6 +++---
 website/www/site/static/images/ml-workflows.svg      |  2 +-
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/website/www/site/content/en/documentation/ml/overview.md b/website/www/site/content/en/documentation/ml/overview.md
index dabe7e9629f..feec2c4e807 100644
--- a/website/www/site/content/en/documentation/ml/overview.md
+++ b/website/www/site/content/en/documentation/ml/overview.md
@@ -36,12 +36,12 @@ Let’s take a look at the different building blocks that we need to create an e
 2. **Data validation**: After you receieve your data, check the quality of your data. For example, you might want to detect outliers and calculate standard deviations and class distributions.
 3. **Data preprocessing**: After you validate your data, transform the data so that it is ready to use to train your model.
 4. Model training: When your data is ready, you can start training your AI/ML model. This step is typically repeated multiple times, depending on the quality of your trained model.
-5. Model validation: Before you deploy your new model, validate its performance and accuracy.
+5. **Model validation**: Before you deploy your new model, validate its performance and accuracy.
 6. **Model deployment**: Deploy your model, using it to run inference on new or existing data.
 
 To keep your model up to date and performing well as your data grows and evolves, run these steps multiple times. In addition, you can apply MLOps to your project to automate the AI/ML workflows throughout the model and data lifecycle. Use orchestrators to automate this flow and to handle the transition between the different building blocks in your project.
 
-You can use Apache Beam for data validation, data preprocessing, and model deployment/inference. The next section examines these building blocks in more detail and explores how they can be orchestrated.
+You can use Apache Beam for data validation, data preprocessing, model validation, and model deployment/inference. The next section examines these building blocks in more detail and explores how they can be orchestrated.
 
 ## Data processing
 
@@ -62,10 +62,12 @@ Beam provides different ways to implement inference as part of your pipeline. Yo
 
 The recommended way to implement inference is by using the [RunInference API](/documentation/sdks/python-machine-learning/). RunInference takes advantage of existing Apache Beam concepts, such as the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
 
-You can integrate your model in your pipeline by using the corresponding model handlers. A `ModelHandler` is an object that wraps the underlying model and allows you to configure its parameters. Model handlers are available for PyTorch, scikit-learn, and TensorFlow. Examples of how to use RunInference for PyTorch, scikit-learn, and TensorFlow are shown in this [notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb).
+You can integrate your model in your pipeline by using the corresponding model handlers. A `ModelHandler` is an object that wraps the underlying model and allows you to configure its parameters. Model handlers are available for PyTorch, scikit-learn, and TensorFlow. Examples of how to use RunInference for PyTorch, scikit-learn, and TensorFlow are shown in the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_skl [...]
 
 Because they can process multiple computations simultaneously, GPUs are optimized for training artificial intelligence and deep learning models. RunInference also allows you to use GPUs for significant inference speedup. An example of how to use RunInference with GPUs is demonstrated on the [RunInference metrics](/documentation/ml/runinference-metrics) page.
 
+Another usecase of running machine learning models is to run them on hardware devices. [Nvidia TensorRT](https://developer.nvidia.com/tensorrt) is a machine learning framework used to run inference on Nvidia hardware. See [TensorRT Inference](/documentation/ml/tensorrt-runinference) for an example of a pipeline that uses TensorRT and Beam with the RunInference transform and a BERT-based text classification model.
+
 ### Custom Inference
 
 The RunInference API doesn't currently support making remote inference calls using, for example, the Natural Language API or the Cloud Vision API. Therefore, in order to use these remote APIs with Apache Beam, you need to write custom inference calls. The [Remote inference in Apache Beam notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/custom_remote_inference.ipynb) shows how to implement a custom remote inference call using `beam.DoFn`. When you implement  [...]
@@ -78,6 +80,12 @@ The RunInference API doesn't currently support making remote inference calls usi
 
 * Consider monitoring and measuring the performance of a pipeline when deploying, because monitoring can provide insight into the status and health of the application.
 
+## Model validation
+
+Model validation allows you to benchmark your model’s performance against a previously unseen dataset. You can extract chosen metrics, create visualizations, log metadata, and compare the performance of different models with the end goal of validating whether your model is ready to deploy. Beam provides support for running model evaluation on a TensorFlow model directly inside your pipeline.
+
+Further reading:
+* [ML model evaluation](/documentation/ml/model-evaluation): Illustrates how to integrate model evaluation as part of your pipeline by using [TensorFlow Model Analysis (TFMA)](https://www.tensorflow.org/tfx/guide/tfma).
 
 ## Orchestrators
 
@@ -85,13 +93,15 @@ In order to automate and track the AI/ML workflows throughout your project, you
 
 When you use Apache Beam as one of the building blocks in your project, these orchestrators are able to launch your Apache Beam job and to keep track of the input and output of your pipeline. These tasks are essential when moving your AI/ML solution into production, because they allow you to handle your model and data over time and improve the quality and reproducibility of results.
 
+Further reading:
+* [ML Workflow Orchestration](/documentation/ml/orchestration): Illustrates how to orchestrate ML workflows consisting of multiple steps by using Kubeflow Pipelines and Tensorflow Extended.
+
 ## Examples
 
 You can find examples of end-to-end AI/ML pipelines for several use cases:
-* [ML Workflow Orchestration](/documentation/ml/orchestration): Illustrates how to orchestrate ML workflows consisting of multiple steps by using Kubeflow Pipelines and Tensorflow Extended.
+
 * [Multi model pipelines in Beam](/documentation/ml/multi-model-pipelines): Explains how multi-model pipelines work and gives an overview of what you need to know to build one using the RunInference API.
 * [Online Clustering in Beam](/documentation/ml/online-clustering): Demonstrates how to set up a real-time clustering pipeline that can read text from Pub/Sub, convert the text into an embedding using a transformer-based language model with the RunInference API, and cluster the text using BIRCH with stateful processing.
 * [Anomaly Detection in Beam](/documentation/ml/anomaly-detection): Demonstrates how to set up an anomaly detection pipeline that reads text from Pub/Sub in real time and then detects anomalies using a trained HDBSCAN clustering model with the RunInference API.
 * [Large Language Model Inference in Beam](/documentation/ml/large-language-modeling): Demonstrates a pipeline that uses RunInference to perform translation with the T5 language model which contains 11 billion parameters.
 * [Per Entity Training in Beam](/documentation/ml/per-entity-training): Demonstrates a pipeline that trains a Decision Tree Classifier per education level for predicting if the salary of a person is >= 50k.
-* [TensorRT Inference](/documentation/ml/tensorrt-runinference): Demonstrates a pipeline that uses TensorRT with the RunInference transform and a BERT-based text classification model.
diff --git a/website/www/site/layouts/partials/section-menu/en/documentation.html b/website/www/site/layouts/partials/section-menu/en/documentation.html
index 2722baf9bb7..61d7aa9fe35 100755
--- a/website/www/site/layouts/partials/section-menu/en/documentation.html
+++ b/website/www/site/layouts/partials/section-menu/en/documentation.html
@@ -217,16 +217,16 @@
 
   <ul class="section-nav-list">
     <li><a href="/documentation/ml/overview/">Overview</a></li>
-    <li><a href="/documentation/ml/orchestration/">Workflow Orchestration</a></li>
     <li><a href="/documentation/ml/data-processing/">Data processing</a></li>
+    <li><a href="/documentation/ml/runinference-metrics/">RunInference Metrics</a></li>
+    <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Inference</a></li>
     <li><a href="/documentation/ml/model-evaluation/">Model evaluation</a></li>
+    <li><a href="/documentation/ml/orchestration/">Workflow Orchestration</a></li>
     <li><a href="/documentation/ml/multi-model-pipelines/">Multi-model pipelines</a></li>
     <li><a href="/documentation/ml/online-clustering/">Online Clustering</a></li>
-    <li><a href="/documentation/ml/runinference-metrics/">RunInference Metrics</a></li>
     <li><a href="/documentation/ml/anomaly-detection/">Anomaly Detection</a></li>
     <li><a href="/documentation/ml/large-language-modeling">Large Language Model Inference in Beam</a></li>
     <li><a href="/documentation/ml/per-entity-training">Per Entity Training in Beam</a></li>
-    <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Inference</a></li>
   </ul>
 </li>
 <li class="section-nav-item--collapsible">
diff --git a/website/www/site/static/images/ml-workflows.svg b/website/www/site/static/images/ml-workflows.svg
index 2a9cb3c1f27..90130a40672 100755
--- a/website/www/site/static/images/ml-workflows.svg
+++ b/website/www/site/static/images/ml-workflows.svg
@@ -14,4 +14,4 @@ limitations under the License.
 -->
 <!-- Do not edit this file with editors other than diagrams.net -->
 <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
-<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="890px" height="290px" viewBox="-0.5 -0.5 890 290" content="&lt;mxfile host=&quot;app.diagrams.net&quot; modified=&quot;2022-09-30T09:13:42.911Z&quot; agent=&quot;5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36&quot; etag=&quot;jwziFvvco7NdpbSyV7Nh&quot; version=&quot;20.3.7&quot; type=&quot;google&quot;&gt;&lt;diagram id=&quot;8C2 [...]
\ No newline at end of file
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="890px" height="290px" viewBox="-0.5 -0.5 890 290" content="&lt;mxfile host=&quot;app.diagrams.net&quot; modified=&quot;2023-02-23T11:08:52.799Z&quot; agent=&quot;5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36&quot; etag=&quot;t7LQG5DbOQ_CrP_ZUB0z&quot; version=&quot;20.8.23&quot; type=&quot;google&quot;&gt;&lt;diagram id=&quot;8C [...]
\ No newline at end of file