Posted to github@beam.apache.org by "rszper (via GitHub)" <gi...@apache.org> on 2023/03/23 21:43:13 UTC

[GitHub] [beam] rszper commented on a diff in pull request #25947: Add documentation for the auto model updates

rszper commented on code in PR #25947:
URL: https://github.com/apache/beam/pull/25947#discussion_r1146891464


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the
 [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler

Review Comment:
   Is slowly updating a type of side input pattern? If so, I think it should be hyphenated:
   
   Slowly-updating side input pattern



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the
 [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of [ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata), which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform without the need of stopping the pipeline for the model updates.
+  * `model_id`: Unique identifier for the model. This can be a file path or a URL where the model can be accessed. It is used to load the model for inference.
+  * `model_name`: Human-readable name for the model. This can be used to identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow [AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton) view or the pipeline will result in error.
+
+**Note**: If the main PCollection emits inputs and side input has yet to receive inputs, the main PCollection will get buffered until there is

Review Comment:
   ```suggestion
   **Note**: If the main PCollection emits inputs and a side input has yet to receive inputs, the main PCollection is buffered until there is
   ```
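
For reference, the `ModelMetadata` element described in the hunk above is a plain `NamedTuple` value; a minimal sketch of constructing one, with a placeholder path and name rather than values from the PR:

```python
from apache_beam.ml.inference.base import ModelMetadata

# Placeholder values for illustration only.
update = ModelMetadata(
    model_id='gs://my-bucket/models/2023-03-23/model.pt',  # where the model is loaded from
    model_name='sentiment_model_v2')                       # name reported in RunInference metrics
```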



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the
 [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of [ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata), which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform without the need of stopping the pipeline for the model updates.
+  * `model_id`: Unique identifier for the model. This can be a file path or a URL where the model can be accessed. It is used to load the model for inference.
+  * `model_name`: Human-readable name for the model. This can be used to identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow [AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton) view or the pipeline will result in error.
+
+**Note**: If the main PCollection emits inputs and side input has yet to receive inputs, the main PCollection will get buffered until there is

Review Comment:
   We generally don't want two notes in a row. Does this content need to be in a note? Can it just be text in the file? If both sections must be notes, could they be combined into one note?



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the
 [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of [ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata), which is a `NamedTuple` containing the `model_id` and `model_name`,
+to update the models used by the ModelHandler in the RunInference PTransform without the need of stopping the pipeline for the model updates.
+  * `model_id`: Unique identifier for the model. This can be a file path or a URL where the model can be accessed. It is used to load the model for inference.
+  * `model_name`: Human-readable name for the model. This can be used to identify the model in the metrics generated by the RunInference transform.
+
+**Note**: The side input PCollection must follow [AsSingleton](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html?highlight=assingleton#apache_beam.pvalue.AsSingleton) view or the pipeline will result in error.
+
+**Note**: If the main PCollection emits inputs and side input has yet to receive inputs, the main PCollection will get buffered until there is
+            an update to the side input. This could happen with Global windowed side inputs with data driven triggers such as `AfterCount`, `AfterProcessingTime`. So until there is an update to the side input, emit the default/initial model id that is used to pass the respective `ModelHandler` as side input..

Review Comment:
   ```suggestion
               an update to the side input. This could happen with Global windowed side inputs with data driven triggers such as `AfterCount`, `AfterProcessingTime`. Until the side input is updated, emit the default or initial model ID that is used to pass the respective `ModelHandler` as a side input.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -243,6 +243,17 @@ For more information, see the [`PredictionResult` documentation](https://github.
 For detailed instructions explaining how to build and run a Python pipeline that uses ML models, see the
 [Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
 
+## Slowly updating side input pattern to update models used by ModelHandler
+The RunInference PTransform will accept a side input of [ModelMetadata](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata), which is a `NamedTuple` containing the `model_id` and `model_name`,

Review Comment:
   ```suggestion
   To update models used by the model handler in the RunInference PTransform without stopping the pipeline, use a [`ModelMetadata`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelMetadata) side input, which is a `NamedTuple` containing the `model_id` and `model_name`.
   ```
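
Putting the quoted hunks together, here is a rough sketch of the pattern the new section describes: a globally windowed, trigger-driven side input of `ModelMetadata` feeding RunInference. The `model_metadata_pcoll` keyword, the polling interval, the Pub/Sub topic, and `my_model_handler` are assumptions for illustration; the exact parameter name should be checked against the RunInference pydoc for the targeted Beam release.

```python
import apache_beam as beam
from apache_beam.ml.inference.base import ModelMetadata, RunInference
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import trigger, window
from apache_beam.transforms.periodicsequence import PeriodicImpulse

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    # Side input: poll on an interval and emit a ModelMetadata update.
    # The interval, path, and name below are placeholders; in practice,
    # resolve the latest model path inside the Map.
    model_updates = (
        p
        | 'PollForNewModel' >> PeriodicImpulse(
            fire_interval=300, apply_windowing=False)
        | 'ToModelMetadata' >> beam.Map(
            lambda _: ModelMetadata(
                model_id='gs://my-bucket/models/latest.pt',
                model_name='my_model_latest'))
        # Global window with a data-driven trigger, as the quoted note
        # describes, so each update becomes the current singleton value.
        | 'UpdateOnEachElement' >> beam.WindowInto(
            window.GlobalWindows(),
            trigger=trigger.Repeatedly(trigger.AfterCount(1)),
            accumulation_mode=trigger.AccumulationMode.DISCARDING))

    _ = (
        p
        | 'ReadInputs' >> beam.io.ReadFromPubSub(
            topic='projects/my-project/topics/inference-inputs')
        # Decoding/preprocessing of the raw messages is elided here.
        | 'RunInference' >> RunInference(
            model_handler=my_model_handler,  # any configured ModelHandler (placeholder)
            # Assumed keyword for the ModelMetadata side input; RunInference
            # consumes it as a singleton view per the AsSingleton note above.
            model_metadata_pcoll=model_updates)
        | 'HandleResults' >> beam.Map(print))
```

Per the note under review, the side input must resolve to a single value per window, which is why the update stream is rewindowed into the global window with a repeated `AfterCount(1)` trigger; until the first update fires, RunInference keeps using the default/initial model, as the buffering note above explains.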



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org