Posted to github@beam.apache.org by "shub-kris (via GitHub)" <gi...@apache.org> on 2023/01/23 14:45:39 UTC

[GitHub] [beam] shub-kris commented on a diff in pull request #25099: Add more info in documentation

shub-kris commented on code in PR #25099:
URL: https://github.com/apache/beam/pull/25099#discussion_r1084145402


##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -25,11 +25,12 @@ RunInference works well on arbitrarily large models as long as they can fit on y
 This example demonstrates running inference with a `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks. Each task is converted into a text-to-text format. The example uses `T5-11B`, which contains 11 billion parameters and is 45 GB in size. In order to work well on a variety of tasks, `T5` prepends a different prefix to the input corresponding to each task. For example, for translation, the input would be: `translate English to German: …` and for summarization, it would be: `summarize: …`. For more information about `T5`, see the [T5 overview](https://huggingface.co/docs/transformers/model_doc/t5) in the HuggingFace documentation.
 
 ### Run the Pipeline
-First, install the required packages and pass the required arguments.
+First, install the required packages listed in [requirements.txt](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/large_language_modeling/requirements.txt) and pass the required arguments.
 You can view the code on [GitHub](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py).
 
-1. Locally on your machine: `python main.py --runner DirectRunner`. You need to have 45 GB of disk space available to run this example.
-2. On Google Cloud using Dataflow: `python main.py --runner DataflowRunner`
+1. Locally on your machine: `python main.py --runner DirectRunner --model_state_dict_path path_to_saved_model`. You need to have 45 GB of disk space available to run this example.
+2. On Google Cloud using Dataflow: `python main.py --runner DataflowRunner --model_state_dict_path path_to_saved_model --project PROJECT_ID`
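
The run commands in the diff above pass a `--runner` and a `--model_state_dict_path` to `main.py`. As a rough illustration of how such flags might be handled (a hedged sketch only; the actual example uses Beam's `PipelineOptions`, and the function and flag help text here are hypothetical), argument parsing could look like this:

```python
# Hypothetical sketch of the argument handling implied by the run commands
# in the diff. The real main.py uses apache_beam's PipelineOptions; plain
# argparse is used here only to illustrate the flags shown above.
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--runner",
        required=True,
        help="DirectRunner for a local run, DataflowRunner for Google Cloud.",
    )
    parser.add_argument(
        "--model_state_dict_path",
        required=True,
        help="Path to the saved model state dict (T5-11B is ~45 GB).",
    )
    parser.add_argument(
        "--project",
        help="Google Cloud project ID (needed with DataflowRunner).",
    )
    return parser.parse_args(argv)


# Mirrors the local run command from the diff.
args = parse_args(
    ["--runner", "DirectRunner", "--model_state_dict_path", "path_to_saved_model"]
)
print(args.runner)
```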

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org