Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/20 12:49:34 UTC

[GitHub] [beam] shub-kris opened a new pull request, #25099: Add more info in documentation

shub-kris opened a new pull request, #25099:
URL: https://github.com/apache/beam/pull/25099

   This PR adds more information about the arguments required to run the large language model RunInference pipeline.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make the review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] shub-kris commented on pull request #25099: Add more info in documentation

Posted by GitBox <gi...@apache.org>.
shub-kris commented on PR #25099:
URL: https://github.com/apache/beam/pull/25099#issuecomment-1398334631

   Hi @damccorm, please find the updated documentation.




[GitHub] [beam] damccorm commented on a diff in pull request #25099: Add more info in documentation

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #25099:
URL: https://github.com/apache/beam/pull/25099#discussion_r1082617853


##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -25,11 +25,12 @@ RunInference works well on arbitrarily large models as long as they can fit on y
 This example demonstrates running inference with a `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks. Each task is converted into a text-to-text format. The example uses `T5-11B`, which contains 11 billion parameters and is 45 GB in size. In order to work well on a variety of tasks, `T5` prepends a different prefix to the input corresponding to each task. For example, for translation, the input would be: `translate English to German: …` and for summarization, it would be: `summarize: …`. For more information about `T5`, see the [T5 overview](https://huggingface.co/docs/transformers/model_doc/t5) in the HuggingFace documentation.
 
 ### Run the Pipeline
-First, install the required packages and pass the required arguments.
+First, install the required packages listed in [requirements.txt](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/large_language_modeling/requirements.txt) and pass the required arguments.
 You can view the code on [GitHub](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
 
-1. Locally on your machine: `python main.py --runner DirectRunner`. You need to have 45 GB of disk space available to run this example.
-2. On Google Cloud using Dataflow: `python main.py --runner DataflowRunner`
+1. Locally on your machine: `python main.py --runner DirectRunner --model_state_dict_path path_to_saved_model`. You need to have 45 GB of disk space available to run this example.
+2. On Google Cloud using Dataflow: `python main.py --runner DataflowRunner --model_state_dict_path path_to_saved_model --project PROJECT_ID`

Review Comment:
   ```suggestion
   1. Locally on your machine: `python main.py --runner DirectRunner --model_state_dict_path <local or remote path to state_dict> path_to_saved_model <local or remote path to saved model file> `. You need to have 45 GB of disk space available to run this example.
   2. On Google Cloud using Dataflow: `python main.py --runner DataflowRunner  --model_state_dict_path <local or remote path to state_dict> path_to_saved_model <local or remote path to saved model file>  --project PROJECT_ID`
   ```
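
   For readers following along, here is a hedged sketch of how arguments like `--model_state_dict_path` typically reach `RunInference` in the Beam Python SDK. Everything below (the `PytorchModelHandlerTensor` setup, `make_tensor_model_fn`, the `t5-11b` names, and the sample sentence) is illustrative and assumes a recent Beam and `transformers` release; the `main.py` linked in the diff is the authoritative version.

   ```python
   # Illustrative sketch only -- not the actual main.py. It shows the usual
   # shape of a Beam RunInference pipeline that consumes a
   # --model_state_dict_path argument.
   import argparse

   import apache_beam as beam
   from apache_beam.ml.inference.base import RunInference
   from apache_beam.ml.inference.pytorch_inference import (
       PytorchModelHandlerTensor,
       make_tensor_model_fn,
   )
   from transformers import AutoConfig, AutoTokenizer, T5ForConditionalGeneration


   def run():
       parser = argparse.ArgumentParser()
       parser.add_argument(
           "--model_state_dict_path",
           required=True,
           help="Local or remote path to the saved state_dict.",
       )
       known_args, pipeline_args = parser.parse_known_args()

       model_handler = PytorchModelHandlerTensor(
           state_dict_path=known_args.model_state_dict_path,
           model_class=T5ForConditionalGeneration,
           model_params={"config": AutoConfig.from_pretrained("t5-11b")},
           # T5 is a seq2seq model, so inference should call model.generate()
           # rather than model.forward().
           inference_fn=make_tensor_model_fn("generate"),
       )

       tokenizer = AutoTokenizer.from_pretrained("t5-11b")
       example = tokenizer(
           "translate English to German: The house is wonderful.",
           return_tensors="pt",
       ).input_ids[0]

       with beam.Pipeline(argv=pipeline_args) as pipeline:
           _ = (
               pipeline
               | "CreateInputs" >> beam.Create([example])
               | "RunInference" >> RunInference(model_handler)
               | "PrintResults" >> beam.Map(print)
           )


   if __name__ == "__main__":
       run()
   ```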




Review Comment:
   Could you also please include instructions on pulling the model to get the state_dict/saved model file
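
   For reference, a minimal sketch of what such pull-the-model instructions could look like, assuming the standard HuggingFace `transformers` API; the model id `t5-11b` and the output file name `saved_model` are placeholders, and the snippet that actually lands in the docs may differ.

   ```python
   # Hypothetical sketch: download T5-11B from HuggingFace and save its
   # state_dict so it can be passed via --model_state_dict_path.
   # The download and the saved file together need roughly 45 GB of disk.
   import torch
   from transformers import T5ForConditionalGeneration

   model = T5ForConditionalGeneration.from_pretrained("t5-11b")
   torch.save(model.state_dict(), "saved_model")
   ```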





[GitHub] [beam] damccorm merged pull request #25099: Add more info in documentation

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm merged PR #25099:
URL: https://github.com/apache/beam/pull/25099




[GitHub] [beam] shub-kris commented on a diff in pull request #25099: Add more info in documentation

Posted by "shub-kris (via GitHub)" <gi...@apache.org>.
shub-kris commented on code in PR #25099:
URL: https://github.com/apache/beam/pull/25099#discussion_r1084145402



Review Comment:
   Done


