Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/28 23:38:13 UTC

[GitHub] [spark] rithwik-db opened a new pull request, #39267: [SPARK-41592] Pytorch file Distributed Training

rithwik-db opened a new pull request, #39267:
URL: https://github.com/apache/spark/pull/39267

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   NOTE: If you want to view only the diff on top of the other WIP PR's baseline changes, look at the LAST COMMIT in this PR's commit history (titled "Running PyTorch Files Distributed-ly"). Since I am sending out several related PRs in parallel, that commit shows the diff that pertains to this ticket.
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   This is an addition to https://github.com/apache/spark/pull/39188 that adds support for multi-node training using PyTorch files. Users would follow the second workflow in the [design document](https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit#heading=h.8yvw9xq428fh) to run training on the executors. I added some new utility functions and built on top of the existing ones. This is largely WIP, so tests will be added very soon.
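   
   As a rough sketch of the intended usage (the `TorchDistributor` constructor and `run()` follow the API added in the baseline PR; the training script path and its CLI arguments below are hypothetical):
   
   ```
   from pyspark.ml.torch.distributor import TorchDistributor
   
   # Launch multi-node training of a PyTorch file on the executors.
   # num_processes controls how many training processes run across the barrier tasks.
   distributor = TorchDistributor(num_processes=2, local_mode=False, use_gpu=True)
   distributor.run("/path/to/train.py", "--learning-rate", "1e-3")
   ```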
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   Look at the [main ticket](https://issues.apache.org/jira/browse/SPARK-41589) for more details.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   Tested with a pseudo-integration test (it doesn't actually exercise PyTorch yet).
   




[GitHub] [spark] rithwik-db commented on pull request #39267: [SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by "rithwik-db (via GitHub)" <gi...@apache.org>.
rithwik-db commented on PR #39267:
URL: https://github.com/apache/spark/pull/39267#issuecomment-1503834517

   If we are using CPUs for training, the `num_processes` attribute refers to the number of Spark tasks that will be created for training, and each Spark task can use more than one CPU depending on what `spark.task.cpus` is set to. (This [function](https://github.com/apache/spark/blob/master/python/pyspark/ml/torch/distributor.py#L167) is where this logic is defined.)
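   
   As a rough sketch of that relationship (illustrative only, not the code in `distributor.py`; `spark` is assumed to be an active SparkSession):
   
   ```
   # With CPU training, each of the num_processes training processes maps to
   # one Spark task, and the scheduler gives each task spark.task.cpus cores.
   task_cpus = int(spark.sparkContext.getConf().get("spark.task.cpus", "1"))
   num_tasks = num_processes              # one barrier task per training process
   total_cores_used = num_tasks * task_cpus
   ```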




[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1069064636


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""

Review Comment:
   PyTorch Lightning will raise a `MisconfigurationException: No supported gpu backend found!`. This is what we expect to see if the user sets `use_gpu=False` but calls `pl.Trainer(accelerator="gpu")`. My understanding is that if a user runs this code on a cluster with GPUs on each node and we don't set `os.environ[CUDA_VISIBLE_DEVICES] = ""`, then the task could still be assigned a GPU even when `use_gpu=False`.
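   
   For illustration, this is the effect the empty `CUDA_VISIBLE_DEVICES` has on the task process (a small sketch, assuming `torch` is installed and the variable is set before CUDA is initialized):
   
   ```
   import os
   os.environ["CUDA_VISIBLE_DEVICES"] = ""   # hide all GPUs from this process
   
   import torch
   print(torch.cuda.is_available())          # False, even on a machine with GPUs
   ```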





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068751120


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:

Review Comment:
   Sure






[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
lu-wang-dl commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068711508


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""

Review Comment:
   Do we need to do this?





[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
lu-wang-dl commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068709600


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?

Review Comment:
   What does this mean?





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072943054


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)

Review Comment:
   https://github.com/apache/spark/pull/39267/files#diff-76c395a6b98138662faaec37460ccda966f5cc0df0bccd224dfefcd81b2a7a79R459 <- Is this what you were suggesting?





[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072267585


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:

Review Comment:
   Also, I would recommend setting a maximum number of retries for the `get_free_port` loop, to avoid an infinite loop in unexpected cases.





[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072289202


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""
+            set_torch_config(context)
+
+            output = framework_wrapper_fn(input_params, train_fn, *args)
+
+            if context.partitionId() == 0:
+                return [output]
+            return [None]
+
+        return wrapped_train_fn
+
+    def _run_distributed_training(
+        self,
+        framework_wrapper_fn: Optional[Callable],
+        train_fn: Union[Callable, str],
+        *args: Any,
+    ) -> Optional[Any]:
+        if not framework_wrapper_fn:
+            raise RuntimeError("Unknown combination of parameters")
+        spark_task_program = self._get_spark_task_program(framework_wrapper_fn, train_fn, *args)
+        self._check_encryption()
+        result = (
+            self.sc.parallelize(range(self.num_tasks), self.num_tasks)
+            .barrier()
+            .mapPartitions(spark_task_program)

Review Comment:
   var rename: `spark_task_function`





[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
lu-wang-dl commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068713107


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""
+            set_torch_config(context)
+
+            output = framework_wrapper_fn(input_params, train_fn, *args)
+
+            if context.partitionId() == 0:
+                return [output]
+            return [None]
+
+        return wrapped_train_fn
+
+    def _run_distributed_training(
+        self,
+        framework_wrapper_fn: Optional[Callable],
+        train_fn: Union[Callable, str],
+        *args: Any,
+    ) -> Optional[Any]:
+        if not framework_wrapper_fn:
+            raise RuntimeError("Unknown combination of parameters")
+        spark_task_program = self._get_spark_task_program(framework_wrapper_fn, train_fn, *args)

Review Comment:
   Why not just define the function here?





[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
lu-wang-dl commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068708067


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:

Review Comment:
   Add doc string?





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068750849


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?

Review Comment:
   Something like the following seems to error:
   ```
   import socket
   
   # Binding to port 0 asks the OS for a free ephemeral port on the master
   # address; the chosen port is then read back from getsockname().
   sock = socket.socket()
   sock.bind((master_address, 0))
   port = sock.getsockname()[1]
   ```
   So I just find a port using randomness.





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1067576354


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -407,13 +418,6 @@ def _run_local_training(
         try:
             if self.use_gpu:
                 gpus_owned = get_gpus_owned(self.sc)
-

Review Comment:
   This is actually no longer needed since if `num_processes > len(gpus_owned)`, then we set `num_processes = len(gpus_owned)`





[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072266412


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:

Review Comment:
   Shall we add a `sleep(0.1)` in the loop body?





[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072267585


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:

Review Comment:
   Also, I would recommend setting a maximum number of retries (e.g. 100) for the `get_free_port` loop, to avoid an infinite loop in unexpected cases.
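   
   A minimal sketch of what that bounded loop could look like (illustrative only, combining the retry cap and the short sleep suggested above; not the PR's code):
   
   ```
   import random
   import socket
   import time
   
   def get_free_port(address: str, max_retries: int = 100) -> int:
       # Try random ports in the ephemeral range; give up after max_retries
       # attempts instead of looping forever.
       for _ in range(max_retries):
           port = random.randint(32768, 61000)
           with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
               if sock.connect_ex((address, port)) != 0:
                   return port
           time.sleep(0.1)  # brief pause between attempts
       raise RuntimeError("Could not find a free port for the torch master")
   ```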





[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072290872


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""
+            set_torch_config(context)
+
+            output = framework_wrapper_fn(input_params, train_fn, *args)
+
+            if context.partitionId() == 0:
+                return [output]
+            return [None]

Review Comment:
   We usually write 
   ```
   if context.partitionId() == 0:
     yield output
   ```
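   
   A small sketch of why that works (illustrative; `run_training` is a placeholder for the real work): the function passed to `mapPartitions` is treated as returning an iterator, so a generator that only yields from partition 0 produces a single-element result on `collect()` instead of one entry per task.
   
   ```
   from pyspark import BarrierTaskContext
   
   def spark_task_function(_):
       context = BarrierTaskContext.get()
       output = run_training()            # placeholder for the real training step
       if context.partitionId() == 0:
           yield output                   # only partition 0 emits a value
   ```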





[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072300627


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -325,8 +329,15 @@ def _create_torchrun_command(
             torchrun_args = ["--standalone", "--nnodes=1"]
             processes_per_node = num_processes
         else:
-            pass
-            # TODO(SPARK-41592): Handle distributed training
+            master_addr, master_port = os.environ["MASTER_ADDR"], os.environ["MASTER_PORT"]
+            node_rank = os.environ["RANK"]
+            torchrun_args = [
+                f"--nnodes={num_processes}",
+                f"--node_rank={node_rank}",
+                f"--rdzv_endpoint={master_addr}:{master_port}",
+                "--rdzv_id=0",
+            ]  # TODO: setup random ID that is gleaned from env variables
+            processes_per_node = 1

Review Comment:
   We don't need to set `preexec_fn=sigterm_on_parent_death` when executing the `torch_run_process_wrapper` subprocess.





[GitHub] [spark] HyukjinKwon closed pull request #39267: [SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #39267: [SPARK-41592][PYTHON][ML] Pytorch file Distributed Training
URL: https://github.com/apache/spark/pull/39267




[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1069068934


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?

Review Comment:
   I believe it will raise a `RuntimeError: Address already in use`





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068755416


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""

Review Comment:
   I think it should be added: if the user runs training with `TorchDistributor(use_gpu=False, **kwargs).run(train_fn)` but accidentally has some PyTorch Lightning code like `pl.Trainer(accelerator="gpu")` in their `train_fn`, an error should be raised saying no CUDA devices are available even though a GPU accelerator was requested.
   
   We already have a check in `get_num_tasks` for the case where `use_gpu=True` but no GPUs are available, and I think this code addresses the case where `use_gpu=False` but the inner training code tries to use GPUs.
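   
   A sketch of the failure mode being guarded against (illustrative only; `pl` is `pytorch_lightning`, and the body of `train_fn` is hypothetical):
   
   ```
   from pyspark.ml.torch.distributor import TorchDistributor
   import pytorch_lightning as pl
   
   def train_fn():
       # The distributor was told use_gpu=False, but the user asks Lightning for GPUs;
       # with CUDA_VISIBLE_DEVICES="" this raises MisconfigurationException.
       trainer = pl.Trainer(accelerator="gpu", devices=1)
       ...
   
   TorchDistributor(num_processes=2, local_mode=False, use_gpu=False).run(train_fn)
   ```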





[GitHub] [spark] AmplabJenkins commented on pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #39267:
URL: https://github.com/apache/spark/pull/39267#issuecomment-1368277978

   Can one of the admins verify this patch?




[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072294810


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any

Review Comment:
   train_fn --> train_object ?





[GitHub] [spark] mattoh91 commented on pull request #39267: [SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by "mattoh91 (via GitHub)" <gi...@apache.org>.
mattoh91 commented on PR #39267:
URL: https://github.com/apache/spark/pull/39267#issuecomment-1503329907

   @rithwik-db Can I clarify that the `num_processes` attribute of the TorchDistributor class refers to the number of spark.executor.cores used, and not the number of spark.executor.instances?
   Trying to use more than one `num_processes` seems to take up more cores / slots on a single executor during training (using the Spark operator on k8s).




[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
lu-wang-dl commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1069049507


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?

Review Comment:
   What happens if two processes choose the same port?





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068751565


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""
+            set_torch_config(context)
+
+            output = framework_wrapper_fn(input_params, train_fn, *args)
+
+            if context.partitionId() == 0:
+                return [output]
+            return [None]
+
+        return wrapped_train_fn
+
+    def _run_distributed_training(
+        self,
+        framework_wrapper_fn: Optional[Callable],
+        train_fn: Union[Callable, str],
+        *args: Any,
+    ) -> Optional[Any]:
+        if not framework_wrapper_fn:
+            raise RuntimeError("Unknown combination of parameters")
+        spark_task_program = self._get_spark_task_program(framework_wrapper_fn, train_fn, *args)

Review Comment:
   I guess just for the sake of modularity. We could just define the function here.





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068755416


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""

Review Comment:
   I think it should be added because, if the user runs training with `TorchDistributor(use_gpu=False, **kwargs).run(train_fn)` but accidentally has PyTorch Lightning code like `pl.Trainer(accelerator="gpu")` in their `train_fn`, an error should be raised saying no CUDA devices are available, even though a GPU accelerator was specified.
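   A hypothetical user snippet of the misconfiguration I have in mind (the Trainer arguments and sizes here are placeholders, not code from this PR):

```python
import pytorch_lightning as pl
from pyspark.ml.torch.distributor import TorchDistributor

def train_fn():
    # With CUDA_VISIBLE_DEVICES="" set by the distributor, asking Lightning for
    # a GPU accelerator should fail fast instead of silently grabbing a device.
    trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=1)
    # trainer.fit(...) would follow in real user code
    return trainer

TorchDistributor(num_processes=2, local_mode=False, use_gpu=False).run(train_fn)
```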





[GitHub] [spark] HyukjinKwon commented on pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #39267:
URL: https://github.com/apache/spark/pull/39267#issuecomment-1396277228

   Merged to master.




[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072309036


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -325,8 +329,15 @@ def _create_torchrun_command(
             torchrun_args = ["--standalone", "--nnodes=1"]
             processes_per_node = num_processes
         else:
-            pass
-            # TODO(SPARK-41592): Handle distributed training
+            master_addr, master_port = os.environ["MASTER_ADDR"], os.environ["MASTER_PORT"]
+            node_rank = os.environ["RANK"]
+            torchrun_args = [
+                f"--nnodes={num_processes}",
+                f"--node_rank={node_rank}",
+                f"--rdzv_endpoint={master_addr}:{master_port}",
+                "--rdzv_id=0",
+            ]  # TODO: setup random ID that is gleaned from env variables
+            processes_per_node = 1

Review Comment:
   In the `torch_run_process_wrapper` code,
   we don't need to set the subprocess stdout/stderr arguments; the default behavior is that the subprocess's stdout/stderr are redirected to the parent process's stdout/stderr.
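   A tiny standalone illustration of that default (plain `subprocess`, independent of this PR):

```python
import subprocess

# Leaving stdout/stderr at their default (None) makes the child process inherit
# the parent's file descriptors, so its output shows up in the parent's logs
# without any explicit redirection.
subprocess.run(["python", "-c", "print('hello from the child process')"], check=True)
```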





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1068757070


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""

Review Comment:
   @WeichenXu123 @lu-wang-dl is my logic reasonable here or did I misunderstand anything?





[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
rithwik-db commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1069064636


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""

Review Comment:
   PyTorch Lightning will raise a `MisconfigurationException: No supported gpu backend found!`. This is what we expect to see if the user sets `use_gpu=False` and calls `pl.Trainer(accelerator="gpu")`. My understanding is that if a user runs this code on a local cluster with GPUs on each node without `os.environ[CUDA_VISIBLE_DEVICES] = ""`, then the task would still be assigned a GPU even when `use_gpu=False`. 





[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
lu-wang-dl commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1069056451


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)
+            else:
+                os.environ[CUDA_VISIBLE_DEVICES] = ""

Review Comment:
   I still don't understand. If the user runs something like `pl.Trainer(accelerator="gpu")` on a CPU cluster, what is the behavior from PyTorch Lightning?





[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

Posted by GitBox <gi...@apache.org>.
WeichenXu123 commented on code in PR #39267:
URL: https://github.com/apache/spark/pull/39267#discussion_r1072285042


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -428,6 +432,84 @@ def _run_local_training(
 
         return output
 
+    def _get_spark_task_program(
+        self, framework_wrapper_fn: Optional[Callable], train_fn: Union[Callable, str], *args: Any
+    ) -> Callable:
+        num_processes = self.num_processes
+        num_tasks = self.num_tasks
+        use_gpu = self.use_gpu
+        input_params = self.input_params
+
+        # Spark task program
+        def wrapped_train_fn(_):  # type: ignore[no-untyped-def]
+            import os
+            from pyspark import BarrierTaskContext
+
+            CUDA_VISIBLE_DEVICES = "CUDA_VISIBLE_DEVICES"
+
+            # The idea of setting the random port to 0 doesn't seem to work?
+            def get_free_port(address: str) -> int:
+                import socket
+                import random
+
+                while True:
+                    port = random.randint(32768, 61000)
+                    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+                    if not (sock.connect_ex((address, port)) == 0):
+                        return port
+
+            def set_torch_config(context: "BarrierTaskContext") -> None:
+                addrs = [e.address.split(":")[0] for e in context.getTaskInfos()]
+
+                os.environ["MASTER_ADDR"] = str(addrs[0])
+                os.environ["MASTER_PORT"] = str(get_free_port(addrs[0]))
+                os.environ["WORLD_SIZE"] = str(num_processes)
+                os.environ["NODE_RANK"] = str(context.partitionId())
+                os.environ["RANK"] = str(context.partitionId())
+
+            def set_gpus(context: "BarrierTaskContext") -> None:
+                gpus_owned = get_gpus_owned(context)
+
+                my_num_gpus = (num_processes // num_tasks) + (
+                    context.partitionId() < (num_processes % num_tasks)
+                )
+                gpu_addresses = [str(e) for e in random.sample(gpus_owned, my_num_gpus)]
+                os.environ[CUDA_VISIBLE_DEVICES] = ",".join(gpu_addresses)
+
+            context = BarrierTaskContext.get()
+
+            if use_gpu:
+                set_gpus(context)

Review Comment:
   We can simplify the `set_gpus` function:
   
   if the `CUDA_VISIBLE_DEVICES` env var already exists, do nothing (Spark has already set `CUDA_VISIBLE_DEVICES` properly);
   otherwise, generate `CUDA_VISIBLE_DEVICES` from `taskcontext.resources["gpu"].addresses`.
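   Roughly like the sketch below (my rendering of the suggestion; the exact `resources()` accessor on the task context is an assumption):

```python
import os
from pyspark import BarrierTaskContext

def set_gpus(context: BarrierTaskContext) -> None:
    # If Spark already populated CUDA_VISIBLE_DEVICES for this task, keep it as-is.
    if "CUDA_VISIBLE_DEVICES" in os.environ:
        return
    # Otherwise derive it from the GPU addresses Spark assigned to this task.
    gpu_addresses = context.resources()["gpu"].addresses
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(addr) for addr in gpu_addresses)
```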


