Posted to reviews@spark.apache.org by "mathewjacob1002 (via GitHub)" <gi...@apache.org> on 2023/07/20 05:17:20 UTC

[GitHub] [spark] mathewjacob1002 opened a new pull request, #42087: [SPARK-44264] Added Example to Deepspeed Distributor

mathewjacob1002 opened a new pull request, #42087:
URL: https://github.com/apache/spark/pull/42087

   ### What changes were proposed in this pull request?
   Added usage examples to the docstring of DeepspeedTorchDistributor.
   
   
   ### Why are the changes needed?
   More concrete examples allow for a better understanding of the feature.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, docs changes. 
   
   
   ### How was this patch tested?
   make html
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42087: [SPARK-44264] Added Example to Deepspeed Distributor

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42087:
URL: https://github.com/apache/spark/pull/42087#discussion_r1274230550


##########
python/pyspark/ml/deepspeed/deepspeed_distributor.py:
##########
@@ -61,6 +61,30 @@ def __init__(
             The configuration file to be used for launching the deepspeed application.
             If it's a dictionary containing the parameters, then we will create the file.
             If None, deepspeed will fall back to default parameters.
+
+        Examples
+        --------
+        Run Deepspeed training function on a single node
+
+        >>> def train(learning_rate):
+                import deepspeed
+                # rest of training function
+                return model
+        >>> distributor = DeepspeedTorchDistributor(num_gpus=4,

Review Comment:
   All these arguments should be like `numGpus` because PySpark follows camelCase for user-facing APIs. Another option is to make each argument one word, like `ngpus`, `nnodes`, `gpu`, `local`, `deepspeedconf`.
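As a rough sketch of the camelCase convention being asked for here (the class and signature below are illustrative only, not the actual PySpark API):

```python
# Illustrative only: a camelCase signature in the style PySpark uses for
# user-facing APIs. This is NOT the actual DeepspeedTorchDistributor code.
class ExampleDistributor:
    def __init__(self, numGpus: int = 1, numNodes: int = 1,
                 useGpu: bool = True, localMode: bool = True,
                 deepspeedConfig: str = None):
        self.numGpus = numGpus
        self.numNodes = numNodes
        self.useGpu = useGpu
        self.localMode = localMode
        self.deepspeedConfig = deepspeedConfig

d = ExampleDistributor(numGpus=4, localMode=False)
print(d.numGpus, d.localMode)
```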





[GitHub] [spark] rithwik-db commented on a diff in pull request #42087: [SPARK-44264] Added Example to Deepspeed Distributor

Posted by "rithwik-db (via GitHub)" <gi...@apache.org>.
rithwik-db commented on code in PR #42087:
URL: https://github.com/apache/spark/pull/42087#discussion_r1274259415


##########
python/pyspark/ml/deepspeed/deepspeed_distributor.py:
##########
@@ -35,11 +35,11 @@ class DeepspeedTorchDistributor(TorchDistributor):
 
     def __init__(
         self,
-        num_gpus: int = 1,
+        numGpus: int = 1,

Review Comment:
   I think it would be best to stay as close to the Deepspeed CLI as possible; [Deepspeed](https://www.deepspeed.ai/getting-started/) uses `--num_gpus` so I think `numGpus` would be closer to that API than `ngpus`
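To make the CLI-parity argument concrete, here is a hypothetical helper showing how a camelCase keyword argument could map back onto Deepspeed's snake_case launcher flag; the function name and flag list are illustrative sketches, not code from either project:

```python
# Hypothetical sketch only: mapping camelCase keyword arguments onto the
# snake_case flags used by the `deepspeed` launcher CLI. Neither this
# function nor the mapping exists verbatim in PySpark or Deepspeed.
def to_launcher_args(numGpus: int, deepspeedConfig: str) -> list:
    return [
        "deepspeed",
        f"--num_gpus={numGpus}",
        "--deepspeed_config", deepspeedConfig,
    ]

print(to_launcher_args(numGpus=4, deepspeedConfig="ds_config.json"))
```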





[GitHub] [spark] HyukjinKwon commented on pull request #42087: [SPARK-44264][PYTHON][DOCS] Added Example to Deepspeed Distributor

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42087:
URL: https://github.com/apache/spark/pull/42087#issuecomment-1651124197

   Merged to master and branch-3.5.




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42087: [SPARK-44264] Added Example to Deepspeed Distributor

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42087:
URL: https://github.com/apache/spark/pull/42087#discussion_r1274230031


##########
python/pyspark/ml/deepspeed/deepspeed_distributor.py:
##########
@@ -61,6 +61,30 @@ def __init__(
             The configuration file to be used for launching the deepspeed application.
             If it's a dictionary containing the parameters, then we will create the file.
             If None, deepspeed will fall back to default parameters.
+
+        Examples
+        --------
+        Run Deepspeed training function on a single node
+
+        >>> def train(learning_rate):
+                import deepspeed
+                # rest of training function
+                return model
+        >>> distributor = DeepspeedTorchDistributor(num_gpus=4,
+                                                    nnodes=1,
+                                                    use_gpu=True,
+                                                    local_mode=True,
+                                                    deepspeed_config="path/to/config.json")
+        >>> output = distributor.run(train, 0.01)
+
+        Run Deepspeed training function on multiple nodes
+
+        >>> distributor = DeepspeedTorchDistributor(num_gpus=4,
+                                                    nnodes=3,
+                                                    use_gpu=True,
+                                                    local_mode=False,
+                                                    deepspeed_config="path/to/config.json")

Review Comment:
   ```suggestion
           >>> def train(learning_rate):
           ...     import deepspeed
           ...     # rest of training function
           ...     return model
           >>> distributor = DeepspeedTorchDistributor(
           ...     num_gpus=4,
           ...     nnodes=1,
           ...     use_gpu=True,
           ...     local_mode=True,
           ...     deepspeed_config="path/to/config.json")
           >>> output = distributor.run(train, 0.01)
   
           Run Deepspeed training function on multiple nodes
   
           >>> distributor = DeepspeedTorchDistributor(
           ...     num_gpus=4,
           ...     nnodes=3,
           ...     use_gpu=True,
           ...     local_mode=False,
           ...     deepspeed_config="path/to/config.json")
   ```
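The `...` prefixes in the suggestion are doctest's continuation-line syntax, which the doctest machinery parses and runs; a minimal self-contained illustration of that formatting (the `scale` function is made up for this example, unrelated to the distributor):

```python
import doctest

# A docstring-style sample using ">>>" for new statements and "..." for
# continuation lines -- the formatting convention applied in the suggestion.
sample = """
>>> def scale(x,
...           factor=2):
...     return x * factor
>>> scale(3)
6
"""

test = doctest.DocTestParser().get_doctest(sample, {}, "sample", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.attempted, results.failed)
```

If the continuation lines were mis-indented (as in the diff being reviewed), the parser would raise an error or the examples would fail instead.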





[GitHub] [spark] maddiedawson commented on pull request #42087: [SPARK-44264] Added Example to Deepspeed Distributor

Posted by "maddiedawson (via GitHub)" <gi...@apache.org>.
maddiedawson commented on PR #42087:
URL: https://github.com/apache/spark/pull/42087#issuecomment-1650622457

   @HyukjinKwon ptal




[GitHub] [spark] maddiedawson commented on a diff in pull request #42087: [SPARK-44264] Added Example to Deepspeed Distributor

Posted by "maddiedawson (via GitHub)" <gi...@apache.org>.
maddiedawson commented on code in PR #42087:
URL: https://github.com/apache/spark/pull/42087#discussion_r1269803233


##########
python/pyspark/ml/deepspeed/deepspeed_distributor.py:
##########
@@ -61,6 +61,30 @@ def __init__(
             The configuration file to be used for launching the deepspeed application.
             If it's a dictionary containing the parameters, then we will create the file.
             If None, deepspeed will fall back to default parameters.
+
+        Examples
+        --------
+        Run Deepspeed training function on a single node
+
+        >>> def train(learning_rate):
+            import deepspeed
+            # rest of training function
+            return model

Review Comment:
   Add an indent here





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42087: [SPARK-44264] Added Example to Deepspeed Distributor

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42087:
URL: https://github.com/apache/spark/pull/42087#discussion_r1274257605


##########
python/pyspark/ml/deepspeed/deepspeed_distributor.py:
##########
@@ -35,11 +35,11 @@ class DeepspeedTorchDistributor(TorchDistributor):
 
     def __init__(
         self,
-        num_gpus: int = 1,
+        numGpus: int = 1,

Review Comment:
   cc @rithwik-db on the naming. Are names like `ngpus` or `local` better, or are the current ones better?







[GitHub] [spark] HyukjinKwon closed pull request #42087: [SPARK-44264][PYTHON][DOCS] Added Example to Deepspeed Distributor

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #42087: [SPARK-44264][PYTHON][DOCS] Added Example to Deepspeed Distributor
URL: https://github.com/apache/spark/pull/42087

