Posted to reviews@spark.apache.org by "lu-wang-dl (via GitHub)" <gi...@apache.org> on 2023/07/06 23:43:53 UTC

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #41770: [WIP] Write a Deepspeed Distributed Learning Class DeepspeedTorchDistributor

lu-wang-dl commented on code in PR #41770:
URL: https://github.com/apache/spark/pull/41770#discussion_r1255024025


##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -1003,3 +1007,97 @@ def _get_spark_partition_data_loader(
         # if num_workers is zero, we cannot set `prefetch_factor` otherwise
         # torch will raise error.
         return DataLoader(dataset, batch_size, num_workers=num_workers)
+
+
+class DeepspeedTorchDistributor(TorchDistributor):
+    
+    def __init__(self, num_processes: int = 1, local_mode: bool = True, use_gpu: bool = True, deepspeed_config = None):
+        super().__init__(num_processes, local_mode, use_gpu)
+        self.deepspeed_config = deepspeed_config 
+        self.ssl_conf = "deepspeed.spark.distributor.ignoreSsl"

Review Comment:
   Do we need this conf (`deepspeed.spark.distributor.ignoreSsl`)?



##########
python/pyspark/ml/torch/distributor.py:
##########
@@ -1003,3 +1007,97 @@ def _get_spark_partition_data_loader(
         # if num_workers is zero, we cannot set `prefetch_factor` otherwise
         # torch will raise error.
         return DataLoader(dataset, batch_size, num_workers=num_workers)
+
+
+class DeepspeedTorchDistributor(TorchDistributor):
+    
+    def __init__(self, num_processes: int = 1, local_mode: bool = True, use_gpu: bool = True, deepspeed_config = None):

Review Comment:
   What is the expected input format for `deepspeed_config`? Is it the path to a DeepSpeed config file?
   You may need to add docstrings that document it.
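
To make the docstring request concrete, here is a minimal sketch of how the constructor could be documented. It is not the PR's implementation: `TorchDistributorStub` is a hypothetical stand-in for `TorchDistributor` so that the snippet runs without Spark, and the idea that `deepspeed_config` accepts either a file path or a dictionary is an assumption that the PR author would need to confirm.

```python
from typing import Any, Dict, Optional, Union


class TorchDistributorStub:
    """Hypothetical stand-in for TorchDistributor; it only stores its arguments."""

    def __init__(self, num_processes: int = 1, local_mode: bool = True, use_gpu: bool = True):
        self.num_processes = num_processes
        self.local_mode = local_mode
        self.use_gpu = use_gpu


class DeepspeedTorchDistributor(TorchDistributorStub):
    def __init__(
        self,
        num_processes: int = 1,
        local_mode: bool = True,
        use_gpu: bool = True,
        deepspeed_config: Optional[Union[str, Dict[str, Any]]] = None,
    ):
        """
        Parameters
        ----------
        num_processes : int
            Number of training processes to launch.
        local_mode : bool
            If True, run training on the driver node only.
        use_gpu : bool
            If True, train on GPUs.
        deepspeed_config : str or dict, optional
            ASSUMPTION (not confirmed by the PR): either the path to a
            DeepSpeed JSON config file or an already-parsed config dictionary.
            None means no DeepSpeed config is supplied.
        """
        super().__init__(num_processes, local_mode, use_gpu)
        self.deepspeed_config = deepspeed_config


# Usage: both forms are stored as given under the assumption above.
from_dict = DeepspeedTorchDistributor(num_processes=2, deepspeed_config={"train_batch_size": 8})
from_path = DeepspeedTorchDistributor(deepspeed_config="path/to/ds_config.json")
```

Annotating the parameter as `Optional[Union[str, Dict[str, Any]]]` would also answer the format question directly in the signature, whichever formats the PR ends up supporting.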



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

