Posted to reviews@spark.apache.org by "srowen (via GitHub)" <gi...@apache.org> on 2023/08/09 15:26:25 UTC

[GitHub] [spark] srowen commented on a diff in pull request #42382: [ML] Remove usage of RDD APIs for load/save in spark-ml

srowen commented on code in PR #42382:
URL: https://github.com/apache/spark/pull/42382#discussion_r1288721492


##########
python/pyspark/ml/util.py:
##########
@@ -437,7 +437,7 @@ def extractJsonParams(instance: "Params", skipParams: Sequence[str]) -> Dict[str
     def saveMetadata(
         instance: "Params",
         path: str,
-        sc: SparkContext,
+        sparkSession: SparkSession,

Review Comment:
   Same comment here about retaining compatibility



##########
mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala:
##########
@@ -405,12 +405,14 @@ private[ml] object DefaultParamsWriter {
   def saveMetadata(
       instance: Params,
       path: String,
-      sc: SparkContext,

Review Comment:
   I wonder if we should retain the existing SparkContext method. Third-party libraries would still use it unless they later make a change like the one you're making to Spark ML, so this would break them. It seems easy enough to retain (and deprecate?) this shared method. Same for loadMetadata.
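
   The retain-and-deprecate pattern suggested here can be sketched as follows. This is a minimal illustration only, not the actual Spark ML code: the classes below are hypothetical stand-ins for pyspark's SparkContext and SparkSession, and `save_metadata` is an invented helper showing how an old entry point can keep working while steering callers to the new one.

   ```python
   import warnings

   # Hypothetical stand-ins for pyspark's SparkContext / SparkSession,
   # used only to illustrate the compatibility pattern under discussion.
   class SparkContext:
       pass

   class SparkSession:
       def __init__(self, sc):
           self.sparkContext = sc

   def save_metadata(path, spark_or_sc):
       """New-style entry point that still tolerates a SparkContext.

       If a SparkContext is passed, emit a DeprecationWarning and wrap it
       in a session-like object, so existing third-party callers keep
       working instead of breaking at the changed signature.
       """
       if isinstance(spark_or_sc, SparkContext):
           warnings.warn(
               "Passing a SparkContext is deprecated; pass a SparkSession",
               DeprecationWarning,
           )
           spark_or_sc = SparkSession(spark_or_sc)
       # ... real code would write metadata JSON under `path` here ...
       return f"metadata for {path} saved via session"
   ```

   In Scala the same idea would be expressed as two overloads, with the old SparkContext-taking one annotated `@deprecated` and delegating to the SparkSession-based method.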



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

