Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/30 06:19:16 UTC

[GitHub] [spark] amaliujia commented on a diff in pull request #39240: [SPARK-41440][CONNECT][PYTHON] Avoid the cache operator for general Sample.

amaliujia commented on code in PR #39240:
URL: https://github.com/apache/spark/pull/39240#discussion_r1059256894


##########
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -353,9 +353,10 @@ message Sample {
   // (Optional) The random seed.
   optional int64 seed = 5;
 
-  // (Optional) Explicitly sort the underlying plan to make the ordering deterministic.
-  // This flag is only used to randomly splits DataFrame with the provided weights.
-  optional bool force_stable_sort = 6;
+  // (Required) Explicitly sort the underlying plan to make the ordering deterministic or cache it.
+  // This flag is true when invoking `dataframe.randomSplit` to randomly splits DataFrame with the
+  // provided weights. Otherwise, it is false.
+  bool deterministic_order = 6;

Review Comment:
   I think the most useful principle is this: if the default value of a field is not good enough for you and you must know whether the field was set or not, then use `optional`. Otherwise, leave it as a plain field with the default field access rule.
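
To make the rule above concrete, here is a minimal Python sketch that models the two behaviours with a plain dataclass. It is not generated protobuf code: the class `SampleSketch` and the helper `has_seed` are hypothetical names used only for illustration, and `None` stands in for "field not set".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleSketch:
    """Hypothetical stand-in for the Sample message; not generated protobuf code."""

    # Models `optional int64 seed = 5;`: None means "not set", so a reader can
    # tell an unset seed apart from an explicit seed of 0.
    seed: Optional[int] = None

    # Models `bool deterministic_order = 6;`: an unset field simply reads as
    # the default False, and that default already carries the intended meaning.
    deterministic_order: bool = False

    def has_seed(self) -> bool:
        # Presence check; only meaningful because seed is modelled as optional.
        return self.seed is not None


# A general sample: nothing set, so the defaults apply.
plain_sample = SampleSketch()

# A randomSplit-style sample: seed explicitly 0, deterministic order requested.
split_sample = SampleSketch(seed=0, deterministic_order=True)
```

In this sketch an explicit seed of 0 is distinguishable from a missing seed, which is the case where `optional` pays off. For `deterministic_order`, "unset" and "false" mean the same thing, so a plain field is enough.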



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

