Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/15 04:59:02 UTC

[GitHub] [spark] cloud-fan commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

cloud-fan commented on code in PR #38653:
URL: https://github.com/apache/spark/pull/38653#discussion_r1022318950


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -316,6 +319,36 @@ message StatCrosstab {
   string col2 = 3;
 }
 
+// Replaces null values.
+// It will invoke 'Dataset.na.fill' (same as 'DataFrameNaFunctions.fill') to compute the results.
+// Following 3 parameter combinations are supported:
+//  1, 'values' only contains 1 item, 'cols' is empty:
+//    replaces null values in all type-matched columns.
+//  2, 'values' only contains 1 item, 'cols' is not empty:
+//    replaces null values in specified columns.
+//  3, 'values' contains more than 1 items, then 'cols' is required to have the same length:
+//    replaces each specified column with corresponding value.
+message NAFill {
+  // (Required) The input relation.
+  Relation input = 1;
+
+  // (Optional) Optional list of column names to consider.
+  repeated string cols = 2;
+
+  // (Required) Values to replace null values with. Should contains at least 1 item.
+  repeated Type values = 3;
+
+  // Available data types.
+  message Type {

Review Comment:
   It seems more flexible to enforce the type restriction on the server side rather than in the proto definition. If we relax the restriction in the future, people won't need to upgrade their clients.
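
   To illustrate the suggestion, here is a minimal sketch of what a server-side check could look like. It is not Spark's actual code: `NAFillRequest`, `SUPPORTED_FILL_TYPES`, and `validate_na_fill` are hypothetical names, and the sketch assumes the message carries unrestricted values and leaves the allow-list to the server.

```python
from dataclasses import dataclass, field
from typing import Any, List

# Hypothetical allow-list kept on the server. Relaxing it later only
# requires a server change; the wire format and clients stay the same.
SUPPORTED_FILL_TYPES = (bool, int, float, str)


@dataclass
class NAFillRequest:
    """Illustrative stand-in for an NAFill message whose values are unrestricted."""
    cols: List[str] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)


def validate_na_fill(req: NAFillRequest) -> None:
    """Reject requests the server does not support (sketch only)."""
    # 'values' must contain at least 1 item.
    if not req.values:
        raise ValueError("values must contain at least 1 item")
    # With more than 1 value, 'cols' must have the same length.
    if len(req.values) > 1 and len(req.cols) != len(req.values):
        raise ValueError("cols must have the same length as values")
    # The type restriction lives here instead of in the proto definition.
    for value in req.values:
        if not isinstance(value, SUPPORTED_FILL_TYPES):
            raise TypeError(f"unsupported fill value type: {type(value).__name__}")
```

   With this shape, supporting a new value type would mean extending the server-side allow-list only.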



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
