You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/29 07:55:57 UTC

[GitHub] [spark] grundprinzip commented on a diff in pull request #39283: [SPARK-41767][CONNECT][PYTHON] Implement `Column.{withField, dropFields}`

grundprinzip commented on code in PR #39283:
URL: https://github.com/apache/spark/pull/39283#discussion_r1058789739


##########
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -241,6 +242,20 @@ message Expression {
     Expression extraction = 2;
   }
 
+  // Add, replace or drop field of `StructType` into `col` expression by name.

Review Comment:
   ... a field ...



##########
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -241,6 +242,20 @@ message Expression {
     Expression extraction = 2;
   }
 
+  // Add, replace or drop field of `StructType` into `col` expression by name.
+  message UpdateFields {
+    // (Required) The struct expression
+    Expression struct_expression = 1;
+
+    // (Required) The field name

Review Comment:
   .



##########
connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:
##########
@@ -241,6 +242,20 @@ message Expression {
     Expression extraction = 2;
   }
 
+  // Add, replace or drop field of `StructType` into `col` expression by name.
+  message UpdateFields {
+    // (Required) The struct expression

Review Comment:
   .



##########
python/pyspark/sql/connect/expressions.py:
##########
@@ -420,6 +420,59 @@ def __repr__(self) -> str:
             return f"{self._name}({', '.join([str(arg) for arg in self._args])})"
 
 
+class WithField(Expression):
+    def __init__(
+        self,
+        structExpr: Expression,
+        fieldName: str,
+        valueExpr: Expression,
+    ) -> None:
+        super().__init__()
+
+        assert isinstance(structExpr, Expression)
+        self._structExpr = structExpr
+
+        assert isinstance(fieldName, str)
+        self._fieldName = fieldName
+
+        assert isinstance(valueExpr, Expression)
+        self._valueExpr = valueExpr
+
+    def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
+        expr = proto.Expression()
+        expr.update_fields.struct_expression.CopyFrom(self._structExpr.to_plan(session))
+        expr.update_fields.field_name = self._fieldName
+        expr.update_fields.value_expression.CopyFrom(self._valueExpr.to_plan(session))
+        return expr
+
+    def __repr__(self) -> str:

Review Comment:
   Shouldn't this be handled automatically now?



##########
python/pyspark/sql/connect/expressions.py:
##########
@@ -420,6 +420,59 @@ def __repr__(self) -> str:
             return f"{self._name}({', '.join([str(arg) for arg in self._args])})"
 
 
+class WithField(Expression):
+    def __init__(
+        self,
+        structExpr: Expression,
+        fieldName: str,
+        valueExpr: Expression,
+    ) -> None:
+        super().__init__()
+
+        assert isinstance(structExpr, Expression)
+        self._structExpr = structExpr
+
+        assert isinstance(fieldName, str)
+        self._fieldName = fieldName
+
+        assert isinstance(valueExpr, Expression)
+        self._valueExpr = valueExpr
+
+    def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
+        expr = proto.Expression()
+        expr.update_fields.struct_expression.CopyFrom(self._structExpr.to_plan(session))
+        expr.update_fields.field_name = self._fieldName
+        expr.update_fields.value_expression.CopyFrom(self._valueExpr.to_plan(session))
+        return expr
+
+    def __repr__(self) -> str:
+        return f"WithField({self._structExpr}, {self._fieldName}, {self._valueExpr})"
+
+
+class DropField(Expression):
+    def __init__(
+        self,
+        structExpr: Expression,
+        fieldName: str,
+    ) -> None:
+        super().__init__()
+
+        assert isinstance(structExpr, Expression)
+        self._structExpr = structExpr
+
+        assert isinstance(fieldName, str)
+        self._fieldName = fieldName
+
+    def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
+        expr = proto.Expression()
+        expr.update_fields.struct_expression.CopyFrom(self._structExpr.to_plan(session))
+        expr.update_fields.field_name = self._fieldName
+        return expr
+
+    def __repr__(self) -> str:
+        return f"DropField({self._structExpr}, {self._fieldName})"

Review Comment:
   Same question?



##########
python/pyspark/sql/tests/connect/test_connect_column.py:
##########
@@ -653,19 +676,118 @@ def test_column_accessor(self):
             sdf.select(sdf.c[0:1], sdf["c"][2:10]).toPandas(),
         )
 
-    def test_unsupported_functions(self):

Review Comment:
   Were those three last ones? Nice!! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org