Posted to reviews@spark.apache.org by "ueshin (via GitHub)" <gi...@apache.org> on 2023/07/15 00:32:17 UTC

[GitHub] [spark] ueshin commented on a diff in pull request #41989: [SPARK-43965][PYTHON][CONNECT] Support Python UDTF in Spark Connect

ueshin commented on code in PR #41989:
URL: https://github.com/apache/spark/pull/41989#discussion_r1264258450


##########
python/pyspark/sql/connect/plan.py:
##########
@@ -2173,6 +2175,107 @@ def plan(self, session: "SparkConnectClient") -> proto.Relation:
         return plan
 
 
+class PythonUDTF:
+    """Represents a Python user-defined table function."""
+
+    def __init__(
+        self,
+        func: Type,
+        return_type: Union[DataType, str],
+        eval_type: int,
+        python_ver: str,
+    ) -> None:
+        self._func = func
+        self._name = func.__name__
+        self._return_type: DataType = (
+            UnparsedDataType(return_type) if isinstance(return_type, str) else return_type
+        )
+        self._eval_type = eval_type
+        self._python_ver = python_ver
+
+    def _parse_return_type(self, session: "SparkConnectClient") -> StructType:
+        if isinstance(self._return_type, UnparsedDataType):
+            parsed = session._analyze(
+                method="ddl_parse", ddl_string=self._return_type.data_type_string
+            ).parsed
+        else:
+            parsed = self._return_type
+
+        if not isinstance(parsed, StructType):
+            raise PySparkTypeError(
+                error_class="INVALID_UDTF_RETURN_TYPE",
+                message_parameters={"name": self._name, "return_type": f"{parsed}"},
+            )

Review Comment:
   Since this check is already performed on the server side, we don't need to repeat it on the client side, which saves one round-trip for parsing the schema. `UnparsedDataType` will be handled in `SparkConnectPlanner.transformDataType`.
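
A minimal sketch of the trade-off described in this comment, not the actual PySpark code: `FakeClient`, `analyze_ddl`, `build_plan_eager`, `build_plan_deferred`, and the simplified `UnparsedDataType` / `StructType` classes are all hypothetical stand-ins. It only illustrates that forwarding the unparsed DDL string in the plan avoids the extra analyze call, assuming the server validates the return type itself.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class UnparsedDataType:
    """Hypothetical stand-in: a DDL string that has not been parsed yet."""

    data_type_string: str


@dataclass
class StructType:
    """Hypothetical stand-in for an already parsed struct schema."""

    ddl: str


ReturnType = Union[UnparsedDataType, StructType]


class FakeClient:
    """Hypothetical client that only counts round-trips to the server."""

    def __init__(self) -> None:
        self.round_trips = 0

    def analyze_ddl(self, ddl_string: str) -> StructType:
        # Each call models one request/response exchange with the server.
        self.round_trips += 1
        return StructType(ddl_string)


def build_plan_eager(return_type: ReturnType, client: FakeClient) -> Dict[str, ReturnType]:
    """Parse and validate on the client, as in the diff under review."""
    if isinstance(return_type, UnparsedDataType):
        parsed = client.analyze_ddl(return_type.data_type_string)
    else:
        parsed = return_type
    if not isinstance(parsed, StructType):
        raise TypeError(f"invalid UDTF return type: {parsed}")
    return {"return_type": parsed}


def build_plan_deferred(return_type: ReturnType, client: FakeClient) -> Dict[str, ReturnType]:
    """Forward the return type as-is; the server is assumed to parse and validate it."""
    return {"return_type": return_type}


eager_client, deferred_client = FakeClient(), FakeClient()
unparsed = UnparsedDataType("a int, b string")

eager_plan = build_plan_eager(unparsed, eager_client)
deferred_plan = build_plan_deferred(unparsed, deferred_client)

print(eager_client.round_trips, deferred_client.round_trips)
```

In this sketch the eager path issues one analyze call for a string return type, while the deferred path issues none and leaves the `UnparsedDataType` in the plan for the server to resolve.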



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org