Posted to reviews@spark.apache.org by "ueshin (via GitHub)" <gi...@apache.org> on 2023/07/27 20:02:28 UTC

[GitHub] [spark] ueshin commented on a diff in pull request #42157: [SPARK-43968][PYTHON] Improve error messages for Python UDTFs with wrong number of outputs

ueshin commented on code in PR #42157:
URL: https://github.com/apache/spark/pull/42157#discussion_r1276759271


##########
python/pyspark/worker.py:
##########
@@ -654,6 +657,19 @@ def wrap_udtf(f, return_type):
             assert return_type.needConversion()
             toInternal = return_type.toInternal
 
+            def verify_and_convert_result(result):
+                # TODO(SPARK-44005): support returning non-tuple values
+                if result is not None and hasattr(result, "__len__"):
+                    if len(result) != len(return_type):

Review Comment:
   nit: ditto.
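   A minimal sketch of how this nit could be applied to the hunk above (it presumably
   refers to the same suggestion made on `verify_result` below). The name
   `expected_output_len` is hypothetical, `return_type` is assumed to be in scope in
   the enclosing `wrap_udtf` as shown in the hunk, and the error path is elided as in
   the PR:

       # Sketch only: compute the expected length once per wrap_udtf call
       # instead of re-evaluating len(return_type) for every yielded row.
       expected_output_len = len(return_type)

       def verify_and_convert_result(result):
           # TODO(SPARK-44005): support returning non-tuple values
           if result is not None and hasattr(result, "__len__"):
               if len(result) != expected_output_len:
                   ...  # raise the length-mismatch error exactly as in the PR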



##########
python/pyspark/worker.py:
##########
@@ -604,8 +604,11 @@ def verify_result(result):
                         },
                     )
 
-                # Check when the dataframe has both rows and columns.
-                if not result.empty or len(result.columns) != 0:
+                # Validate the output schema when the result dataframe has either output
+                # rows or columns. Note that we avoid using `df.empty` here because the
+                # result dataframe may contain an empty row. For example, when a UDTF is
+                # defined as follows: def eval(self): yield tuple().
+                if len(result) > 0 or len(result.columns) > 0:
                     if len(result.columns) != len(return_type):

Review Comment:
   nit: We might want to have a variable for `len(return_type)` outside of `verify_result` to avoid any potential overhead.
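   A minimal sketch of the suggestion, assuming `return_type` is in scope in the
   enclosing `wrap_udtf` as in the hunk above; `return_type_size` is a hypothetical
   name and the error path is elided as in the PR:

       # Computed once outside verify_result so the per-batch path does not
       # repeatedly call len() on the StructType.
       return_type_size = len(return_type)

       def verify_result(result):
           # Validate only when the result dataframe has output rows or columns;
           # len(result) is used instead of `result.empty` because a UDTF defined
           # as `def eval(self): yield tuple()` yields a row with no columns.
           if len(result) > 0 or len(result.columns) > 0:
               if len(result.columns) != return_type_size:
                   ...  # raise the column-count mismatch error as in the PR

   For context on why `result.empty` is avoided, a small self-contained pandas
   check (illustrative only):

       import pandas as pd

       # `def eval(self): yield tuple()` produces one record with zero fields.
       df = pd.DataFrame([tuple()])
       assert df.shape == (1, 0)
       assert df.empty          # any zero-length axis makes `empty` True,
                                # so `not result.empty` would skip the check
       assert len(df) > 0       # the new condition still triggers validation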



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org