Posted to reviews@spark.apache.org by "allisonwang-db (via GitHub)" <gi...@apache.org> on 2023/08/03 16:54:47 UTC

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42310: [SPARK-44561][PYTHON] Fix AssertionError when converting UDTF output to a complex type

allisonwang-db commented on code in PR #42310:
URL: https://github.com/apache/spark/pull/42310#discussion_r1283466745


##########
python/pyspark/sql/pandas/types.py:
##########
@@ -763,6 +764,9 @@ def _create_converter_from_pandas(
     error_on_duplicated_field_names : bool, optional
         Whether raise an exception when there are duplicated field names.
         (default ``True``)
+    ignore_unexpected_complex_type_values : bool, optional
+        Whether to ignore cases where unexpected values are given for complex types.

Review Comment:
   Can we add a bit more to the comment to explain what an unexpected value for a complex type is? E.g., provide some concrete examples.
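
   For illustration (a hypothetical sketch; the flag's exact behavior is inferred from the PR title, not quoted from it), an "unexpected value" here would be one whose Python type does not match the declared complex type:

   ```python
   # Hypothetical examples of unexpected values for complex target types.
   bad_map_value = [("a", 1)]   # a list of pairs where map<string,int> expects a dict
   bad_struct_value = 42        # a scalar where struct<a:int> expects a Row/tuple/dict
   # With ignore_unexpected_complex_type_values=True, the converter would
   # presumably pass such values through rather than fail an assertion.
   ```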



##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -1789,9 +1786,102 @@ def eval(self):
             ("x: array<int>", [Row(x=[0, 1, 2])]),
             ("x: array<float>", [Row(x=[0, 1.1, 2])]),
             ("x: array<array<int>>", err),
-            # TODO(SPARK-44561): fix AssertionError in convert_map and convert_struct
-            # ("x: map<string,int>", None),
-            # ("x: struct<a:int>", None)
+            ("x: map<string,int>", err),
+            ("x: struct<a:int>", err),
+        ]:
+            with self.subTest(ret_type=ret_type):
+                self._check_result_or_exception(TestUDTF, ret_type, expected)
+
+    def test_map_output_type_casting(self):

Review Comment:
   Thanks for adding these tests!
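
   For readers following along, a minimal sketch of the kind of case these new map/struct tests exercise (illustrative only; the class name and expected output are assumptions, not copied from the PR). Requires an active SparkSession:

   ```python
   from pyspark.sql.functions import udtf

   @udtf(returnType="x: map<string,int>")
   class MapUDTF:
       def eval(self):
           yield ({"a": 0, "b": 1},)

   MapUDTF().show()
   # expected: a single row whose x column is the map {a -> 0, b -> 1}
   ```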



##########
python/pyspark/sql/pandas/types.py:
##########
@@ -781,28 +785,51 @@ def correct_timestamp(pser: pd.Series) -> pd.Series:
     def _converter(dt: DataType) -> Optional[Callable[[Any], Any]]:
 
         if isinstance(dt, ArrayType):
-            _element_conv = _converter(dt.elementType)
-            if _element_conv is None:
-                return None
+            _element_conv = _converter(dt.elementType) or (lambda x: x)

Review Comment:
   Just curious: why do we need `(lambda x: x)`?
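
   My reading of the pattern (a simplified sketch, not the exact code in this PR): `_converter(dt)` returns `None` when values of `dt` need no per-element conversion, so the identity fallback lets the `ArrayType` branch call the element converter unconditionally.

   ```python
   # Simplified sketch of the assumed pattern. _converter(dt) returns None
   # when no per-element conversion is needed; defaulting to the identity
   # function means convert_array can always call _element_conv, which
   # matters once this branch must itself handle unexpected values instead
   # of returning None ("no converter needed") for the whole array.
   _element_conv = _converter(dt.elementType) or (lambda x: x)

   def convert_array(value):
       if value is None:
           return None
       return [_element_conv(v) for v in value]
   ```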



##########
python/pyspark/sql/types.py:
##########
@@ -2402,7 +2402,7 @@ def __repr__(self) -> str:
                 "%s=%r" % (k, v) for k, v in zip(self.__fields__, tuple(self))
             )
         else:
-            return "<Row(%s)>" % ", ".join("%r" % field for field in self)
+            return "<Row(%s)>" % ", ".join(repr(field) for field in self)

Review Comment:
   Is this already fixed in https://github.com/apache/spark/pull/42303?
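
   For context, a plausible motivation for the change itself (my inference; not stated in this PR): `"%r" % field` breaks when `field` is a tuple, because `%` treats the tuple as format arguments, while `repr(field)` handles any value:

   ```python
   >>> "%r" % (1, 2)
   Traceback (most recent call last):
     ...
   TypeError: not all arguments converted during string formatting
   >>> repr((1, 2))
   '(1, 2)'
   ```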



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

