Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2024/02/16 02:37:44 UTC

Re: [PR] [SPARK-38098][PYTHON] Add support for ArrayType of nested StructType to arrow-based conversion [spark]

itholic commented on code in PR #35391:
URL: https://github.com/apache/spark/pull/35391#discussion_r1491894555


##########
python/pyspark/sql/pandas/types.py:
##########
@@ -86,8 +86,15 @@ def to_arrow_type(dt: DataType) -> "pa.DataType":
     elif type(dt) == DayTimeIntervalType:
         arrow_type = pa.duration("us")
     elif type(dt) == ArrayType:
-        if type(dt.elementType) in [StructType, TimestampType]:
+        if type(dt.elementType) == TimestampType:
             raise TypeError("Unsupported type in conversion to Arrow: " + str(dt))
+        elif type(dt.elementType) == StructType:
+            if LooseVersion(pa.__version__) < LooseVersion("2.0.0"):
+                raise TypeError(
+                    "Array of StructType is only supported with pyarrow 2.0.0 and above"

Review Comment:
   Hi, @LucaCanali I think I've found a case where Array of StructType doesn't work properly:
   
   **In:**
   ```python
   df = spark.createDataFrame(
     [
       ("a", [("b", False), ("c", True)]),
     ]
   ).toDF("c1", "c2")
   df.toPandas()
   ```
   
   **Out:**
   ```python
     c1                                                 c2
   0  a  [{'_1': 'b', '_2': False}, {'_1': 'c', '_2': T...
   ```
   
   **Expected:**
   ```python
     c1                       c2
   0  a  [(b, False), (c, True)]
   ```
   
   
   I suspect this may be an internal table-conversion issue on the PyArrow side, but I'm not sure. Could you take a look when you have some time?
   
   Thanks in advance!
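   A minimal pure-PyArrow sketch of the suspected behavior (this reproduction is my own assumption, outside Spark): when a `list<struct>` column is converted with `Table.to_pandas()`, PyArrow materializes the struct elements as Python dicts, not as the tuples/Rows shown in the expected output above.
   
   ```python
   import pyarrow as pa
   
   # Build a list<struct> array shaped like the "c2" column that
   # toPandas() would receive (field names "_1"/"_2" assumed to match
   # the auto-generated tuple field names from createDataFrame).
   arr = pa.array(
       [[{"_1": "b", "_2": False}, {"_1": "c", "_2": True}]],
       type=pa.list_(pa.struct([("_1", pa.string()), ("_2", pa.bool_())])),
   )
   table = pa.table({"c2": arr})
   pdf = table.to_pandas()
   
   # Each element comes back as a dict, mirroring the Out: above.
   print(pdf["c2"][0])
   ```
   
   If this holds, the dict-vs-Row discrepancy originates in PyArrow's pandas conversion rather than in this PR's type check.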



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

