Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2024/02/16 02:37:44 UTC
Re: [PR] [SPARK-38098][PYTHON] Add support for ArrayType of nested StructType to arrow-based conversion [spark]
itholic commented on code in PR #35391:
URL: https://github.com/apache/spark/pull/35391#discussion_r1491894555
##########
python/pyspark/sql/pandas/types.py:
##########
@@ -86,8 +86,15 @@ def to_arrow_type(dt: DataType) -> "pa.DataType":
     elif type(dt) == DayTimeIntervalType:
         arrow_type = pa.duration("us")
     elif type(dt) == ArrayType:
-        if type(dt.elementType) in [StructType, TimestampType]:
+        if type(dt.elementType) == TimestampType:
             raise TypeError("Unsupported type in conversion to Arrow: " + str(dt))
+        elif type(dt.elementType) == StructType:
+            if LooseVersion(pa.__version__) < LooseVersion("2.0.0"):
+                raise TypeError(
+                    "Array of StructType is only supported with pyarrow 2.0.0 and above"
Review Comment:
Hi, @LucaCanali I think I've found a case where Array of StructType doesn't work properly:
**In:**
```python
df = spark.createDataFrame(
    [
        ("a", [("b", False), ("c", True)]),
    ]
).toDF("c1", "c2")
df.toPandas()
```
**Out:**
```python
  c1                                                 c2
0  a  [{'_1': 'b', '_2': False}, {'_1': 'c', '_2': T...
```
**Expected:**
```python
  c1                       c2
0  a  [(b, False), (c, True)]
```
I suspect this may be an internal table-conversion issue on the PyArrow side, but I'm not sure, so could you take a look when you have some time?
Thanks in advance!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org