Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/14 05:16:15 UTC

[GitHub] [spark] BryanCutler commented on pull request #34509: [SPARK-34521][PYTHON][SQL] Fix spark.createDataFrame when using pandas with StringDtype

BryanCutler commented on pull request #34509:
URL: https://github.com/apache/spark/pull/34509#issuecomment-993166320


   The difference from #28743 is that it was trying to deal with pyarrow extension types. For a pandas extension type, the `__arrow_array__` interface returns an Arrow array, which could be either a standard Arrow type or an extension type. In this PR we are talking about a standard string array, which PySpark can work with. If it were a pyarrow extension type instead, its storage type (which is a standard Arrow type) would need to be checked. PySpark could work with the storage type, but that might not be very useful because all of the extension information would be stripped out.
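   To illustrate the distinction (a sketch, not code from the PR): a pandas `StringDtype` column implements `__arrow_array__`, so pyarrow converts it to a plain Arrow string array, and the storage-type check described above only matters when the result is an extension type.

   ```python
   import pandas as pd
   import pyarrow as pa

   # A pandas extension array (StringDtype) implements __arrow_array__,
   # so pyarrow converts it directly to a standard Arrow array.
   pdf_col = pd.array(["a", "b", None], dtype="string")
   arrow_arr = pa.array(pdf_col)   # goes through __arrow_array__
   print(arrow_arr.type)           # a standard Arrow type (typically string)

   # If the conversion had produced a pyarrow extension type instead, only
   # its storage type (a standard Arrow type) would be usable by PySpark:
   if isinstance(arrow_arr.type, pa.ExtensionType):
       usable_type = arrow_arr.type.storage_type
   else:
       usable_type = arrow_arr.type
   print(usable_type)
   ```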
   
   This PR is a step in the right direction, so I think it's okay to merge. It adds support for any pandas extension type that is backed by a standard Arrow array, although I don't think the data will be convertible back to pandas as the original extension type. To fully support pandas/pyarrow extension types, we would need to propagate the extension type info through Spark so that when the data is worked on again in Python, the extension part can be loaded back up. I'm not sure how difficult that would be.
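   For concreteness, here is a minimal (hypothetical) pyarrow extension type; the `LabelType` name and `example.label` identifier are made up for illustration. The "extension part" that would need to be propagated is just the registered name plus serialized parameters riding on top of a standard storage type:

   ```python
   import pyarrow as pa

   # Hypothetical extension type: standard string storage plus a registered
   # name and (empty) serialized metadata -- the part Spark would strip today.
   class LabelType(pa.ExtensionType):
       def __init__(self):
           super().__init__(pa.string(), "example.label")

       def __arrow_ext_serialize__(self):
           return b""  # no parameters to persist

       @classmethod
       def __arrow_ext_deserialize__(cls, storage_type, serialized):
           return cls()

   # Registration is what lets the extension type be "loaded back up"
   # on the Python side after a round trip.
   pa.register_extension_type(LabelType())

   ext_arr = pa.ExtensionArray.from_storage(LabelType(), pa.array(["x", "y"]))
   print(ext_arr.type.storage_type)   # what PySpark could work with today
   print(ext_arr.type.extension_name) # the info that would need propagating
   ```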


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org