Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/08/31 13:30:05 UTC
[GitHub] [spark] zhengruifeng opened a new pull request, #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow
zhengruifeng opened a new pull request, #42754:
URL: https://github.com/apache/spark/pull/42754
### What changes were proposed in this pull request?
Move the arrow batch creation to the `isCommand` branch
### Why are the changes needed?
https://github.com/apache/spark/pull/42736 and https://github.com/apache/spark/pull/42743 introduced `CalendarIntervalType` in the Spark Connect Python client. However, the following query still fails:
```
spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
...
pyspark.errors.exceptions.connect.UnsupportedOperationException: [UNSUPPORTED_DATATYPE] Unsupported data type "INTERVAL".
```
The root cause is that `handleSqlCommand` always creates an arrow batch, while `ArrowUtils` does not currently accept `CalendarIntervalType`.
This PR mainly focuses on enabling `schema` for datatypes that are not compatible with arrow.
In the future, we should make `ArrowUtils` accept `CalendarIntervalType` so that `collect`/`toPandas` work.
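
To illustrate the control-flow change, here is a minimal, self-contained Python sketch. It is not the actual Spark Connect server code: `SqlResult`, `to_arrow_batch`, `handle_sql_before`, and `handle_sql_after` are hypothetical stand-ins, and the set of arrow-incompatible types is an assumption made only for the example. It shows why building the arrow batch eagerly fails for a plain query, and why building it only in the command branch avoids that.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption for illustration only: types the (hypothetical) arrow
# converter cannot handle. This is not Spark's real ArrowUtils logic.
ARROW_UNSUPPORTED = {"interval"}


class UnsupportedDataType(Exception):
    """Raised by the hypothetical converter for arrow-incompatible types."""


@dataclass
class SqlResult:
    is_command: bool          # True for commands, False for plain queries
    schema: List[str]         # simplified schema: a list of type names
    rows: List[Tuple]         # eagerly computed rows (only used by commands)


def to_arrow_batch(schema: List[str], rows: List[Tuple]) -> dict:
    """Hypothetical arrow conversion that rejects unsupported types."""
    for type_name in schema:
        if type_name in ARROW_UNSUPPORTED:
            raise UnsupportedDataType(
                f'Unsupported data type "{type_name.upper()}".'
            )
    return {"schema": schema, "rows": rows}


def handle_sql_before(result: SqlResult) -> dict:
    """Old flow: the batch is always created, even if it is never used."""
    batch = to_arrow_batch(result.schema, result.rows)
    if result.is_command:
        return {"kind": "local_relation", "data": batch}
    return {"kind": "sql_relation", "schema": result.schema}


def handle_sql_after(result: SqlResult) -> dict:
    """New flow: the batch is created only inside the command branch."""
    if result.is_command:
        batch = to_arrow_batch(result.schema, result.rows)
        return {"kind": "local_relation", "data": batch}
    return {"kind": "sql_relation", "schema": result.schema}
```

In this sketch, a plain query whose schema contains `interval` raises in `handle_sql_before` but returns a relation with its schema intact in `handle_sql_after`; commands behave the same in both versions.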
### Does this PR introduce _any_ user-facing change?
Yes.
After this PR:
```
In [1]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
Out[1]: DataFrame[make_interval(100, 11, 1, 1, 12, 30, 1.001001): interval]
In [2]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)").schema
Out[2]: StructType([StructField('make_interval(100, 11, 1, 1, 12, 30, 1.001001)', CalendarIntervalType(), True)])
```
### How was this patch tested?
Enabled the related unit test.
### Was this patch authored or co-authored using generative AI tooling?
no
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow
Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42754:
URL: https://github.com/apache/spark/pull/42754#issuecomment-1701048504
cc @HyukjinKwon @grundprinzip
[GitHub] [spark] zhengruifeng commented on pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow
Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42754:
URL: https://github.com/apache/spark/pull/42754#issuecomment-1702093873
thanks, merged to master
[GitHub] [spark] zhengruifeng commented on pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow
Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42754:
URL: https://github.com/apache/spark/pull/42754#issuecomment-1701045485
CI link: https://github.com/zhengruifeng/spark/actions/runs/6035501160/job/16376014090
[GitHub] [spark] zhengruifeng commented on pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow
Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42754:
URL: https://github.com/apache/spark/pull/42754#issuecomment-1702008756
also cc @hvanhovell
[GitHub] [spark] zhengruifeng closed pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow
Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow
URL: https://github.com/apache/spark/pull/42754