Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/08/31 13:30:05 UTC

[GitHub] [spark] zhengruifeng opened a new pull request, #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow

zhengruifeng opened a new pull request, #42754:
URL: https://github.com/apache/spark/pull/42754

   ### What changes were proposed in this pull request?
   
   Move the Arrow batch creation into the `isCommand` branch of `handleSqlCommand`, so that a batch is only built when the SQL statement is a command.
   
   
   ### Why are the changes needed?
   
   https://github.com/apache/spark/pull/42736 and https://github.com/apache/spark/pull/42743 introduced `CalendarIntervalType` in the Spark Connect Python client. However, the following query still fails:
   
   ```
   spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
   
   ...
   
   pyspark.errors.exceptions.connect.UnsupportedOperationException: [UNSUPPORTED_DATATYPE] Unsupported data type "INTERVAL".
   ```
   
   The root cause is that `handleSqlCommand` always creates an Arrow batch, while `ArrowUtils` does not currently accept `CalendarIntervalType`.
   
   This PR mainly focuses on enabling `schema` for datatypes that are not compatible with Arrow.
   In the future, we should make `ArrowUtils` accept `CalendarIntervalType` so that `collect`/`toPandas` work.
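
The control-flow change described above can be illustrated with a toy model. This is only a sketch in plain Python, not the actual Spark Connect server code: `to_arrow_batch`, `handle_sql_before`, `handle_sql_after`, `SqlResponse`, and `ARROW_UNSUPPORTED` are hypothetical names invented for illustration, and what a non-command response really carries is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for the types the Arrow conversion rejects.
ARROW_UNSUPPORTED = {"interval"}


class UnsupportedDataType(Exception):
    """Toy counterpart of the UNSUPPORTED_DATATYPE error."""


def to_arrow_batch(schema: List[str], rows: List[tuple]) -> dict:
    """Pretend Arrow conversion: fails on any unsupported column type."""
    for type_name in schema:
        if type_name in ARROW_UNSUPPORTED:
            raise UnsupportedDataType(
                f'Unsupported data type "{type_name.upper()}".'
            )
    return {"schema": schema, "rows": rows}


@dataclass
class SqlResponse:
    schema: List[str]
    batch: Optional[dict]  # only populated for commands in this toy model


def handle_sql_before(is_command: bool, schema: List[str], rows: List[tuple]) -> SqlResponse:
    # Old behavior (as described): the batch is always created,
    # so a plain query with an interval column fails up front.
    batch = to_arrow_batch(schema, rows)
    return SqlResponse(schema, batch if is_command else None)


def handle_sql_after(is_command: bool, schema: List[str], rows: List[tuple]) -> SqlResponse:
    # New behavior (as described): the batch is created only for commands.
    if is_command:
        return SqlResponse(schema, to_arrow_batch(schema, rows))
    return SqlResponse(schema, None)
```

In this toy model, `handle_sql_after(False, ["interval"], [])` returns a response whose schema is available, while `handle_sql_before` raises for the same input. A command with an unsupported type would still fail in both versions, which mirrors the remaining `ArrowUtils` limitation.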
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   
   After this PR:
   ```
   In [1]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)")
   Out[1]: DataFrame[make_interval(100, 11, 1, 1, 12, 30, 1.001001): interval]
   
   In [2]: spark.sql("SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001)").schema
   Out[2]: StructType([StructField('make_interval(100, 11, 1, 1, 12, 30, 1.001001)', CalendarIntervalType(), True)])
   ```
   
   
   ### How was this patch tested?
   Enabled the related unit test (UT).
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42754:
URL: https://github.com/apache/spark/pull/42754#issuecomment-1701048504

   cc @HyukjinKwon @grundprinzip 




[GitHub] [spark] zhengruifeng commented on pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42754:
URL: https://github.com/apache/spark/pull/42754#issuecomment-1702093873

   thanks, merged to master




[GitHub] [spark] zhengruifeng commented on pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42754:
URL: https://github.com/apache/spark/pull/42754#issuecomment-1701045485

   CI link: https://github.com/zhengruifeng/spark/actions/runs/6035501160/job/16376014090




[GitHub] [spark] zhengruifeng commented on pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42754:
URL: https://github.com/apache/spark/pull/42754#issuecomment-1702008756

   also cc @hvanhovell 




[GitHub] [spark] zhengruifeng closed pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #42754: [SPARK-45026][CONNECT] `spark.sql` should support datatypes not compatible with arrow
URL: https://github.com/apache/spark/pull/42754

