Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/02 13:27:36 UTC

[GitHub] [spark] peter-toth opened a new pull request, #38885: [SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog

peter-toth opened a new pull request, #38885:
URL: https://github.com/apache/spark/pull/38885

   ### What changes were proposed in this pull request?
   Currently, the config `spark.sql.sources.useV1SourceList` has no effect on file tables in the session catalog; the V1 path is always used. This PR enables V2 file tables in read paths via the session catalog and fixes a few issues where V2 behaves differently from V1.
   
   ### Why are the changes needed?
   It would be good if we could use the already available V2 file source implementations with the session catalog. We ran into a few problems with V2 optimization paths that we want to fix in the future, but Spark currently doesn't have built-in catalog support for any of the V2 file table implementations. As a first step, this PR enables V2 for select query plans only. All commands and `InsertIntoStatement` continue to use the V1 implementations.
   
   The PR also contains some test changes:
   - `SQLQuerySuite` is split into V1 and V2 versions.
   - V2 versions of `OrcPartitionDiscoverySuite` and `ParquetPartitionDiscoverySuite` are modified to behave like the V1 versions do. Basically the order of output columns changed in the edge case when partitioning and data columns overlap.
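
To make the overlap edge case concrete, here is a minimal Python sketch of two ways to merge a data schema with a partition schema. It is not Spark code: the function names and the sample schemas are hypothetical, name matching is case-sensitive for simplicity, and the two orderings reflect my understanding of the V1 and V2 file source behavior rather than anything stated in this PR.

```python
from typing import List, Tuple

# A field is a (column name, type name) pair; a schema is an ordered list of fields.
Field = Tuple[str, str]


def merge_keep_data_order(data: List[Field], partition: List[Field]) -> List[Field]:
    """Assumed V1-like merge: overlapping columns keep their position in the
    data schema but take the partition field; the rest of the partition
    columns are appended at the end."""
    data_names = {name for name, _ in data}
    overlapped = {name: (name, typ) for name, typ in partition if name in data_names}
    merged = [overlapped.get(name, (name, typ)) for name, typ in data]
    merged += [field for field in partition if field[0] not in overlapped]
    return merged


def merge_partition_last(data: List[Field], partition: List[Field]) -> List[Field]:
    """Assumed V2-like merge: overlapping columns are dropped from the data
    schema and every partition column is appended at the end."""
    partition_names = {name for name, _ in partition}
    return [field for field in data if field[0] not in partition_names] + list(partition)


# Hypothetical table: column "p" is stored in the files and is also a partition column.
data_schema = [("id", "long"), ("p", "string"), ("value", "double")]
partition_schema = [("p", "int")]

print(merge_keep_data_order(data_schema, partition_schema))
print(merge_partition_last(data_schema, partition_schema))
```

With these sample schemas the first function keeps `p` between `id` and `value`, while the second moves `p` to the end. A difference of this kind is visible to any query that relies on `SELECT *` column positions.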
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, the order of output columns changes when partitioning and data columns overlap.
   
   ### How was this patch tested?
   Existing and new UTs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] peter-toth commented on pull request #38885: [SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #38885:
URL: https://github.com/apache/spark/pull/38885#issuecomment-1335243106

   cc @cloud-fan 




[GitHub] [spark] peter-toth commented on pull request #38885: [SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog

Posted by GitBox <gi...@apache.org>.
peter-toth commented on PR #38885:
URL: https://github.com/apache/spark/pull/38885#issuecomment-1339214805

   @cloud-fan, do you think we could start enabling V2 file tables with this PR?




[GitHub] [spark] github-actions[bot] commented on pull request #38885: [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #38885:
URL: https://github.com/apache/spark/pull/38885#issuecomment-1605183251

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] github-actions[bot] closed pull request #38885: [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #38885: [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog
URL: https://github.com/apache/spark/pull/38885

