Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/24 04:35:18 UTC

[GitHub] [spark] cloud-fan commented on pull request #38640: [WIP][SPARK-41124][SQL][TEST] Add DSv2 PlanStabilitySuites

cloud-fan commented on PR #38640:
URL: https://github.com/apache/spark/pull/38640#issuecomment-1325947062

> Actually, I'm happy to work on making parquet v2 tables available in a separate ticket/PR if you can give me some guidance.

I tried to do this a long time ago but failed because of some design issues. We need to fully understand the use cases of `CREATE TABLE ... USING v1Source` and see how to make them work for v2 sources:
1. Just a name mapping, so that people can refer to a table by name instead of providing all the data source information every time. The JDBC data source is a good example.
2. A schema cache. Inferring the data schema may be expensive for the data source, so it needs the Spark catalog to cache it. The file source is a good example here.
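
To make the difference between the two use cases concrete, here is a rough toy model in plain Python. It is not Spark code: `ToyCatalog`, `CatalogEntry`, `fake_infer` and the `cache_schema` flag are all hypothetical names made up for illustration, and the option values are placeholders. The only point is that in case 1 the catalog stores just the provider and its options, while in case 2 it also stores a schema that was inferred once at creation time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Schema = List[str]  # column names only, to keep the model small


@dataclass
class CatalogEntry:
    provider: str                           # e.g. "jdbc" or "parquet"
    options: Dict[str, str]                 # connection info, path, ...
    cached_schema: Optional[Schema] = None  # populated only for use case 2


class ToyCatalog:
    """Hypothetical stand-in for a catalog; not a Spark API."""

    def __init__(self, infer_schema: Callable[[str, Dict[str, str]], Schema]):
        self._infer_schema = infer_schema
        self._tables: Dict[str, CatalogEntry] = {}

    def create_table(self, name: str, provider: str,
                     options: Dict[str, str], cache_schema: bool) -> None:
        # Use case 2 pays the inference cost once, at creation time.
        schema = self._infer_schema(provider, options) if cache_schema else None
        self._tables[name] = CatalogEntry(provider, dict(options), schema)

    def schema_of(self, name: str) -> Schema:
        entry = self._tables[name]
        if entry.cached_schema is not None:
            return entry.cached_schema  # use case 2: served from the catalog
        # Use case 1: the name is only a shortcut, the source is asked each time.
        return self._infer_schema(entry.provider, entry.options)


calls: List[str] = []


def fake_infer(provider: str, options: Dict[str, str]) -> Schema:
    calls.append(provider)  # record every (potentially expensive) inference
    return ["id", "name"]


catalog = ToyCatalog(fake_infer)
catalog.create_table("jdbc_t", "jdbc", {"url": "placeholder-url"}, cache_schema=False)
catalog.create_table("file_t", "parquet", {"path": "placeholder-path"}, cache_schema=True)

catalog.schema_of("jdbc_t")
catalog.schema_of("jdbc_t")
catalog.schema_of("file_t")
catalog.schema_of("file_t")
print(calls)
```

In this model the file-backed table triggers inference once, at creation, and the JDBC-style table triggers it on every lookup. Whether a v2 source should behave like the first or the second case is exactly the design question.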

We also need to think about the semantics of `ALTER TABLE`, `REFRESH TABLE`, `df.write.mode("overwrite").saveAsTable`, etc.

Some code references: for v1 sources, we have a rule, `FindDataSourceTable`, that resolves tables backed by a v1 source. For v2 sources, we should probably have a similar rule that resolves a v2 source to a `TableProvider`.
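
The shape of such a rule could look roughly like the plain-Python sketch below. This is only a toy model and not how `FindDataSourceTable` is implemented: `CatalogTable`, `V1Relation`, `V2Relation`, `V2_PROVIDERS`, `resolve_table` and the class name `com.example.MyV2TableProvider` are all hypothetical, and the case-insensitive lookup is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class CatalogTable:
    name: str
    provider: str  # the name given in CREATE TABLE ... USING <provider>


@dataclass(frozen=True)
class V1Relation:
    table: CatalogTable


@dataclass(frozen=True)
class V2Relation:
    table: CatalogTable
    provider_class: str


# Hypothetical registry: provider short name -> v2 implementation class name.
V2_PROVIDERS: Dict[str, str] = {"myv2source": "com.example.MyV2TableProvider"}


def resolve_table(table: CatalogTable) -> Union[V1Relation, V2Relation]:
    """Pick the v2 path when the provider is a known v2 source, else fall back to v1."""
    impl = V2_PROVIDERS.get(table.provider.lower())  # assumed case-insensitive
    if impl is not None:
        return V2Relation(table, impl)
    return V1Relation(table)


v2 = resolve_table(CatalogTable("t", "MyV2Source"))
v1 = resolve_table(CatalogTable("u", "jdbc"))
print(type(v2).__name__, type(v1).__name__)
```

The sketch leaves open how the resolved v2 table would get its schema and partitioning from the catalog entry.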

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
