Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/11/27 07:07:06 UTC

[GitHub] [spark] cloud-fan commented on issue #26684: [WIP][SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.

URL: https://github.com/apache/spark/pull/26684#issuecomment-558959828
 
 
   This seems like a hard problem. What we need is:
   1. access the Hive metastore only once when resolving a table.
   2. allow a catalog name in the table name for v1 tables.
   
   There are two conflicting goals:
   1. we want to make as few changes to the v1 code path as possible, and still load v1 tables through `SessionCatalog.lookupRelation`.
   2. we want to know whether a table from the session catalog is v1 or v2, which requires `V2SessionCatalog.loadTable`.
   
   To satisfy both with a single Hive metastore access, we have 3 options:
   1. In `ResolveTables`, if we see a `V1Table`, return a v1 relation instead of skipping it. This requires refactoring view resolution, so that we don't need to resolve views and tables recursively inside the single rule `ResolveRelations`.
   2. In `ResolveRelations`, look up tables using the v2 API `V2SessionCatalog.loadTable`.
   3. Introduce a cache. This needs to be carefully designed, so that the cache only takes effect between `ResolveTables` and `ResolveRelations`.
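   As a rough illustration of option 3, the cache could be a small per-analysis map that the first rule populates and later rules read, invalidated once resolution finishes. This is only a sketch with made-up names (`TableResolutionCache`, `getOrLoad`, `invalidateAll` are hypothetical, not Spark APIs):

   ```scala
   import scala.collection.mutable

   // Hypothetical per-analysis cache shared between ResolveTables and
   // ResolveRelations, so the metastore is hit only once per table name.
   class TableResolutionCache[T] {
     private val cache = mutable.Map.empty[String, T]

     // Run the (expensive) load only on the first request for a name.
     def getOrLoad(name: String)(load: => T): T =
       cache.getOrElseUpdate(name, load)

     // Must be called after ResolveRelations, so stale metadata never
     // leaks into a later query.
     def invalidateAll(): Unit = cache.clear()
   }
   ```

   The tricky part the comment alludes to is the invalidation boundary: the cache must live exactly between the two rules, not across queries.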
   
   I think option 2 is the easiest to do at the current stage.
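   To make option 2 concrete, here is a minimal self-contained sketch of the idea: `ResolveRelations` asks the v2 session catalog for the table once, then branches on whether the result is a `V1Table` wrapper or a native v2 table. All types here are simplified stand-ins for illustration, not Spark's real classes:

   ```scala
   sealed trait Table
   case class V1Table(name: String) extends Table  // wraps v1 catalog metadata
   case class V2Table(name: String) extends Table

   sealed trait LogicalPlan
   case class UnresolvedRelation(name: String) extends LogicalPlan
   case class V1Relation(name: String) extends LogicalPlan  // v1 code path
   case class DataSourceV2Relation(name: String) extends LogicalPlan

   // Stand-in for V2SessionCatalog: one metastore access per loadTable call.
   class V2SessionCatalog(tables: Map[String, Table]) {
     var lookups = 0
     def loadTable(name: String): Option[Table] = { lookups += 1; tables.get(name) }
   }

   object ResolveRelations {
     def apply(plan: LogicalPlan, catalog: V2SessionCatalog): LogicalPlan =
       plan match {
         case UnresolvedRelation(name) =>
           catalog.loadTable(name) match {           // single metastore access
             case Some(V1Table(n)) => V1Relation(n)  // fall back to v1 path
             case Some(V2Table(n)) => DataSourceV2Relation(n)
             case None             => plan           // leave unresolved
           }
         case other => other
       }
   }
   ```

   The point of the sketch is that one `loadTable` call both fetches the metadata and tells us which code path (v1 or v2) to take, satisfying the single-access constraint without touching view resolution.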
   
   cc @rdblue @brkyvz 
