Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/13 04:33:02 UTC

[GitHub] [spark] cloud-fan edited a comment on issue #27894: [SPARK-31136] Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

cloud-fan edited a comment on issue #27894: [SPARK-31136] Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
URL: https://github.com/apache/spark/pull/27894#issuecomment-598534428
 
 
   I agree that we should evaluate the "cost to break", but looking at unit tests may not be a good way to measure it. They rely heavily on internal assumptions, so changing the table format will definitely break a lot of them.
   
   Ideally, the table format only decides how the table is stored and should be purely a performance concern, but Hive tables are a bit different. IMO, the "cost to break" is losing some Hive compatibility: tables created without USING may no longer be readable by Hive, and some Hive-specific commands like LOAD DATA don't work for them.
   
   On the other hand, the "cost to maintain" is losing Spark's perf benefits: many users just run `CREATE TABLE` like they do in other databases, which created a Hive table before 3.0. This means none of the features we build for our native readers are available to them, like the vectorized reader, nested column pruning, nested field filter pushdown (@dbtsai is working on it), bucketed tables, etc.
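
To make the two behaviors concrete, here is a minimal sketch in plain Python (not Spark code) of how the provider for a `CREATE TABLE` statement could be chosen. The function `resolve_provider` and its parameters are hypothetical names used only for illustration, and `parquet` is assumed as the default data source. The sketch only looks for a USING clause and ignores every other part of the syntax.

```python
import re

# Hypothetical model, not Spark's parser: it only decides which provider a
# CREATE TABLE statement would get under the two behaviors discussed above.
_USING = re.compile(r"\bUSING\s+(\w+)", re.IGNORECASE)


def resolve_provider(sql: str, hive_by_default: bool,
                     default_source: str = "parquet") -> str:
    """Return the provider name for a CREATE TABLE statement.

    hive_by_default=True models the pre-3.0 behavior (no USING -> Hive table).
    hive_by_default=False models the changed behavior
    (no USING -> the default data source, assumed here to be parquet).
    """
    match = _USING.search(sql)
    if match:
        # An explicit USING clause wins in both modes.
        return match.group(1).lower()
    return "hive" if hive_by_default else default_source


if __name__ == "__main__":
    stmt = "CREATE TABLE t (id INT, name STRING)"
    print(resolve_provider(stmt, hive_by_default=True))
    print(resolve_provider(stmt, hive_by_default=False))
    print(resolve_provider(stmt + " USING orc", hive_by_default=True))
```

Under these assumptions, only statements without USING change provider between the two modes, and those are the tables the compatibility concern above applies to.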
   
   I think in this case the "cost to maintain" is more serious, and we should accept the change and not revert it. cc @marmbrus @srowen @maropu @viirya @HyukjinKwon
   
   UPDATED to make my opinion clearer.
