Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/12/20 16:41:51 UTC

[GitHub] [spark] rdblue commented on issue #26868: [SPARK-29665][SQL] refine the TableProvider interface

URL: https://github.com/apache/spark/pull/26868#issuecomment-567989105
 
 
   @cloud-fan, what is your rationale for saying "For [sources like Kafka], a simple getTable(properties) is the best."? You didn't give an argument for why that is the case.
   
   My understanding is that the existing Kafka source has a static schema, so `inferSchema` is easy to implement. For other Kafka implementations, people may use a schema store, in which case the lookup is fairly easy (although we would encourage building a Kafka catalog for that). I don't see how the two inference methods make such a source more difficult to implement.
   
   Having two `getTable` calls does make the API more confusing. If I want to store a Kafka stream in the built-in generic catalog, we agree that the catalog should pass the schema and partitioning to `TableProvider.getTable` (your point 2). That means that both `getTable(properties)` and `getTable(schema, partitioning, properties)` must be implemented. And if an author doesn't implement the optional `SupportsExternalMetadata` interface, the source won't work with a generic metastore. I think that's a significant problem and surprising behavior.
   
   As I said, we clearly need `getTable(schema, partitioning, properties)`. I don't think that we actually need a second variant, especially since a source that implements only the "simple" version doesn't work with the built-in metastore.
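
To make the argument concrete, here is a minimal Java sketch of the single-`getTable` shape under discussion. Everything in it is a hypothetical, simplified stand-in rather than Spark's actual API: `SketchTable`, `SketchTableProvider`, `StaticSchemaSource`, and the helpers `loadWithInference` and `loadFromCatalog` are invented for illustration, schemas are reduced to lists of column names, and the column names themselves are placeholders. It only aims to show that a static-schema source can serve both the "options only" path and the generic-catalog path through one `getTable(schema, partitioning, properties)` method.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a table; NOT Spark's Table interface.
final class SketchTable {
    final List<String> schema;        // column names only, for brevity
    final List<String> partitioning;  // partition column names
    final Map<String, String> properties;

    SketchTable(List<String> schema, List<String> partitioning,
                Map<String, String> properties) {
        this.schema = schema;
        this.partitioning = partitioning;
        this.properties = properties;
    }
}

// Hypothetical provider with two inference methods and ONE getTable variant.
interface SketchTableProvider {
    List<String> inferSchema(Map<String, String> properties);

    List<String> inferPartitioning(Map<String, String> properties);

    SketchTable getTable(List<String> schema, List<String> partitioning,
                         Map<String, String> properties);
}

// A source with a static schema: inference just returns constants.
final class StaticSchemaSource implements SketchTableProvider {
    // Placeholder column names, assumed for illustration.
    private static final List<String> FIXED_SCHEMA = List.of("key", "value");

    @Override
    public List<String> inferSchema(Map<String, String> properties) {
        return FIXED_SCHEMA;
    }

    @Override
    public List<String> inferPartitioning(Map<String, String> properties) {
        return List.of(); // unpartitioned in this sketch
    }

    @Override
    public SketchTable getTable(List<String> schema, List<String> partitioning,
                                Map<String, String> properties) {
        return new SketchTable(schema, partitioning, properties);
    }
}

final class TableProviderSketch {
    // Path taken when the user supplies only options: infer, then build.
    static SketchTable loadWithInference(SketchTableProvider provider,
                                         Map<String, String> properties) {
        return provider.getTable(provider.inferSchema(properties),
                                 provider.inferPartitioning(properties),
                                 properties);
    }

    // Path taken by a generic catalog that already stored the metadata.
    static SketchTable loadFromCatalog(SketchTableProvider provider,
                                       List<String> storedSchema,
                                       List<String> storedPartitioning,
                                       Map<String, String> properties) {
        return provider.getTable(storedSchema, storedPartitioning, properties);
    }

    public static void main(String[] args) {
        SketchTableProvider provider = new StaticSchemaSource();
        Map<String, String> props = Map.of("subscribe", "events");

        SketchTable inferred = loadWithInference(provider, props);
        SketchTable fromCatalog =
            loadFromCatalog(provider, List.of("key", "value"), List.of(), props);

        System.out.println(inferred.schema.equals(fromCatalog.schema));
    }
}
```

In this sketch the source author writes one `getTable` method and two trivial inference methods, and both callers work without an additional optional interface. Whether that trade-off holds for real sources depends on how expensive inference is, which the sketch does not model.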
