Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/21 07:35:13 UTC

[GitHub] [iceberg] zhangdove commented on issue #1230: How to read/write iceberg in Spark Structed Streaming

zhangdove commented on issue #1230:
URL: https://github.com/apache/iceberg/issues/1230#issuecomment-713372597


   While analyzing Iceberg's catalog code, I found that one issue is still open here, and I have made some new discoveries:
   
   When a table is loaded with `spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table")`, the Iceberg catalog is not used, so the table's metadata is not cached. Instead, the table is obtained directly through `IcebergSource.findTable(options,conf)`.
   
   However, when the table is loaded with `spark.table("prod.db.table")`, CachingCatalog (`cache-enabled` defaults to true) automatically looks the table up in its cache (a Caffeine cache).
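   
   To make the difference between the two load paths concrete, here is a toy model in plain Python. It is only a sketch: `TableRef`, `CachingCatalogSketch`, and `load_from_storage` are hypothetical names invented for illustration, they are not Iceberg or Spark APIs, and the model has no refresh logic, so it says nothing about whether a cached table is refreshed in a real deployment.

```python
# Toy model only: every name here is hypothetical and does not mirror Iceberg's code.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TableRef:
    """Stand-in for a loaded table handle; records the snapshot seen at load time."""
    name: str
    snapshot_id: int


class CachingCatalogSketch:
    """Returns the same TableRef for repeated lookups of one identifier."""

    def __init__(self, loader: Callable[[str], TableRef]) -> None:
        self._loader = loader
        self._cache: Dict[str, TableRef] = {}

    def load_table(self, identifier: str) -> TableRef:
        # Only the first lookup reaches storage; later lookups hit the cache.
        if identifier not in self._cache:
            self._cache[identifier] = self._loader(identifier)
        return self._cache[identifier]


# Simulated table state in storage (assumption: one integer snapshot id).
current_snapshot = {"id": 1}


def load_from_storage(name: str) -> TableRef:
    """Models a direct, uncached load: builds a new handle on every call."""
    return TableRef(name, current_snapshot["id"])


catalog = CachingCatalogSketch(load_from_storage)

first_by_name = catalog.load_table("prod.db.table")
first_by_path = load_from_storage("hdfs://nn:8020/path/to/table")

current_snapshot["id"] = 2  # simulate another writer committing to the table

second_by_name = catalog.load_table("prod.db.table")
second_by_path = load_from_storage("hdfs://nn:8020/path/to/table")
```

   In this model the catalog lookup hands back the handle it cached on first use, while the path-based load builds a new handle each time and therefore sees the later snapshot.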
   
   Given this, is the description in the documentation [in this place](https://github.com/apache/iceberg/blob/master/site/docs/spark.md#querying-with-dataframes) incorrect?
   
   Shouldn't the correct description be the following?
   `Using spark.table("prod.db.table") loads an isolated table reference that is not refreshed when other queries update the table.`
   
   @rdblue What do you think of this description? Should we update the documentation here?
   




