
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/13 19:32:49 UTC

[GitHub] [iceberg] wypoon commented on a change in pull request #3722: Spark: Use snapshot schema when reading snapshot

wypoon commented on a change in pull request #3722:
URL: https://github.com/apache/iceberg/pull/3722#discussion_r768059097



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java
##########
@@ -89,15 +93,10 @@ public SparkTable(Table icebergTable, boolean refreshEagerly) {
     this(icebergTable, null, refreshEagerly);
   }
 
-  public SparkTable(Table icebergTable, StructType requestedSchema, boolean refreshEagerly) {
+  public SparkTable(Table icebergTable, Long snapshotId, boolean refreshEagerly) {
     this.icebergTable = icebergTable;
-    this.requestedSchema = requestedSchema;
+    this.snapshotId = snapshotId;
     this.refreshEagerly = refreshEagerly;
-
-    if (requestedSchema != null) {
-      // convert the requested schema to throw an exception if any requested fields are unknown
-      SparkSchemaUtil.convert(icebergTable.schema(), requestedSchema);
-    }
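The constructor change above replaces requestedSchema with a snapshotId. Going by the pull request title, the goal is to read a snapshot with the schema that was current when that snapshot was committed, rather than the table's latest schema. Below is a minimal, stdlib-only Java sketch of that lookup. It is not Iceberg's API: SnapshotSchemaLookup, SnapshotInfo and schemaFor are hypothetical stand-ins, and schemas are plain strings. Falling back to the current schema when a snapshot has no recorded schema id is an assumption of this sketch.

```java
import java.util.Map;

public class SnapshotSchemaLookup {

  // Hypothetical stand-in for a snapshot: its id plus the id of the schema
  // that was current when it was committed (null if none was recorded).
  static final class SnapshotInfo {
    final long snapshotId;
    final Integer schemaId;

    SnapshotInfo(long snapshotId, Integer schemaId) {
      this.snapshotId = snapshotId;
      this.schemaId = schemaId;
    }
  }

  // Returns the schema a reader should use. A null snapshotId means "read the
  // current state", so the current schema is returned. Otherwise the snapshot's
  // own schema is used, falling back to the current one if the snapshot has
  // no schema id (an assumption of this sketch).
  public static String schemaFor(
      Long snapshotId,
      Map<Long, SnapshotInfo> snapshots,
      Map<Integer, String> schemasById,
      int currentSchemaId) {
    String current = schemasById.get(currentSchemaId);
    if (snapshotId == null) {
      return current;
    }
    SnapshotInfo snapshot = snapshots.get(snapshotId);
    if (snapshot == null) {
      throw new IllegalArgumentException("Cannot find snapshot with ID " + snapshotId);
    }
    if (snapshot.schemaId == null) {
      return current;
    }
    return schemasById.get(snapshot.schemaId);
  }

  public static void main(String[] args) {
    Map<Integer, String> schemas = Map.of(0, "id INT", 1, "id INT, data STRING");
    Map<Long, SnapshotInfo> snapshots = Map.of(
        100L, new SnapshotInfo(100L, 0),   // committed before "data" was added
        200L, new SnapshotInfo(200L, 1));
    int currentSchemaId = 1;

    System.out.println(schemaFor(null, snapshots, schemas, currentSchemaId)); // id INT, data STRING
    System.out.println(schemaFor(100L, snapshots, schemas, currentSchemaId)); // id INT
  }
}
```

Keying the table on a snapshot id, rather than on a caller-supplied schema, lets the schema be derived from table metadata. No user input is needed for it.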

Review comment:
       I pointed this out in #1508 and I'll point it out again here:
   
   I removed requestedSchema from SparkTable because, as of #1783, the Spark 3 IcebergSource is a SupportsCatalogOptions, not just a TableProvider. Since DataFrameReader rejects a user-specified schema for any SupportsCatalogOptions source, including IcebergSource:
   ```
       DataSource.lookupDataSourceV2(source, sparkSession.sessionState.conf).map { provider =>
         ...
         val (table, catalog, ident) = provider match {
           case _: SupportsCatalogOptions if userSpecifiedSchema.nonEmpty =>
             throw new IllegalArgumentException(
               s"$source does not support user specified schema. Please don't specify the schema.")
   ```
   (see https://github.com/apache/spark/blob/v3.2.0/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L220-L223)
   there is no reason to keep a requestedSchema field, as we cannot make use of it.
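The DataFrameReader check quoted above can be modeled in a few lines of plain Java. This is a minimal sketch, not Spark code. TableProvider and SupportsCatalogOptions below are simplified hypothetical stand-ins for the Spark connector interfaces; they keep only the fact that SupportsCatalogOptions extends TableProvider. resolveTable is a hypothetical condensation of the branch in DataFrameReader.load. The sketch shows that once a source implements SupportsCatalogOptions, a user-specified schema is rejected before any table is built. A requestedSchema therefore never reaches SparkTable by this path.

```java
import java.util.Optional;

// Hypothetical, simplified stand-ins for Spark's connector interfaces.
// As in Spark, SupportsCatalogOptions extends TableProvider.
interface TableProvider {
  String name();
}

interface SupportsCatalogOptions extends TableProvider {}

public class DataFrameReaderGuard {

  // Models the guard in DataFrameReader.load: a catalog-options source
  // cannot be combined with a user-specified schema.
  public static String resolveTable(TableProvider provider, Optional<String> userSpecifiedSchema) {
    if (provider instanceof SupportsCatalogOptions && userSpecifiedSchema.isPresent()) {
      throw new IllegalArgumentException(
          provider.name() + " does not support user specified schema. Please don't specify the schema.");
    }
    // A SupportsCatalogOptions source only gets here with an empty schema,
    // so its table schema always comes from the table itself.
    return provider.name() + " table, schema=" + userSpecifiedSchema.orElse("<from table>");
  }

  public static void main(String[] args) {
    SupportsCatalogOptions icebergSource = () -> "iceberg";
    TableProvider plainSource = () -> "plain";

    System.out.println(resolveTable(icebergSource, Optional.empty()));
    System.out.println(resolveTable(plainSource, Optional.of("id INT")));
    try {
      resolveTable(icebergSource, Optional.of("id INT"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Running main prints three lines:

- the catalog-options source loading without a schema;
- the plain provider accepting one;
- the rejection message.

Real Spark applies further checks to plain TableProvider sources, for example whether the provider supports external metadata. Those checks are omitted here.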




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org