You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/06 00:24:34 UTC

[GitHub] [iceberg] wypoon opened a new pull request #3852: Core, Spark: Fallback when snapshot does not have schema id

wypoon opened a new pull request #3852:
URL: https://github.com/apache/iceberg/pull/3852


   In `SnapshotUtil`, when a snapshot does not have a schema id (written before schema id was added to snapshots), we fall back to reading each of the previous metadata files until we find one whose current snapshot id matches the snapshot id we seek, and read its schema from there.
   
   We introduce a setting for testing purposes that makes the `SnapshotParser` write JSON without `schema-id` for snapshots. We add variants of existing tests for reading snapshots after schema evolution where the metadata is written without `schema-id` in the snapshots. The tests fail without the change in `SnapshotUtil`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #3852: Core, Spark: Fallback when snapshot does not have schema id

Posted by GitBox <gi...@apache.org>.
wypoon commented on pull request #3852:
URL: https://github.com/apache/iceberg/pull/3852#issuecomment-1028253273


   @rdblue when you have time, can you please review? (This seems to have fallen off the radar.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #3852: Core, Spark: Fallback when snapshot does not have schema id

Posted by GitBox <gi...@apache.org>.
wypoon commented on pull request #3852:
URL: https://github.com/apache/iceberg/pull/3852#issuecomment-1006214040


   Hmm, the Flink CI must be flaky. The failing test passes for me locally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a change in pull request #3852: Core, Spark: Fallback when snapshot does not have schema id

Posted by GitBox <gi...@apache.org>.
wypoon commented on a change in pull request #3852:
URL: https://github.com/apache/iceberg/pull/3852#discussion_r779224225



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotParser.java
##########
@@ -89,8 +89,12 @@ static void toJson(Snapshot snapshot, JsonGenerator generator)
       generator.writeEndArray();
     }
 
-    // schema ID might be null for snapshots written by old writers
-    if (snapshot.schemaId() != null) {
+    // For test purposes, we want to emulate an old writer; we do this if the system property
+    // iceberg.snapshot.no-schema-id is set to true.
+    boolean noSchemaId = SystemProperties.getBoolean("iceberg.snapshot.no-schema-id", false);

Review comment:
       I'm open to suggestions for other ways of getting the metadata to be written without `schema-id` in the `snapshot`s, or even other ways of testing this. But this was a simple, unintrusive way (very little code change needed; in particular, no API changes needed).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #3852: Core, Spark: Fallback when snapshot does not have schema id

Posted by GitBox <gi...@apache.org>.
wypoon commented on pull request #3852:
URL: https://github.com/apache/iceberg/pull/3852#issuecomment-1006772135


   @rdblue @jackye1995 @yyanyy this is the fallback part of using the snapshot schema when reading a snapshot. (It does not have to make it into 0.13.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a change in pull request #3852: Core, Spark: Fallback when snapshot does not have schema id

Posted by GitBox <gi...@apache.org>.
wypoon commented on a change in pull request #3852:
URL: https://github.com/apache/iceberg/pull/3852#discussion_r779224225



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotParser.java
##########
@@ -89,8 +89,12 @@ static void toJson(Snapshot snapshot, JsonGenerator generator)
       generator.writeEndArray();
     }
 
-    // schema ID might be null for snapshots written by old writers
-    if (snapshot.schemaId() != null) {
+    // For test purposes, we want to emulate an old writer; we do this if the system property
+    // iceberg.snapshot.no-schema-id is set to true.
+    boolean noSchemaId = SystemProperties.getBoolean("iceberg.snapshot.no-schema-id", false);

Review comment:
       I'm open to suggestions for other ways of getting the metadata to be written without `schema-id` in the `snapshot`s, or even other ways of testing this. But this was a simple, unintrusive way (very little code change needed, in particular, no API changes needed).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org