You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/24 18:35:38 UTC

[GitHub] [iceberg] moulimukherjee opened a new pull request #1241: Plumbing to add extra custom metadata to snapshot summary

moulimukherjee opened a new pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241


   Plumbing to add extra custom metadata to snapshot summary via write options.
   
   Context: We make use of airflow to run spark jobs, and want to pass a separate external ID and some versioning information linking to an airflow run for each snapshot created. Currently to do this, we have to fork the Writer just to override the commitOperation. 
   
   The changes in this PR would allow custom metadata via write options which start with a particular prefix (`extra-metadata.`). 
   
   cc? @rdblue 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee commented on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee commented on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663729163


   Ack, will update docs too 👍 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee edited a comment on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee edited a comment on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663753539


   cc? @thomaschow @stripe-jafc


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee commented on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee commented on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663753539


   cc? @thomaschow 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee commented on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee commented on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663747302


   @rdblue Added docs. Btw, I changed the prefix slightly to use `-` instead `.` as I noticed all other options have dashes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee edited a comment on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee edited a comment on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663753539






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663724285


   Looks good to me. I just want to find a good prefix for the feature.
   
   It would also be good to document this in the write options docs: http://iceberg.apache.org/configuration/#write-options


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee edited a comment on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee edited a comment on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663753539


   cc? @thomaschow @jafc-stripe


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663757084


   Thanks @moulimukherjee!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee commented on a change in pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee commented on a change in pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#discussion_r460223612



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotSummary.java
##########
@@ -47,6 +47,7 @@
   public static final String PUBLISHED_WAP_ID_PROP = "published-wap-id";
   public static final String SOURCE_SNAPSHOT_ID_PROP = "source-snapshot-id";
   public static final String REPLACE_PARTITIONS_PROP = "replace-partitions";
+  public static final String EXTRA_METADATA_PREFIX = "extra-metadata.";

Review comment:
       happy to change if a different prefix string would be more suitable




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee commented on a change in pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee commented on a change in pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#discussion_r460282327



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotSummary.java
##########
@@ -47,6 +47,7 @@
   public static final String PUBLISHED_WAP_ID_PROP = "published-wap-id";
   public static final String SOURCE_SNAPSHOT_ID_PROP = "source-snapshot-id";
   public static final String REPLACE_PARTITIONS_PROP = "replace-partitions";
+  public static final String EXTRA_METADATA_PREFIX = "extra-metadata.";

Review comment:
       `snapshot.property` sounds good. I'll make changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee commented on a change in pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee commented on a change in pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#discussion_r460232691



##########
File path: spark/src/test/java/org/apache/iceberg/spark/source/TestDataSourceOptions.java
##########
@@ -364,4 +364,28 @@ public void testDefaultMetadataSplitSize() throws IOException {
     int partitionNum = metadataDf.javaRDD().getNumPartitions();
     Assert.assertEquals("Spark partitions should match", expectedSplits, partitionNum);
   }
+
+  @Test
+  public void testExtraSnapshotMetadata() throws IOException {
+    String tableLocation = temp.newFolder("iceberg-table").toString();
+    HadoopTables tables = new HadoopTables(CONF);
+    tables.create(SCHEMA, PartitionSpec.unpartitioned(), Maps.newHashMap(), tableLocation);
+
+    List<SimpleRecord> expectedRecords = Lists.newArrayList(
+            new SimpleRecord(1, "a"),
+            new SimpleRecord(2, "b")
+    );
+    Dataset<Row> originalDf = spark.createDataFrame(expectedRecords, SimpleRecord.class);
+    originalDf.select("id", "data").write()
+            .format("iceberg")
+            .mode("append")
+            .option("extra-metadata.extra-key", "someValue")
+            .option("extra-metadata.another-key", "anotherValue")
+            .save(tableLocation);
+
+    Table table = tables.load(tableLocation);
+
+    Assert.assertTrue(table.currentSnapshot().summary().get("extra-key").equals("someValue"));

Review comment:
       plumbing to pass extra information from write options




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#discussion_r460276424



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotSummary.java
##########
@@ -47,6 +47,7 @@
   public static final String PUBLISHED_WAP_ID_PROP = "published-wap-id";
   public static final String SOURCE_SNAPSHOT_ID_PROP = "source-snapshot-id";
   public static final String REPLACE_PARTITIONS_PROP = "replace-partitions";
+  public static final String EXTRA_METADATA_PREFIX = "extra-metadata.";

Review comment:
       Sounds good to me. What about `context.` or `snapshot.property.`? I want it to be somewhat obvious what the prefix indicates, but as short as possible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moulimukherjee edited a comment on pull request #1241: Plumbing to add extra custom metadata to snapshot summary

Posted by GitBox <gi...@apache.org>.
moulimukherjee edited a comment on pull request #1241:
URL: https://github.com/apache/iceberg/pull/1241#issuecomment-663753539


   cc? @thomaschow @jafc-stripe @smcnamara-stripe


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org