Posted to issues@iceberg.apache.org by "CodingCat (via GitHub)" <gi...@apache.org> on 2023/05/30 17:11:17 UTC

[GitHub] [iceberg] CodingCat opened a new pull request, #7743: add doc about commitmetadata

CodingCat opened a new pull request, #7743:
URL: https://github.com/apache/iceberg/pull/7743

   Follow-up PR to https://github.com/apache/iceberg/commit/893af4a19841ae23e18b1e2196df9176d9d90bc2
   
   Adds documentation for org.apache.iceberg.spark.CommitMetadata


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] CodingCat commented on a diff in pull request #7743: add doc about commitmetadata

Posted by "CodingCat (via GitHub)" <gi...@apache.org>.

CodingCat commented on code in PR #7743:
URL: https://github.com/apache/iceberg/pull/7743#discussion_r1229886859


##########
docs/spark-configuration.md:
##########
@@ -194,3 +194,17 @@ df.write
 | check-ordering       | true        | Checks if input schema and table schema are same  |
 | isolation-level | null | Desired isolation level for Dataframe overwrite operations.  `null` => no checks (for idempotent writes), `serializable` => check for concurrent inserts or deletes in destination partitions, `snapshot` => checks for concurrent deletes in destination partitions. |
 | validate-from-snapshot-id | null | If isolation level is set, id of base snapshot from which to check concurrent write conflicts into a table. Should be the snapshot before any reads from the table. Can be obtained via [Table API](../../api#table-metadata) or [Snapshots table](../spark-queries#snapshots). If null, the table's oldest known snapshot is used. |
+
+specifically, if you run SQL statements, you could use `org.apache.iceberg.spark.CommitMetadata` to add entries with custom-keys and corresponding values in the snapshot summary
+
+```java
+import org.apache.iceberg.spark.CommitMetadata;

Review Comment:
   sure, updated



[GitHub] [iceberg] flyrain commented on pull request #7743: add doc about commitmetadata

Posted by "flyrain (via GitHub)" <gi...@apache.org>.

flyrain commented on PR #7743:
URL: https://github.com/apache/iceberg/pull/7743#issuecomment-1592419272

   Merged, thanks @CodingCat !


[GitHub] [iceberg] CodingCat commented on pull request #7743: add doc about commitmetadata

Posted by "CodingCat (via GitHub)" <gi...@apache.org>.

CodingCat commented on PR #7743:
URL: https://github.com/apache/iceberg/pull/7743#issuecomment-1590051495

   @flyrain sorry for the delay, just addressed the comments, please take another look


[GitHub] [iceberg] flyrain commented on a diff in pull request #7743: add doc about commitmetadata

Posted by "flyrain (via GitHub)" <gi...@apache.org>.

flyrain commented on code in PR #7743:
URL: https://github.com/apache/iceberg/pull/7743#discussion_r1228761849


##########
docs/spark-configuration.md:
##########
@@ -194,3 +194,17 @@ df.write
 | check-ordering       | true        | Checks if input schema and table schema are same  |
 | isolation-level | null | Desired isolation level for Dataframe overwrite operations.  `null` => no checks (for idempotent writes), `serializable` => check for concurrent inserts or deletes in destination partitions, `snapshot` => checks for concurrent deletes in destination partitions. |
 | validate-from-snapshot-id | null | If isolation level is set, id of base snapshot from which to check concurrent write conflicts into a table. Should be the snapshot before any reads from the table. Can be obtained via [Table API](../../api#table-metadata) or [Snapshots table](../spark-queries#snapshots). If null, the table's oldest known snapshot is used. |
+
+specifically, if you run SQL statements, you could use `org.apache.iceberg.spark.CommitMetadata` to add entries with custom-keys and corresponding values in the snapshot summary
+
+```java
+import org.apache.iceberg.spark.CommitMetadata;

Review Comment:
   Sorry, I was suggesting an empty line under the `import` line. Minor thing, though.



[GitHub] [iceberg] flyrain commented on a diff in pull request #7743: add doc about commitmetadata

Posted by "flyrain (via GitHub)" <gi...@apache.org>.

flyrain commented on code in PR #7743:
URL: https://github.com/apache/iceberg/pull/7743#discussion_r1223688581


##########
docs/spark-configuration.md:
##########
@@ -194,3 +194,17 @@ df.write
 | check-ordering       | true        | Checks if input schema and table schema are same  |
 | isolation-level | null | Desired isolation level for Dataframe overwrite operations.  `null` => no checks (for idempotent writes), `serializable` => check for concurrent inserts or deletes in destination partitions, `snapshot` => checks for concurrent deletes in destination partitions. |
 | validate-from-snapshot-id | null | If isolation level is set, id of base snapshot from which to check concurrent write conflicts into a table. Should be the snapshot before any reads from the table. Can be obtained via [Table API](../../api#table-metadata) or [Snapshots table](../spark-queries#snapshots). If null, the table's oldest known snapshot is used. |
+
+specifically, if you run SQL statements, you could use `org.apache.iceberg.spark.CommitMetadata` to add entries with custom-keys and corresponding values in the snapshot summary

Review Comment:
   Minor suggestion:
   `CommitMetadata` provides an interface to add custom metadata to a snapshot summary during a SQL execution, which can be beneficial for purposes such as auditing or change tracking. Here is an example:
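
To make the idea above concrete, here is a minimal, self-contained sketch of the scoped-properties pattern: custom key/value pairs are attached for the duration of one action and end up in the summary of whatever commit that action produces. Everything in it is a hypothetical stand-in for illustration (`ScopedCommitProperties`, its `commit` method, and the use of a thread-local are assumptions); it is not Iceberg's `CommitMetadata`, does not claim to mirror its internals, and takes a `Supplier` to keep the sketch free of checked exceptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in, NOT Iceberg's CommitMetadata: it only illustrates
// how properties can be scoped to a single action and picked up by a commit.
final class ScopedCommitProperties {
    // Properties visible to any simulated commit made on the current thread.
    private static final ThreadLocal<Map<String, String>> CURRENT =
            ThreadLocal.withInitial(() -> Collections.<String, String>emptyMap());

    private ScopedCommitProperties() {}

    // Runs the action with the given properties in scope and restores the
    // previous properties afterwards, even if the action throws.
    static <R> R withCommitProperties(Map<String, String> properties, Supplier<R> action) {
        Map<String, String> previous = CURRENT.get();
        CURRENT.set(Collections.unmodifiableMap(new HashMap<>(properties)));
        try {
            return action.get();
        } finally {
            CURRENT.set(previous);
        }
    }

    // Returns the properties currently in scope on this thread.
    static Map<String, String> commitProperties() {
        return CURRENT.get();
    }

    // Simulated commit: builds a snapshot summary from the operation name
    // plus whatever custom properties are in scope.
    static Map<String, String> commit(String operation) {
        Map<String, String> summary = new HashMap<>();
        summary.put("operation", operation);
        summary.putAll(commitProperties());
        return summary;
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("property_key", "property_value");

        // The simulated commit stands in for a SQL statement such as a DELETE.
        Map<String, String> summary =
                withCommitProperties(properties, () -> commit("delete"));

        System.out.println(summary);
        // Outside the scope, no custom properties remain attached.
        System.out.println(commitProperties().isEmpty());
    }
}
```

Restoring the previous value in a `finally` block keeps one action's properties from leaking into later, unrelated commits on the same thread, which matters for auditing use cases.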



##########
docs/spark-configuration.md:
##########
@@ -194,3 +194,17 @@ df.write
 | check-ordering       | true        | Checks if input schema and table schema are same  |
 | isolation-level | null | Desired isolation level for Dataframe overwrite operations.  `null` => no checks (for idempotent writes), `serializable` => check for concurrent inserts or deletes in destination partitions, `snapshot` => checks for concurrent deletes in destination partitions. |
 | validate-from-snapshot-id | null | If isolation level is set, id of base snapshot from which to check concurrent write conflicts into a table. Should be the snapshot before any reads from the table. Can be obtained via [Table API](../../api#table-metadata) or [Snapshots table](../spark-queries#snapshots). If null, the table's oldest known snapshot is used. |
+
+specifically, if you run SQL statements, you could use `org.apache.iceberg.spark.CommitMetadata` to add entries with custom-keys and corresponding values in the snapshot summary
+
+```java
+import org.apache.iceberg.spark.CommitMetadata;
+Map<String, String> properties = Maps.newHashMap();
+properties.put("property_key", "property_value");
+CommitMetadata.withCommitProperties(properties,
+        () -> {
+        spark.sql("DELETE FROM " + tableName + " where id = 1");
+        return 0;

Review Comment:
   Indentation?



##########
docs/spark-configuration.md:
##########
@@ -194,3 +194,17 @@ df.write
 | check-ordering       | true        | Checks if input schema and table schema are same  |
 | isolation-level | null | Desired isolation level for Dataframe overwrite operations.  `null` => no checks (for idempotent writes), `serializable` => check for concurrent inserts or deletes in destination partitions, `snapshot` => checks for concurrent deletes in destination partitions. |
 | validate-from-snapshot-id | null | If isolation level is set, id of base snapshot from which to check concurrent write conflicts into a table. Should be the snapshot before any reads from the table. Can be obtained via [Table API](../../api#table-metadata) or [Snapshots table](../spark-queries#snapshots). If null, the table's oldest known snapshot is used. |
+
+specifically, if you run SQL statements, you could use `org.apache.iceberg.spark.CommitMetadata` to add entries with custom-keys and corresponding values in the snapshot summary
+
+```java
+import org.apache.iceberg.spark.CommitMetadata;

Review Comment:
   Style suggestion: add an empty line



[GitHub] [iceberg] CodingCat commented on pull request #7743: add doc about commitmetadata

Posted by "CodingCat (via GitHub)" <gi...@apache.org>.

CodingCat commented on PR #7743:
URL: https://github.com/apache/iceberg/pull/7743#issuecomment-1568786990

   Hi, @flyrain , just filed the followup PR here


[GitHub] [iceberg] flyrain merged pull request #7743: add doc about commitmetadata

Posted by "flyrain (via GitHub)" <gi...@apache.org>.

flyrain merged PR #7743:
URL: https://github.com/apache/iceberg/pull/7743

