You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/08 21:25:38 UTC

[GitHub] [iceberg] jfz opened a new pull request, #5230: Core,API: Add table property transform to support stateful property update.

jfz opened a new pull request, #5230:
URL: https://github.com/apache/iceberg/pull/5230

   This PR adds a transform API to table properties and the transforms are re-applied based on refreshed properties when there's a conflict, and this make it possible for truly stateful and transactional property updates.
   
   Currently, the table property update commit is basically "last commit wins", and the property value is updated out side of iceberg library so it will not be re-applied during commit retries when there's a conflict, that will cause some commits "effectively" being overwritten if the update is based on existing value.
   
   Consider below example use case: 
   Table property "subscribers" is used to track who should be notified whenever there's a new data write, and "subscribers" is a set of users and we do add/remove to manage users.
   
   Steps for a value update loss:
   1. initial there's only 1 user, subscribers = <u1>
   2. commit A attempts to add user u2 to subscribers: set value to  <u1,u2>
   3. commit B attempts to add user u3 to subscribers: set value to  <u1,u3>
   4. commit B succeeded first, now subscribers value is: <u1,u3>
   5. Commit A failed, and retry succeeded, setting new value to: <u1,u2>
   6. now both A and B succeeded, final value: <u1,u2>    =>   commit B is effectively lost.
   
   With transform API, commits can request a transform like "add user u2 to existing set" instead of "set it to <u1,u2>", and it will never be lost because transform is re-applied for commit retries.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5230: Core,API: Add table property transform to support stateful property update.

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5230:
URL: https://github.com/apache/iceberg/pull/5230#issuecomment-1194554908

   I'm not sure about this. At first, it seems like a reasonable thing to support. But, Iceberg table properties are not intended to be used for arbitrary user metadata like this. They are intended for information to configure reading or writing a table. The existing behavior is that set is idempotent, so it is a stretch to say that Iceberg should support a new feature so you can use table properties for a different purpose.
   
   There is also an implementation problem with this. Not all catalogs can guarantee the behavior introduced in this PR. For example, the REST protocol's `SetPropertiesUpdate` is an idempotent change. There's no way to communicate what the catalog service should do if there is a conflict, or even to signal that it should fail and retry in the event of a conflict. At a minimum, we would need to add a validation for table properties that enabled failing the server-side commit.
   
   I'm not sure that those changes are worth it. Have you considered using a prefix instead of a single option? Instead of `subscribers=a,b,c` you could set `subscribers.a.enabled=true`, `subscribers.b.enabled=true`, etc. That gives you the ability to customize each one as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5230: Core,API: Add table property transform to support stateful property update.

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5230:
URL: https://github.com/apache/iceberg/pull/5230#issuecomment-1200274162

   > > They are intended for information to configure reading or writing a table.
   > 
   > Time to evolve? :)
   
   I think this is a great demonstration of why we want to be strict. This would be a convenient place to stash properties, but then you run into situations like this one where properties don't provide the behavior guarantees that you want for another use case. If we think about implementing those behaviors, then we need to consider things like how to make assertions on the state of table properties in the REST API so that concurrent updates will fail and the client can retry. That's a lot of work and it makes the REST spec more difficult to implement.
   
   I think it is best to continue with the current behavior, especially given that there is a reasonable work-around.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jfz commented on a diff in pull request #5230: Core,API: Add table property transform to support stateful property update.

Posted by GitBox <gi...@apache.org>.
jfz commented on code in PR #5230:
URL: https://github.com/apache/iceberg/pull/5230#discussion_r922370048


##########
core/src/main/java/org/apache/iceberg/PropertiesUpdate.java:
##########
@@ -53,18 +55,34 @@ public UpdateProperties set(String key, String value) {
     Preconditions.checkNotNull(value, "Value cannot be null");
     Preconditions.checkArgument(!removals.contains(key),
         "Cannot remove and update the same key: %s", key);
+    Preconditions.checkArgument(!transforms.containsKey(key),
+        "Cannot transform and update the same key: %s", key);
 
     updates.put(key, value);
 
     return this;
   }
 
+  @Override
+  public UpdateProperties transform(String key, Function<String, String> transformFunc) {
+    Preconditions.checkNotNull(key, "Key cannot be null");
+    Preconditions.checkNotNull(transformFunc, "Transform function cannot be null");
+    Preconditions.checkArgument(!updates.containsKey(key),
+        "Cannot updates and transform the same key: %s", key);

Review Comment:
   @rdblue @kbendick can you take a look at this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jfz commented on pull request #5230: Core,API: Add table property transform to support stateful property update.

Posted by GitBox <gi...@apache.org>.
jfz commented on PR #5230:
URL: https://github.com/apache/iceberg/pull/5230#issuecomment-1195009103

   @rdblue thanks for looking into this and the insights!
   
   > They are intended for information to configure reading or writing a table.
   
   Time to evolve? :) I feel this can be useful and unlock new use cases that require stateful/transactional update to any table specific meta info.
   
   > implementation challenges, e.g. REST Cagalog
   
   I think this can be a generic challenge for passing any sort of transformation to server that's worth looking into, a quick thought would be either serialize/deserialize or predefined transforms like `partition transforms`. Or we can choose not to support in REST Catalog to start with.
   
   > using a prefix instead of a single option
   
   I think it will reduce but not eliminate the chance of overwrite on commit conflicts.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jfz commented on a diff in pull request #5230: Core,API: Add table property transform to support stateful property update.

Posted by GitBox <gi...@apache.org>.
jfz commented on code in PR #5230:
URL: https://github.com/apache/iceberg/pull/5230#discussion_r917561856


##########
core/src/main/java/org/apache/iceberg/PropertiesUpdate.java:
##########
@@ -53,18 +55,34 @@ public UpdateProperties set(String key, String value) {
     Preconditions.checkNotNull(value, "Value cannot be null");
     Preconditions.checkArgument(!removals.contains(key),
         "Cannot remove and update the same key: %s", key);
+    Preconditions.checkArgument(!transforms.containsKey(key),
+        "Cannot transform and update the same key: %s", key);
 
     updates.put(key, value);
 
     return this;
   }
 
+  @Override
+  public UpdateProperties transform(String key, Function<String, String> transformFunc) {
+    Preconditions.checkNotNull(key, "Key cannot be null");
+    Preconditions.checkNotNull(transformFunc, "Transform function cannot be null");
+    Preconditions.checkArgument(!updates.containsKey(key),
+        "Cannot updates and transform the same key: %s", key);

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #5230: Core,API: Add table property transform to support stateful property update.

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #5230: Core,API: Add table property transform to support stateful property update.
URL: https://github.com/apache/iceberg/pull/5230


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5230: Core,API: Add table property transform to support stateful property update.

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5230:
URL: https://github.com/apache/iceberg/pull/5230#discussion_r917456613


##########
core/src/main/java/org/apache/iceberg/PropertiesUpdate.java:
##########
@@ -53,18 +55,34 @@ public UpdateProperties set(String key, String value) {
     Preconditions.checkNotNull(value, "Value cannot be null");
     Preconditions.checkArgument(!removals.contains(key),
         "Cannot remove and update the same key: %s", key);
+    Preconditions.checkArgument(!transforms.containsKey(key),
+        "Cannot transform and update the same key: %s", key);
 
     updates.put(key, value);
 
     return this;
   }
 
+  @Override
+  public UpdateProperties transform(String key, Function<String, String> transformFunc) {
+    Preconditions.checkNotNull(key, "Key cannot be null");
+    Preconditions.checkNotNull(transformFunc, "Transform function cannot be null");
+    Preconditions.checkArgument(!updates.containsKey(key),
+        "Cannot updates and transform the same key: %s", key);

Review Comment:
   Nit: should be "update" rather than "updates".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org