You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "sodonnel (via GitHub)" <gi...@apache.org> on 2024/04/05 17:14:20 UTC

[PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

sodonnel opened a new pull request, #6482:
URL: https://github.com/apache/ozone/pull/6482

   ## What changes were proposed in this pull request?
   
   Design doc - see content in the PR.
   
   For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. This design outlines a minimal change to allow this feature in the Ozone API.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-10657
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1577988891


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))
+}
+```
+
+## Upgrade and Compatibility
+
+If a newer client is talking to an older server, it could call the new atomic API but the server will ignore it without error. This is the case for any API change.

Review Comment:
   Good point, I have added a note about using the upgrade framework to avoid this problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1556607139


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.

Review Comment:
   I guess this depends on an implementation detail that still needs to be specified in the doc:
   
   - If rewrite keeps the same update ID that was present at the time of being replaced, then using the update ID field makes sense, because it is just storing its final value as intended. This is probably not a good way to do it though because we want the rewrite to count as an update to the key as well.
   
   - If rewrite increments the update ID, then a new field is probably better. That way we can see rewrite as a new operation. So I guess we want the following in the doc:
     - Rewrite increments the update ID on commit, the same as any other commit operation.
     - If we go the route of persisting rewriteID (or whatever the field is called) to the open key table, do we also persist it to the DB?
       - This would give us an indication that the file was rewritten, but also that is more what the audit logs are for than a DB dump.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1557452239


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.

Review Comment:
   Quoting from the original comment:
   
   {quote}
   ACLs
   This one looks more concerning. I haven't tested this yet, but it looks like the ACLs at the time of create are what are also committed to the final key, without checking if the key being replaced had ACL updates in the mean time. For example:
   
   key1 exists with acl1
   
   key1' is created at the same path as key1
   
   ACLs for key1 are updated to acl2 by another user/admin.
   
   key1' is committed with acl1 that was read at create time.
   
   Now the ACLs have gone back in time without the admin or user intending to make this change.
   {quote}
   
   This cannot happen, as a change to the ACLs will modify the key updateID and the commit would fail.
   
   Any change to the key - data or metadata - changes the updateID and then the initial create or commit will fail depending on the timing of the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576220522


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,191 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing an rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected updateID to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches update ID to the commit request to indicate a rewrite instead of a put
+3. OM checks the update ID if present and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the overwriteUpdateID to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the overwriteExpectedUpdateID keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing overWriteUpdateID, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets.

Review Comment:
   I added a note to the doc about this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576968299


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.

Review Comment:
   `OmKeyInfo` is used in many places outside of just the open key table:
   - All open key, committed key, deleted key tables. I wouldn't really consider these "wire protocol" since they aren't part of the network.
   - On the client as part of `RpcClient#getKeyInfo`, where it is then wrapped/converted to `OzoneKeyDetails`



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:

Review Comment:
   +1 for the new method.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.

Review Comment:
   If the original key is deleted, this also counts as an update ID "change" that will fail the commit operation, right?



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.

Review Comment:
   More generally, it is not required to be stored in `OmKeyInfo`, which is stored in all key related tables. I know an empty protobuf field will not take up extra space, but it still reduces the scope of the change.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))
+}
+```
+
+## Upgrade and Compatibility
+
+If a newer client is talking to an older server, it could call the new atomic API but the server will ignore it without error. This is the case for any API change.

Review Comment:
   This is not a new API, it is a new method on the client that uses the existing get/put APIs with a new field. In this distinction lies the problem: the new client will think it has done a consistent, atomic rewrite because the server acks all requests, but actually it may have overwritten new data because the server does not support such functionality. We need to use the client/server versioning framework to have the client fail if the server's component version is too old to support rewrite.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))

Review Comment:
   Basically size and generation parameters could be removed and the method could pull them itself.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+Note the omission of the metaData map, as the intention of this API is to copy that from what already exisits on the server.

Review Comment:
   I think we are good here. Sounds like we are in agreement that metadata copying will work as usual from the server and API perspective, but the client's methods don't need to expose this functionality right now.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))
+}
+```
+
+## Upgrade and Compatibility
+
+If a newer client is talking to an older server, it could call the new atomic API but the server will ignore it without error. This is the case for any API change.
+
+There are no changes to protobuf methods.
+
+A single extra field is added to the KeyArgs object, which is passed from the client to OM on key open and commit. This is a new field, so it will be null if not set, and the server will ignore it if it does not expect it.
+
+A single extra field is added to the OMKeyInfo object which is stored in the openKey table. This is a new field, so it will be null if not set, and the server will ignore it if it does not expect it.
+
+There should be not impact on upgrade / downgrade with the new field added in this way.

Review Comment:
   It would be easier to follow if this section was separated into client/server compatibility and disk layout compatibility. I think disk layout compatibility is fine without extra handling, but client/server will need a new version.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))

Review Comment:
   Wouldn't it be easier to just give `rewriteKey` the path to the key and the fields you want to change, and have the method do the get and put operations inside of it? This seems like a lot of parameter copying for the common use case.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.

Review Comment:
   This needs to be quantified. "appears complex" seems like actual investigation of this approach was not done. The doc can site #5524 and the `atomicKeyCreation` field added. Only 3 files were changed to add this field:
   - `ECKeyOutputStream`
   - `KeyDataStreamOutput`
   - `KeyOutputStream`
   Now whether that is considered an excessive amount of change to rule out this approach is debatable, but at least the doc provides readers with all the information.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))
+}
+```
+
+## Upgrade and Compatibility
+
+If a newer client is talking to an older server, it could call the new atomic API but the server will ignore it without error. This is the case for any API change.
+
+There are no changes to protobuf methods.

Review Comment:
   What do you mean by "protobuf methods"? The new protobuf fields will cause protobuf to generate new methods. Do you mean there are no new methods that take protobuf parameters, as in no new APIs? Is this referring to OM to DB protocol?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2076110548

   > >Server manages the update ID
   > This does not work in the general case, where a client reads a key, inspects it and decides that it needs rewritten
   > ...  That is the entire point of this change.
   
   This comes back to [this discussion](https://github.com/apache/ozone/pull/6482#discussion_r1578655136). You are correct that this doesn't work for when there is an "inspect" between the read and write, but this doesn't happen in the one example provided by the document. It seems the document is missing a section demonstrating "the entire point of the change".
   
   > An addition which I had not yet considered, is that even on block allocation the generation could be checked against that which is in the key table, so for a large object it could be check at each block boundary too.
   
   This is a great point. I also had only thought about failing early in the context of create key, not on each block operation. Storing the expected ID on the server makes the check on each block boundary possible. Let's add it to the doc.
   
   >  Sometimes it is better to stick with the conventions already in place, rather than going in a new direction that is possibly better, but possibly not. In my opinion, both have their pros and cons and there is no clear best answer.
   
   In this context I was trying to look at approaches from a top down API level, as in what does the client see and is it clear what is happening. While the conventions you mention are here are important to consider too, they are internal details that we have already discussed. It seems there has not been much discussion on what things look like to the client which was the point of this comment.
   
   The reason that the third approach looks strange from the client's perspective is that if you visualize generation ID as a sort of optimistic lock, it looks like the lock is released, and thus loses its guarantees, when create is called. For example:
   ```
   info, genID = getInfo(key) // lock "acquired" optimistically
   if info != expected:
     ostream = rewrite(key, genID) // To the casual reader, it looks like the lock is "released" here.
     write to ostream // Is this safe? Yes, even though it doesn't look like it.
     commit(key) // Is the lock from earlier still respected? Yes, but unintuitive from the API structure due to server "magic".
   ```
   
   However, I think your idea for failing early on block allocations is solid and outweighs the odd looks of this API and wider spread proto changes. With some doc updates to outline this case as only possible when the ID is passed on create, I'm ok to go forward with this implementation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2082284845

   I went ahead and merged it. We can create followup PRs based on the implementation, and I don't like "completed" PRs hanging around in the queue unnecessarily.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1557452239


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.

Review Comment:
   Quoting from the original comment:
   
   {quote}
   ACLs
   This one looks more concerning. I haven't tested this yet, but it looks like the ACLs at the time of create are what are also committed to the final key, without checking if the key being replaced had ACL updates in the mean time. For example:
   
   key1 exists with acl1
   
   key1' is created at the same path as key1
   
   ACLs for key1 are updated to acl2 by another user/admin.
   
   key1' is committed with acl1 that was read at create time.
   
   Now the ACLs have gone back in time without the admin or user intending to make this change.
   {quote}
   
   This cannot happen, as a change to the ACLs will modify the key updateID and the commit would fail.
   
   Any change to the key - data or metadata - changes the updateID and then the initial create or commit will fail depending on the timing of the change.
   
   The change for overwrite suggested here is actually better than the existing "last writer wins" in this regard, as that could result in ACLs loss in the way you described.
   
   I have added a note about the current limitation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2073803476

   I've been able to think about this a bit more and I think a good way to differentiate the approaches is by who manages the update ID during the write. The two most intuitive options would be that either the client manages the update ID (currently the second proposal), or the server manages the update ID (not yet discussed). In the first proposal listed in the doc, both the client and the server are managing the update ID at different parts of the operation and I think this is why it feels "off" to me. Hopefully defining the options in this way can clarify the differences:
   
   - Server manages the update ID:
       1. The key create request would take a flag indicating that this should be an atomic replacement of an existing key.
       2. The server saves the update ID at the time of create in the open key table, and returns an outputstream to the client.
       3. The client reads, writes, and commits the data to rewrite the same as before.
       4. The server checks the update ID saved with the open key on commit.
   
       - Pseudocode:
       ```
       ostream = create(/v1/b1/k1, newRepType, rewrite=true)
       read /v1/b1/k1 into ostream
       commit(/v1/b1/k1)
       ```
   
   - Client manages the update ID (currently proposal 2 in the doc):
       1. The key create request would return an outputstream to the client the same as before.
       2. The client gets the update ID of the key to overwrite.
       3. The client reads and writes the data to rewrite the same as before.
       4. The client commits the key, including the update ID
       5. The server checks the update ID saved with the open key on commit.
       
       - Pseudocode:
       ```
       stream = create(/v1/b1/k1, newRepType)
       genID = getInfo(/v1/b1/k1)
       read /v1/b1/k1 into ostream
       commit(/v1/b1/k1, genID)
       ```
   
   
   
   - Client and server manage the update ID (currently proposal 1 in the doc):
       1. The client gets the update ID of the key to overwrite.
       1. The key create request would take this update ID and return an outputstream to the client.
       3. The client reads and writes the data to rewrite the same as before.
       4. The client commits the key, the same as before.
       5. The server checks the update ID saved with the open key on commit.
   
       - Pseudocode:
       ```
       genID = getInfo(/v1/b1/k1)
       ostream = create(/v1/b1/k1, newRepType, genID)
       read /v1/b1/k1 into ostream
       commit(/v1/b1/k1)
       ```
   
   To me, either of the first two options where only one side is responsible for storing the update ID as the write is ongoing make sense. The third option is a mashup of the others, which IMO is the least intuitive option. The client reads something from the server and then immediately gives the same thing back for the server to manage. It also unnecessarily spreads the update ID into the client/server and server/disk protocols when it only needs to be in one or the other.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1556246528


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.

Review Comment:
   In general, its a bad idea to overload the use of fields in order to save adding a new field. An optional proto field that is not passed carries no overhead. The code and definitions better explain themselves when a field is used for its single intended purpose and not overloaded.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576156046


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 

Review Comment:
   Yea I can change it to rewrite key. @kerneltime would like use to use "generation" as the ID name, so I will go with that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1577995040


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))
+}
+```
+
+## Upgrade and Compatibility
+
+If a newer client is talking to an older server, it could call the new atomic API but the server will ignore it without error. This is the case for any API change.
+
+There are no changes to protobuf methods.
+
+A single extra field is added to the KeyArgs object, which is passed from the client to OM on key open and commit. This is a new field, so it will be null if not set, and the server will ignore it if it does not expect it.
+
+A single extra field is added to the OMKeyInfo object which is stored in the openKey table. This is a new field, so it will be null if not set, and the server will ignore it if it does not expect it.
+
+There should be not impact on upgrade / downgrade with the new field added in this way.

Review Comment:
   I added an extra heading for the disk and wire parts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "kerneltime (via GitHub)" <gi...@apache.org>.
kerneltime commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1571160333


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,191 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing an rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected updateID to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches update ID to the commit request to indicate a rewrite instead of a put
+3. OM checks the update ID if present and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the overwriteUpdateID to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the overwriteExpectedUpdateID keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing overWriteUpdateID, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets.

Review Comment:
   What is the additional complexity to do it for both?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1557442602


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.

Review Comment:
   Any change to a key, including rewrite will modify the updateID. I thought that was implied by how things work today, but I will add a note to make that clear.
   
   Aside from the openKeyTable, I don't believe it should be written to the DB - it is not needed for anything after the rewrite completes, and it would use extra space forever in RocksDB.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2044897188

   > We should probably restrict this to a single bucket to allow sharding in the future. Lets call that out explicitly.
   
   I don't understand what this means. At key is in a single bucket, and the operations are on a single key ...
   
   > I think we are only handling this for OBS buckets right now, but with plans to have handling for FSO later.
   
   The intention is to handle it for OBS and FSO buckets. OBS is to be worked on first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576153738


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,191 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing an rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected updateID to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches update ID to the commit request to indicate a rewrite instead of a put
+3. OM checks the update ID if present and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the overwriteUpdateID to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the overwriteExpectedUpdateID keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing overWriteUpdateID, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets.

Review Comment:
   With FSO we need to decide on something things like, for example, what happens if the key is moved to a new location. I feel there is enough to figure out with OBS buckets without getting involved in FSO buckets at this stage. It is very much the intention to figure out FSO bucket with an addition to this design after we get OBS buckets working.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1557452239


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.

Review Comment:
   Quoting from the original comment:
   
   {quote}
   ACLs
   This one looks more concerning. I haven't tested this yet, but it looks like the ACLs at the time of create are what are also committed to the final key, without checking if the key being replaced had ACL updates in the mean time. For example:
   
   key1 exists with acl1
   
   key1' is created at the same path as key1
   
   ACLs for key1 are updated to acl2 by another user/admin.
   
   key1' is committed with acl1 that was read at create time.
   
   Now the ACLs have gone back in time without the admin or user intending to make this change.
   {quote}
   
   This cannot happen, as a change to the ACLs will modify the key updateID and the commit would fail.
   
   Any change to the key - data or metadata - changes the updateID and then the initial create or commit will fail depending on the timing of the change.
   
   The change for overwrite suggested here is actually better than the existing "last writer wins" in this regard, as that could result in ACLs loss in the way you described.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1556242669


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+Note the omission of the metaData map, as the intention of this API is to copy that from what already exisits on the server.

Review Comment:
   The existing code, on overwriting a key, expects new meta-data to be passed. It does not copy what is there on the server to the new key. This is, for reasons unknown to me, different from ACLs, which are copied.
   
   The omission of the metadata map, is therefore to remove the ability to pass new metadata, but also to remove the need to copy the old metadata over, making it simpler to use.
   
   I can of course allow the metadata map to be supplied as with other createKey operations, but then it makes this new API perhaps harder to use, and easier to make a mistake with.
   
   If all you want to do, is change the replicationType of a key, then you need to ensure you copy the old metadata map client side. Having a null metadata map is a valid option too, so you cannot tell server side what the intent was if it is missing.
   
   I am happy to go either way on this, but the above is my reasoning for why I designed it this way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1557483588


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:

Review Comment:
   I did look at storing the overwrite ID in the client prior to describing my preferred approach in this document. The two primary reasons I prefer storing the rewriteID in the open key table are:
   
   1. The code which commits a key is very far away from that which opens it, and differs based on Ratis vs EC. That value will need to be passed down though many methods before it gets there. Each replication type will need new tests. In short, I think the code change is more impactful and larger to do it that way.
   2. My suggested technique keeps with the conventions already established in the code, which involves the metadata, ACLs, creation time and replicationType all being established at key open time. While there may be merits in doing all these things a different way, that is not how it is today. I feel it is better to stick with existing conventions rather than fork this in a new direction.
   
   The design described in the PR does not have any compatibility concerns as I understand it. The only concern is that a newer client could make the call and expect the atomic behavior, and the server silently does not behave as expected. That is the case with what you have described too.
   
   I will add a section covering this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576998707


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.

Review Comment:
   This needs to be quantified. "appears complex" seems like actual investigation of this approach was not done. The doc can site #5524 and the `atomicKeyCreation` field added. Only 3 files were changed to add this field:
   - `ECKeyOutputStream`
   - `KeyDataStreamOutput`
   - `KeyOutputStream`
   
   Now whether that is considered an excessive amount of change to rule out this approach is debatable, but at least the doc provides readers with all the information.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1554331312


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.

Review Comment:
   Why do we need a new field for this? The same `OmKeyInfo` object is used for open and committed keys, so open keys already have an update ID field that is not being used.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.

Review Comment:
   Purely talking about what is already in Ozone here, note #6445 which fixed a bug in this space because encryption keys were not being replaced on overwrite. Also note [this comment](https://github.com/apache/ozone/pull/6385#discussion_r1538383070) which discusses a potential problem with the ACL copying.
   
   I think its important to call out the potential issues with the current design in this section so readers are aware when it is referenced later. 



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 

Review Comment:
   Just some thoughts on the names used, we can continue to discuss:
   
   Would calling the API `rewriteKey` and having it take a `rewriteID` be sufficiently descriptive while less verbose? It seems like a pretty literal description of what the API does.
   
   I think we should avoid the term "update ID" in the API since that is an OM internal detail about how we check the key didn't get changed. By giving this field a more generic name, we have the option to switch this to other things like key version in the future. Right now this is just a token the client gets and gives back to the OM to check. Where the value came from within the OM is not important.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+Note the omission of the metaData map, as the intention of this API is to copy that from what already exisits on the server.

Review Comment:
   Why not allow the API to do atomic rewrite of certain metadata fields in addition to the data rewrite? It seems like more work to enforce this restriction than allow it, and I'm not sure what error it is trying to prevent.



##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:

Review Comment:
   There's at least one other approach I can think of:
   1. Create key always returns the update ID. When calling the rewrite API we save it in memory at the client
   2. Client attaches update ID to the commit request to indicate a rewrite instead of a put
   3. OM checks the update ID if present and returns the corresponding success/fail result
   
   It would be good to turn this section into a pro/con analysis of each approach. Currently I see the above having advantages over the approach listed here, especially given this goal at the end of the doc:
   > The intention of this initial design is to make as few changes to Ozone as possible to enable overwriting a key if it has not changed.
   
   - Fewer proto changes:
     - Server side DB protos do not need to be changed
     - Client side create key request proto does not need to be changed
   - This implies fewer upgrade concerns
     - Only need to worry about client-server cross compatibility, as expected with an API change.
     - No concerns for server upgrade/downgrade.
   - Independent of the system's already problematic metadata overwrite/update semantics described above.
   
   The current proposal may have advantages too, but we really need a side-by-side comparison to make an informed decision.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1577087852


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))

Review Comment:
   ```
   ostream rewriteKey(key, repType) {
       genID = getInfo(key)
       return create(key, genID)
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1577988292


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))
+}
+```
+
+## Upgrade and Compatibility
+
+If a newer client is talking to an older server, it could call the new atomic API but the server will ignore it without error. This is the case for any API change.
+
+There are no changes to protobuf methods.

Review Comment:
   There are no new protobuf messages needed. Only a new field is added to an existing method. I reworded this line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "kerneltime (via GitHub)" <gi...@apache.org>.
kerneltime commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1571157453


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,191 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.

Review Comment:
   The choice to base it on Ratis transaction ID is purely opportunistic for now, we could choose implement it entirely differently in the future. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1557452239


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.

Review Comment:
   Quoting from the original comment:
   
   <quote> 
   ACLs
   This one looks more concerning. I haven't tested this yet, but it looks like the ACLs at the time of create are what are also committed to the final key, without checking if the key being replaced had ACL updates in the mean time. For example:
   
   key1 exists with acl1
   
   key1' is created at the same path as key1
   
   ACLs for key1 are updated to acl2 by another user/admin.
   
   key1' is committed with acl1 that was read at create time.
   
   Now the ACLs have gone back in time without the admin or user intending to make this change.
   </quote>
   
   This cannot happen, as a change to the ACLs will modify the key updateID and the commit would fail.
   
   Any change to the key - data or metadata - changes the updateID and then the initial create or commit will fail depending on the timing of the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576215498


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.

Review Comment:
   @errose28  have you any further comments on this comment after the details I have provided and updated in the doc?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576983649


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))
+}
+```
+
+## Upgrade and Compatibility
+
+If a newer client is talking to an older server, it could call the new atomic API but the server will ignore it without error. This is the case for any API change.
+
+There are no changes to protobuf methods.

Review Comment:
   What do you mean by "protobuf methods"? The new protobuf fields will cause protobuf to generate new methods. Do you mean there are no new methods that take protobuf parameters, as in no new APIs? Or is this referring to OM to DB protocol?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel merged PR #6482:
URL: https://github.com/apache/ozone/pull/6482


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1556244957


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+Note the omission of the metaData map, as the intention of this API is to copy that from what already exisits on the server.

Review Comment:
   > Why not allow the API to do atomic rewrite of certain metadata fields in addition to the data rewrite? 
   
   Of course that would be possible, but it expands this design beyond the current need, and in the absence of any users requesting such a feature, I am reluctant to add it.
   
   To allow for metadata only change, there are different APIs need to be modified (eg a metadata only change, wuold not call openKey / commitKey). Probably there needs to be a metadataUpdateID added to existing keys. Old keys will not have that ID, so there may be more forward / backward compatibility concerns.
   
   To keep the scope as small as possible, I am not considering metadata only changes at the current time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "kerneltime (via GitHub)" <gi...@apache.org>.
kerneltime commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2064764273

   The general approach seems fine, to elaborate on my previous feedback in the code PR, I think the internal implementation choices leak out too much to the public/client facing APIs. I would like this PR to be the basis of a first class feature that we can expose via S3 APIs analogous to Google's Object Store. For now the main change I would like to focus is nomenclature clean up (use generation, it is well understood in this context in other storage systems as to what is being discussed, `updateID` is a new name we are introducing and we can choose to do this as a building block for future features) and API name clean up. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1577977566


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.

Review Comment:
   From the line you commented on:
   
   >  On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged
   
   If a key is deleted, it will not exist. Or if it was deleted and re-created, it will have a new updateID based on how things currently work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576219984


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+Note the omission of the metaData map, as the intention of this API is to copy that from what already exisits on the server.

Review Comment:
   @errose28 is there anything still outstanding here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1576218557


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+Note the omission of the metaData map, as the intention of this API is to copy that from what already exisits on the server.

Review Comment:
   I am happy to implement this inline with the current createKey when overwrite - which is the meta data will have to be explicitly copied from the existing key into the new key by the client. That keeps things consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1578052544


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.

Review Comment:
   I added a reference to that PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1556663174


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,149 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read updateID in a new field. Call this new field overwriteExpectedUpdateID.
+3. On OM, it receives the openKey request as usual and detects the presence of the overwriteExpectedUpdateID.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == overwriteExpectedUpdateID. If so, it opens the key and stored the details including the overwriteExpectedUpdateID in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the overwriteExpectedUpdateID again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its updateID is unchanged. If so the key is committed, otherwise an error is returned to the client.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The overwriteExpectedUpdateID needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The overwriteExpectedUpdateID needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new overwriteExpectedUpdateID and perform the checked.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the overwriteExpectedUpdateID to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called replaceKeyIfUnchanged, passing either the OzoneKeyDetails of the existing key (which includes the key name and existing updateID, or by passing the key name and updateID explicitly, eg:
+ 
+ ```
+ public OzoneOutputStream replaceKeyIfUnchanged(OzoneKeyDetails keyToOverwrite, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+	  
+// Alternatively or additionally
+
+ public OzoneOutputStream replaceKeyIfUnchanged(String volumeName, String bucketName, String keyName, long size, long expectedUpdateID, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+Note the omission of the metaData map, as the intention of this API is to copy that from what already exisits on the server.

Review Comment:
   > The existing code, on overwriting a key, expects new meta-data to be passed. It does not copy what is there on the server to the new key. This is, for reasons unknown to me, different from ACLs, which are copied.
   
   Yeah I think this ACL handling is wrong too, I don't think we need to worry about that part here.
   
   > The omission of the metadata map, is therefore to remove the ability to pass new metadata, but also to remove the need to copy the old metadata over, making it simpler to use.
   
   This ability is defined by the protos and the server, not the wrapper method we provide at the client. Since we are re-using the existing create and commit requests which support metadata alterations, the rewrite API that re-uses these requests would also end up supporting it. To "not support it" the server has to actually reject requests that alter these fields. This is why I said it is actually easier to let metadata updates with create/commit happen as part of the rewrite.
   
   > To keep the scope as small as possible, I am not considering metadata only changes at the current time.
   
   Agreed. I was only referring to cases where metadata happens to be changed as part of the rewrite, not suggesting a separate operation that leaves all the blocks and does a metadata only swap. Actually that operation would be a single transaction so I think it just works already.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1578655136


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))

Review Comment:
   > Therefore I believe the method must be passed the keyInfo or the details of the key it is to overwrite.
   
   Ok I see my comment here wasn't clear. I was trying to abstract generation ID inside the rewrite method, not get rid of the key parameters. Key parameters here are fine.
   
   > in an earlier version I simply passed the keyInfo, but you and @kerneltime didn't seem to like that
   
   This makes the objection sound arbitrary. There was technical validation to this point. Separating the key parameters like this is consistent with `RpcClient#createKey`. This also leaves `OMKeyInfo` as an output given to the client like its current usage, not input from the client to OM. That's what `KeyArgs` would be for.
   
   > The point of the API is that the application pulls some details and then makes a decision based on those - this key needs updated.
   
   This actually relates to both this comment and [this one](https://github.com/apache/ozone/pull/6482#issuecomment-2075163977)
   
   > This does not work in the general case, where a client reads a key, inspects it and decides that it needs rewritten
   
   I think I see the confusion. They *do* work for the use case presented here in the document, where there are no reads between when the generation ID is read and the rewrite starts. "Use Cases" should really be its own section in the doc with more thorough examples, as it helps answer questions like this.
   Where the suggestion here *doesn't* work is for a use case that your comments seem to imply but the document does not define:
   
   ```
   OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
   if (existingKey.getReplicationType() == RATIS) { // Important addition that changes the guarantees required from the client methods and API
     try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
         existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), EC) {
       os.write(bucket.readKey(keyName))
     }
   }
   ```
   
   So the proposal here looks good, but a use cases section will help illustrate both now and in the future why certain decisions were made instead of others.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2075163977

   > Server manages the update ID
   
   This does not work in the general case, where a client reads a key, inspects it and decides that it needs rewritten. The key on the server could have changed in the meantime resulting in lost updates. The client must pass the generation it expects to overwrite based on what it has read. It cannot just trust that whatever is currently on the server has not changed. That is the entire point of this change.
   
   > Client manages the update ID (currently proposal 2 in the doc):
   >
   >   * The key create request would return an outputstream to the client the same as before.
   >   * The client gets the update ID of the key to overwrite.
   >   * The client reads and writes the data to rewrite the same as before.
   
   This is doable, but not quite as you described. The client would need to read the existing key first to get its meta data. Then, ideally it passes the generation on key open so it can fail fast. If the key has already changed, there is little point in going ahead and writing it all out only to fail at the end.
   
   An addition which I had not yet considered, is that even on block allocation the generation could be checked against that which is in the key table, so for a large object it could be check at each block boundary too. I have not looked at the block allocation code, but I think it must persist the allocated blocks in the open key table along with the key to allow for them to be garbage collected later if the client should crash. I am also not sure what the block allocation protocol looks like, but by storing the expectedGeneration on the server, we avoid any changes to the block allocation protocol and gain this feature.
   
   > To me, either of the first two options where only one side is responsible for storing the update ID as the write is ongoing make sense. The third option is a mashup of the others, which IMO is the least intuitive option. 
   
   But the third option, is how things currently work for the other metadata fields in a key. To do differently is less intuitive as now this solution goes against how all the other fields are stored. To give an analogy from web-development - the current structure is to have the session store on the server, rather than in a cookie. What you want, is to split this new area into something like a cookie session which we also have the server side session. You have already cited that you don't like the current approach, but we are not going to change that. Sometimes it is better to stick with the conventions already in place, rather than going in a new direction that is possibly better, but possibly not. In my opinion, both have their pros and cons and there is no clear best answer.
   
   The HSync code has added information to the openKey table. It has added it to the MetaData map, so it has avoided adding an extra protobuf field in the protobuf at all. That is also something I could consider, but it will be less efficient and kind of sidesteps a lot of the static type checking Java can do for us, so bugs are easier to get in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2072247375

   @kerneltime @errose28 I think I have addressed all comments and added them to the design. Please take another look and let me know if there is anything else you would like changed or added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #6482:
URL: https://github.com/apache/ozone/pull/6482#discussion_r1577710289


##########
hadoop-hdds/docs/content/design/overwrite-key-only-if-unchanged.md:
##########
@@ -0,0 +1,190 @@
+---
+title: Overwriting an Ozone Key only if it has not changed.
+summary: A minimal design illustrating how to replace a key in Ozone only if it has not changes since it was read.
+date: 2024-04-05
+jira: HDDS-10657
+status: accepted
+author: Stephen ODonnell
+---
+
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+
+Ozone offers write semantics where the last writer to commit a key wins. Therefore multiple writers can concurrently write the same key, and which ever commits last will effectively overwrite all data that came before it.
+
+As an extension of this, there is no "locking" on a key which is being replaced.
+
+For any key, but especially a large key, it can take significant time to read and write it. There are scenarios where it would be desirable to replace a key in Ozone, but only if the key has not changed since it was read. With the absence of a lock, such an operation is not possible today.
+
+## As Things Stand
+
+Internally, all Ozone keys have both an objectID and UpdateID which are stored in OM as part of the key metadata.
+
+Each time something changes on the key, whether it is data or metadata, the updateID is changed. It comes from the ratis transactionID and is generally an increasing number.
+
+When an existing key is over written, its existing metadata including the ObjectID and ACLs are mirrored onto the new key version. The only metadata which is replaced is any custom metadata stored on the key by the user. Upon commit, the updateID is also changed to the current Ratis transaction ID.
+
+Writing a key in Ozone is a 3 step process:
+
+1. The key is opened via an Open Key request from the client to OM
+2. The client writes data to the data nodes
+3. The client commits the key to OM via a Commit Key call.
+
+Note, that as things stand, it is possible to lose metadata updates (eg ACL changes) when a key is overwritten.
+
+1. If the key exists, then a new copy of the key is open for writing.
+2. While the new copy is open, another process updates the ACLs for the key
+3. On commit, the new ACLs are not copied to the new key as the new key made a copy of the existing metadata at the time the key was opened.
+
+With the technique described in the next section, that problem is removed in this design, as the ACL update will change the updateID, and the key will not be committed.
+
+## Atomic Key Replacement
+
+In relational database applications, records are often assigned an update counter similar to the updateID for a key in Ozone. The data record can be read and displayed on a UI to be edited, and then written back to the database. However another user could have made an edit to the same record in the mean time, and if the record is written back without any checks, those edits could be lost.
+
+To combat this, "optimistic locking" is used. With Optimistic locking, no locks are actually involved. The client reads the data along with the update counter. When it attempts to write the data back, it validates the record has not change by including the updateID in the update statement, eg:
+
+```
+update customerDetails
+set <columns = values>
+where customerID = :b1
+and updateCounter = :b2
+```
+If no records are updated, the application must display an error or reload the customer record to handle the problem.
+
+In Ozone the same concept can be used to perform an atomic update of a key only if it has not changed since the key details were originally read.
+
+To do this:
+
+1. The client reads the key details as usual. The key details can be extended to include the existing updateID as it is currently not passed to the client. This field already exists, but when exposed to the client it will be referred to as the key generation.
+2. The client opens a new key for writing with the same key name as the original, passing the previously read generation in a new field. Call this new field expectedGeneration.
+3. On OM, it receives the openKey request as usual and detects the presence of the expectedGeneration field.
+4. On OM, it first ensures that a key is present with the given key name and having a updateID == expectedGeneration. If so, it opens the key and stored the details including the expectedGeneration in the openKeyTable. As things stand, the other existing key metadata copied from the original key is stored in the openKeyTable too.
+5. The client continues to write the data as usual.
+6. On commit key, the client does not need to send the expectedGeneration again, as the open key contains it.
+7. On OM, on commit key, it validates the key still exists with the given key name and its stored updateID is unchanged when compared with the expectedGeneration. If so the key is committed, otherwise an error is returned to the client.
+
+Note that any change to a key will change the updateID. This is existing behaviour, and committing a rewritten key will also modify the updateID. Note this also offers protection against concurrent rewrites. 
+
+### Alternative Proposal
+
+1. Pass the expected expectedGeneration to the rewrite API which passes it down to the relevant key stream, effectively saving it on the client
+2. Client attaches the expectedGeneration to the commit request to indicate a rewrite instead of a put
+3. OM checks the passed generation against the stored update ID and returns the corresponding success/fail result
+
+The advantage of this alternative approach is that it does not require the expectedGeneration to be stored in the openKey table.
+
+However the client code required to implement this appears more complex due to having different key commit logic for Ratis and EC and the parameter needing to be passed through many method calls.
+
+The existing implementation for key creation stores various attributes (metadata, creation time, ACLs, ReplicationConfig) in the openKey table, so storing the expectedGeneration keeps with that convention, which is less confusing for future developers.
+
+In terms of forward / backward compatibility both solutions are equivalent. Only a new parameter is required within the KeyArgs passed to create and commit Key.
+
+If an upgraded server is rolled back, it will still be able to deal with an openKey entry containing expectedGeneration, but it will not process it atomically.
+
+### Scope
+
+The intention is to first implement this for OBS buckets. Then address FSO buckets. FSO bucket handling will reuse the same fields, but the handlers on OM are different. We also need to decide on what should happen if a key is renamed or moved folders during the rewrite.
+
+Multi-part keys need more investigation and hence are also excluded in the initial version.
+
+## Changes Required
+
+In order to enable the above steps on Ozone, several small changes are needed.
+
+### Wire Protocol
+
+1. The expectedGeneration needs to be added to the KeyInfo protobuf object so it can be stored in the openKey table.
+2. The expectedGeneration needs to be added to the keyArgs protobuf object, which is passed from the client to OM when creating a key.
+
+No new messages need to be defined.
+
+### On OM
+
+No new OM handlers are needed. The existing OpenKey and CommitKey handlers will receive the new expectedGeneration and perform the checks.
+
+No new locks are needed on OM. As part of the openKey and commitKey, there are existing locks taken to ensure the key open / commit is atomic. The new checks are performed under those locks, and come down to a couple of long comparisons, so add negligible overhead.
+
+### On The Client
+
+ 1. We need to allow the updateID (called generation on the client) of an existing key to be accessible when an existing details are read, by adding it to OzoneKey and OzoneKeyDetails. There are internal object changes and do no impact any APIs.
+ 2. To pass the expectedGeneration to OM on key open, it would be possible to overload the existing OzoneBucket.createKey() method, which already has several overloaded versions, or create a new explicit method on Ozone bucket called rewriteKey, passing the expectedGeneration, eg:
+ 
+ ```
+
+ public OzoneOutputStream rewriteKey(String volumeName, String bucketName, String keyName, long size, long expectedGeneration, ReplicationConfig replicationConfigOfNewKey)
+      throws IOException 
+      
+// Can also add an overloaded version of these methods to pass a metadata map, as with the existing
+// create key method.      
+
+	  
+ ```
+This specification is roughly in line with the exiting createKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig,
+      Map<String, String> metadata)
+```
+
+An alternative, is to create a new overloaded createKey, but it is probably less confusing to have the new rewriteKey method:
+
+```
+  public OzoneOutputStream createKey(
+      String volumeName, String bucketName, String keyName, long size,
+      ReplicationConfig replicationConfig, long expectedUpdateID)
+```
+
+The intended usage of this API, is that the existing key details are read, then used to open the new key, and then data is written, eg:
+
+```
+OzoneKeyDetails exisitingKey = bucket.getKey(keyName);
+try (OutputStream os = bucket.rewriteKey(existingKey.getBucket, existingKey.getVolume, 
+    existingKey.getKeyName, existingKey.getSize(), existingKey.getGeneration(), newRepConfig) {
+  os.write(bucket.readKey(keyName))

Review Comment:
   The key is returned to the client as an object (KeyInfo) - in an earlier version I simply passed the keyInfo, but you and @kerneltime didn't seem to like that, so I changed it to look like the existing create API.
   
   In the general case whatever application is using the rewrite API has to pull a keys details and then decided based on the key metadata or content if it wants to rewrite it or not. Therefore having the rewrite method pull the key details will result in:
   
   1. Two pulls from OM to get the key info when one would have done.
   2. The potential for different key details to be returned between 1 and 2, and the details from 2 may not want to be overwritten by the application.
   
   The point of the API is that the application pulls some details and then makes a decision based on those - this key needs updated. And it determines that "this key" has not changed by the generation / updateID it received then.
   
   Therefore I believe the method must be passed the keyInfo or the details of the key it is to overwrite.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2077002067

   @errose28 I have enhanced the sections about the use case to make it more clear "immediate rewrite" is no the only goal. I have also added a note about the "fail early" on block allocation idea.
   
   Please check and let me know if you are happy, and then I think we can commit this design PR and return to the original code PR after moving it to a branch rather than master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


Re: [PR] HDDS-10657. Design Doc for overwriting a key if it has not changed [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #6482:
URL: https://github.com/apache/ozone/pull/6482#issuecomment-2081512198

   > For reconciliation we are leaving the design doc PR open until that phase of development is complete so we can easily update the doc if we find problems in the original plan when implementing.
   
   My 2 cents: creating follow-up PRs for any design change based on implementation experience makes them more visible.  In the single PR case, Git history preserves only the final commit, and readers have to refer to the PR (which is GitHub-specific).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org