You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/02 02:52:29 UTC

[GitHub] [beam] ahmedabu98 opened a new pull request, #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

ahmedabu98 opened a new pull request, #22138:
URL: https://github.com/apache/beam/pull/22138

   There is an edge case when Beam issues multiple rewrites to a GCS bucket. A bundle that rewrites files can succeed, but the fact of its success may not persist in Beam (e.g. autoscaling workers, worker restart). On bundle retry, the rewrite is issued again with the same destination. If this happens when writing to a bucket with a retention policy, we will run into the retentionPolicyNotMet error.
   
   PR adds support for this case by handling the retentionPolicyNotMet error as such:
   - If source and destination files have the same checksum, assume they are identical and skip the rewrite
   - otherwise, throw an early error
   
   More information in this design document: https://docs.google.com/document/d/11kXzI90KmAyknszSFmtfPcL_GWaVzpt8MojQfifZOoM/edit#heading=h.f8k9hty17wf5


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r922333293


##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
         } else {
           throw new FileNotFoundException(from.toString());
         }
+      } else if (e.getCode() == 403
+          && e.getErrors().size() == 1
+          && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+        List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+        if (srcAndDestObjects
+            .get(0)
+            .storageObject()
+            .getMd5Hash()
+            .equals(srcAndDestObjects.get(1).storageObject().getMd5Hash())) {

Review Comment:
   How about,
   (srcAndDestObjects
               .get(0)..storageObject()..getMd5Hash() != null && srcAndDestObjects.get(0).storageObject().getMd5Hash().equals(...)) 
   
   just to be safe :) ?
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1197416534

   Run PostCommit_Java_DataflowV2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj merged pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
chamikaramj merged PR #22138:
URL: https://github.com/apache/beam/pull/22138


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r922452601


##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
         } else {
           throw new FileNotFoundException(from.toString());
         }
+      } else if (e.getCode() == 403
+          && e.getErrors().size() == 1
+          && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+        List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+        if (srcAndDestObjects
+            .get(0)
+            .storageObject()
+            .getMd5Hash()
+            .equals(srcAndDestObjects.get(1).storageObject().getMd5Hash())) {

Review Comment:
   yeah SGTM!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1172820176

   R: @johnjcasey 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1199874711

   Run PostCommit_Java_DataflowV2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1177776933

   Run PostCommit_Java_DataflowV2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1176268677

   Run PostCommit_Java_DataflowV2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r922299041


##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
         } else {
           throw new FileNotFoundException(from.toString());
         }
+      } else if (e.getCode() == 403
+          && e.getErrors().size() == 1
+          && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+        List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+        if (srcAndDestObjects
+            .get(0)

Review Comment:
   We get `retentionPolicyNotMet` error when we prematurely (1) try to overwrite an existing file or (2) try to delete an existing file: https://cloud.google.com/storage/docs/bucket-lock#retention-policy
   
   Deletions are handled in a separate `onFailure()` method [here](https://github.com/apache/beam/blob/209daedff3e60f86188bc9e2d3e7e331e283373b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L815-L843), so at this point I believe we can assume both source and destination exist.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r921730603


##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
         } else {
           throw new FileNotFoundException(from.toString());
         }
+      } else if (e.getCode() == 403
+          && e.getErrors().size() == 1
+          && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+        List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+        if (srcAndDestObjects
+            .get(0)

Review Comment:
   Are both the source and the destination guaranteed to exist at this point ?



##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
         } else {
           throw new FileNotFoundException(from.toString());
         }
+      } else if (e.getCode() == 403
+          && e.getErrors().size() == 1
+          && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+        List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+        if (srcAndDestObjects
+            .get(0)
+            .storageObject()
+            .getMd5Hash()
+            .equals(srcAndDestObjects.get(1).storageObject().getMd5Hash())) {

Review Comment:
   Please make sure that this does not pass trivially due to default values matching.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1203108283

   Run PostCommit_Java_DataflowV2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1205549210

   Run Java_Examples_Dataflow PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1194454265

   run java precommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1176260374

   LGTM, very nice work


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r922306382


##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
         } else {
           throw new FileNotFoundException(from.toString());
         }
+      } else if (e.getCode() == 403
+          && e.getErrors().size() == 1
+          && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+        List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+        if (srcAndDestObjects
+            .get(0)
+            .storageObject()
+            .getMd5Hash()
+            .equals(srcAndDestObjects.get(1).storageObject().getMd5Hash())) {

Review Comment:
   Hmmm I'm not sure I follow. From my understanding MD5 hashes are calculated depending on the file so there wouldn't be default values?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1194454451

   run java postcommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1176263283

   run java precommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1176644256

   Run Java PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org