You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/02 02:52:29 UTC
[GitHub] [beam] ahmedabu98 opened a new pull request, #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
ahmedabu98 opened a new pull request, #22138:
URL: https://github.com/apache/beam/pull/22138
There is an edge case when Beam issues multiple rewrites to a GCS bucket. A bundle that rewrites files can succeed, but the fact of its success may not persist in Beam (e.g. autoscaling workers, worker restart). On bundle retry, the rewrite is issued again with the same destination. If this happens when writing to a bucket with a retention policy, we will run into the retentionPolicyNotMet error.
PR adds support for this case by handling the retentionPolicyNotMet error as such:
- If source and destination files have the same checksum, assume they are identical and skip the rewrite
- otherwise, throw an early error
More information in this design document: https://docs.google.com/document/d/11kXzI90KmAyknszSFmtfPcL_GWaVzpt8MojQfifZOoM/edit#heading=h.f8k9hty17wf5
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] chamikaramj commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r922333293
##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
} else {
throw new FileNotFoundException(from.toString());
}
+ } else if (e.getCode() == 403
+ && e.getErrors().size() == 1
+ && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+ List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+ if (srcAndDestObjects
+ .get(0)
+ .storageObject()
+ .getMd5Hash()
+ .equals(srcAndDestObjects.get(1).storageObject().getMd5Hash())) {
Review Comment:
How about,
(srcAndDestObjects
.get(0)..storageObject()..getMd5Hash() != null && srcAndDestObjects.get(0).storageObject().getMd5Hash().equals(...))
just to be safe :) ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1197416534
Run PostCommit_Java_DataflowV2
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] chamikaramj merged pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
chamikaramj merged PR #22138:
URL: https://github.com/apache/beam/pull/22138
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r922452601
##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
} else {
throw new FileNotFoundException(from.toString());
}
+ } else if (e.getCode() == 403
+ && e.getErrors().size() == 1
+ && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+ List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+ if (srcAndDestObjects
+ .get(0)
+ .storageObject()
+ .getMd5Hash()
+ .equals(srcAndDestObjects.get(1).storageObject().getMd5Hash())) {
Review Comment:
yeah SGTM!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1172820176
R: @johnjcasey
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1199874711
Run PostCommit_Java_DataflowV2
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1177776933
Run PostCommit_Java_DataflowV2
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1176268677
Run PostCommit_Java_DataflowV2
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r922299041
##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
} else {
throw new FileNotFoundException(from.toString());
}
+ } else if (e.getCode() == 403
+ && e.getErrors().size() == 1
+ && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+ List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+ if (srcAndDestObjects
+ .get(0)
Review Comment:
We get `retentionPolicyNotMet` error when we prematurely (1) try to overwrite an existing file or (2) try to delete an existing file: https://cloud.google.com/storage/docs/bucket-lock#retention-policy
Deletions are handled in a separate `onFailure()` method [here](https://github.com/apache/beam/blob/209daedff3e60f86188bc9e2d3e7e331e283373b/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java#L815-L843), so at this point I believe we can assume both source and destination exist.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] chamikaramj commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r921730603
##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
} else {
throw new FileNotFoundException(from.toString());
}
+ } else if (e.getCode() == 403
+ && e.getErrors().size() == 1
+ && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+ List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+ if (srcAndDestObjects
+ .get(0)
Review Comment:
Are both the source and the destination guaranteed to exist at this point ?
##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
} else {
throw new FileNotFoundException(from.toString());
}
+ } else if (e.getCode() == 403
+ && e.getErrors().size() == 1
+ && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+ List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+ if (srcAndDestObjects
+ .get(0)
+ .storageObject()
+ .getMd5Hash()
+ .equals(srcAndDestObjects.get(1).storageObject().getMd5Hash())) {
Review Comment:
Please make sure that this does not pass trivially due to default values matching.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1203108283
Run PostCommit_Java_DataflowV2
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1205549210
Run Java_Examples_Dataflow PreCommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] johnjcasey commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1194454265
run java precommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] johnjcasey commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1176260374
LGTM, very nice work
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on code in PR #22138:
URL: https://github.com/apache/beam/pull/22138#discussion_r922306382
##########
sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java:
##########
@@ -897,6 +898,35 @@ public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOE
} else {
throw new FileNotFoundException(from.toString());
}
+ } else if (e.getCode() == 403
+ && e.getErrors().size() == 1
+ && e.getErrors().get(0).getReason().equals("retentionPolicyNotMet")) {
+ List<StorageObjectOrIOException> srcAndDestObjects = getObjects(Arrays.asList(from, to));
+ if (srcAndDestObjects
+ .get(0)
+ .storageObject()
+ .getMd5Hash()
+ .equals(srcAndDestObjects.get(1).storageObject().getMd5Hash())) {
Review Comment:
Hmmm I'm not sure I follow. From my understanding MD5 hashes are calculated depending on the file so there wouldn't be default values?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] johnjcasey commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1194454451
run java postcommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] johnjcasey commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1176263283
run java precommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #22138: Add support when writing to locked buckets by handling retentionPolicyNotMet error
Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #22138:
URL: https://github.com/apache/beam/pull/22138#issuecomment-1176644256
Run Java PostCommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org