You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/02/16 22:01:35 UTC

[GitHub] [beam] reuvenlax commented on a change in pull request #13558: [BEAM-11494] FileIO stops overwriting files on retries

reuvenlax commented on a change in pull request #13558:
URL: https://github.com/apache/beam/pull/13558#discussion_r577167815



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
##########
@@ -401,12 +412,23 @@ public ResourceId apply(@Nonnull Metadata input) {
     List<ResourceId> srcToHandle = new ArrayList<>();
     List<ResourceId> destToHandle = new ArrayList<>();
 
-    List<MatchResult> matchResults = matchResources(srcResourceIds);
-    for (int i = 0; i < matchResults.size(); ++i) {
-      if (!matchResults.get(i).status().equals(Status.NOT_FOUND)) {
-        srcToHandle.add(srcResourceIds.get(i));
-        destToHandle.add(destResourceIds.get(i));
+    List<MatchResult> matchSrcResults = matchResources(srcResourceIds);
+    List<MatchResult> matchDestResults = matchResources(destResourceIds);

Review comment:
       only populate matchDestResults if skipExistingDest is ste.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
##########
@@ -401,12 +412,23 @@ public ResourceId apply(@Nonnull Metadata input) {
     List<ResourceId> srcToHandle = new ArrayList<>();
     List<ResourceId> destToHandle = new ArrayList<>();
 
-    List<MatchResult> matchResults = matchResources(srcResourceIds);
-    for (int i = 0; i < matchResults.size(); ++i) {
-      if (!matchResults.get(i).status().equals(Status.NOT_FOUND)) {
-        srcToHandle.add(srcResourceIds.get(i));
-        destToHandle.add(destResourceIds.get(i));
+    List<MatchResult> matchSrcResults = matchResources(srcResourceIds);
+    List<MatchResult> matchDestResults = matchResources(destResourceIds);
+
+    for (int i = 0; i < matchSrcResults.size(); ++i) {
+      if (matchSrcResults.get(i).status().equals(Status.NOT_FOUND) && ignoreMissingSrc) {
+        // If the source is not found, and we are ignoring found source files, then we skip it.
+        continue;
+      }
+      if (matchDestResults.get(i).status().equals(Status.OK)
+          && matchDestResults.get(i).metadata().get(0).sizeBytes()
+              == matchSrcResults.get(i).metadata().get(0).sizeBytes()
+          && skipExistingDest) {

Review comment:
       put skipExistingDest first in the conditional.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
##########
@@ -401,12 +412,23 @@ public ResourceId apply(@Nonnull Metadata input) {
     List<ResourceId> srcToHandle = new ArrayList<>();
     List<ResourceId> destToHandle = new ArrayList<>();
 
-    List<MatchResult> matchResults = matchResources(srcResourceIds);
-    for (int i = 0; i < matchResults.size(); ++i) {
-      if (!matchResults.get(i).status().equals(Status.NOT_FOUND)) {
-        srcToHandle.add(srcResourceIds.get(i));
-        destToHandle.add(destResourceIds.get(i));
+    List<MatchResult> matchSrcResults = matchResources(srcResourceIds);
+    List<MatchResult> matchDestResults = matchResources(destResourceIds);
+
+    for (int i = 0; i < matchSrcResults.size(); ++i) {
+      if (matchSrcResults.get(i).status().equals(Status.NOT_FOUND) && ignoreMissingSrc) {
+        // If the source is not found, and we are ignoring found source files, then we skip it.
+        continue;
+      }
+      if (matchDestResults.get(i).status().equals(Status.OK)
+          && matchDestResults.get(i).metadata().get(0).sizeBytes()
+              == matchSrcResults.get(i).metadata().get(0).sizeBytes()
+          && skipExistingDest) {

Review comment:
       Wish there was something better than byte size we could use here (e.g. file hash).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org