You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "waltczhang (via GitHub)" <gi...@apache.org> on 2023/04/19 08:07:39 UTC

[GitHub] [iceberg] waltczhang opened a new pull request, #7361: API, Core, Spark:Add failed group info in rewrite result

waltczhang opened a new pull request, #7361:
URL: https://github.com/apache/iceberg/pull/7361

   Background:
   A table with schema (int id, timestamp ts), When configured 'partial-progress.enabled = true' in rewrite job , the job will return success but in fact rewrite failed, so in order to solve this kind of problem, rewrite result should return group failure information.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1191710981


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -337,14 +337,21 @@ private Result doExecuteWithPartialProgress(
         commitManager.service(groupsPerCommit);
     commitService.start();
 
+    List<FileGroupFailureResult> rewriteFailures = Lists.newCopyOnWriteArrayList();

Review Comment:
   In general this would probably be an expensive data-structure for this operation since we never traverse. I don't think this is ever going to be that huge, but let's use something a bit more efficient. 
   
   I think something like 
   ConcurrentLinkedQueue would be fine here, and then we could just create a list out of it when we are done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang closed pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang closed pull request #7361: API, Core, Spark:Add failed group info in rewrite result
URL: https://github.com/apache/iceberg/pull/7361


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1174514707


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -343,8 +344,12 @@ private Result doExecuteWithPartialProgress(
         .executeWith(rewriteService)
         .noRetry()
         .onFailure(
-            (fileGroup, exception) ->
-                LOG.error("Failure during rewrite group {}", fileGroup.info(), exception))
+            (fileGroup, exception) -> {

Review Comment:
   Yes, it is ok that keep a list internally, According to the comments above , it should be like this :
   
   List< FailedFileGroupRewriteResult > failures;
   ...
   onFailure(
      (fileGroup, exception) -> {
         LOG.error("...")
         failures.addFailure(fileGroup, exception)
     }
   ...
    return ImmutableRewriteDataFiles.Result.builder()
                            .rewriteResults(rewriteResults)
                            failedRewriteREsults(failures)
                           .build();



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra merged pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra merged PR #7361:
URL: https://github.com/apache/iceberg/pull/7361


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1190079268


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -337,14 +337,21 @@ private Result doExecuteWithPartialProgress(
         commitManager.service(groupsPerCommit);
     commitService.start();
 
+    List<FileGroupFailureResult> rewriteFailures = Lists.newArrayList();

Review Comment:
   Race Condition here, need to use a concurrent structure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1174514707


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -343,8 +344,12 @@ private Result doExecuteWithPartialProgress(
         .executeWith(rewriteService)
         .noRetry()
         .onFailure(
-            (fileGroup, exception) ->
-                LOG.error("Failure during rewrite group {}", fileGroup.info(), exception))
+            (fileGroup, exception) -> {

Review Comment:
   Yes, it is ok that keep a list internally, According to the comments above , it should be like this :
   ```
   List< FailedFileGroupRewriteResult > failures;
   ...
   onFailure(
      (fileGroup, exception) -> {
         LOG.error("...")
         failures.addFailure(fileGroup, exception)
     }
   ...
    return ImmutableRewriteDataFiles.Result.builder()
                            .rewriteResults(rewriteResults)
                            .failedRewriteREsults(failures)
                           .build();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1176970337


##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -197,6 +197,11 @@ default int rewrittenDataFilesCount() {
     default long rewrittenBytesCount() {
       return rewriteResults().stream().mapToLong(FileGroupRewriteResult::rewrittenBytesCount).sum();
     }
+
+    @Value.Default

Review Comment:
   I think we should probably have result have two new interfaces, something like
   ```java
   List<FileGroupFailureResult> rewriteFailures();
   
   default long failedGroupCount() {
         return rewriteFailures().size();
   }
   
   default long failedDataFilesRewrites() {
       return rewriteFailures().stream().mapToInt(FailedFileGroup::filescount).sum()
   }
   ```
   Then the two failures



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#issuecomment-1545845275

   LGTM, will merge after tests pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#issuecomment-1512454864

   @szehon-ho and I should be able to review this at some point


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1170995299


##########
core/src/main/java/org/apache/iceberg/actions/RewriteFileGroup.java:
##########
@@ -35,26 +35,32 @@
 public class RewriteFileGroup {
   private final FileGroupInfo info;
   private final List<FileScanTask> fileScanTasks;
+  private boolean isFailed;
 
   private Set<DataFile> addedFiles = Collections.emptySet();
 
-  public RewriteFileGroup(FileGroupInfo info, List<FileScanTask> fileScanTasks) {
+  public RewriteFileGroup(FileGroupInfo info, List<FileScanTask> fileScanTasks, boolean isFailed) {

Review Comment:
   It is better to set the default, thanks nastra.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1174514707


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -343,8 +344,12 @@ private Result doExecuteWithPartialProgress(
         .executeWith(rewriteService)
         .noRetry()
         .onFailure(
-            (fileGroup, exception) ->
-                LOG.error("Failure during rewrite group {}", fileGroup.info(), exception))
+            (fileGroup, exception) -> {

Review Comment:
   Yes, it is ok that keep a list internally, According to the comments above , it should be like this :
   ```List< FailedFileGroupRewriteResult > failures;
   ...
   onFailure(
      (fileGroup, exception) -> {
         LOG.error("...")
         failures.addFailure(fileGroup, exception)
     }
   ...
    return ImmutableRewriteDataFiles.Result.builder()
                            .rewriteResults(rewriteResults)
                            failedRewriteREsults(failures)
                           .build();```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1186611885


##########
core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java:
##########
@@ -199,12 +201,28 @@ public void offer(RewriteFileGroup group) {
       commitReadyCommitGroups();
     }
 
+    /**
+     * Places a failed file group in the queue to be asynchronously added to {@link
+     * #committerService} when the service has been closed.
+     *
+     * @param group file group had failed
+     */
+    public void failedRewrite(RewriteFileGroup group) {
+      LOG.debug("Offered to failed service: {}", group);
+      Preconditions.checkState(
+          running.get(), "Cannot add rewrites to a service which has already been closed");
+      failedRewrites.add(group);
+    }
+
     /** Returns all File groups which have been committed */
     public List<RewriteFileGroup> results() {
+      List<RewriteFileGroup> results = Lists.newArrayList();
       Preconditions.checkState(
           committerService.isShutdown(),
           "Cannot get results from a service which has not been closed");
-      return committedRewrites;
+      results.addAll(committedRewrites);
+      results.addAll(failedRewrites);

Review Comment:
   Thanks RussellSpitzer,
   It has been modified here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1170991762


##########
.palantir/revapi.yml:
##########
@@ -426,6 +426,12 @@ acceptedBreaks:
     - code: "java.field.removedWithConstant"
       old: "field org.apache.iceberg.TableProperties.HMS_TABLE_OWNER"
       justification: "Removing deprecations for 1.3.0"
+    - code: "java.method.numberOfParametersChanged"

Review Comment:
   Thanks nastra, It is not reasonable to modify here, I will fix it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1176962658


##########
core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java:
##########
@@ -199,12 +201,28 @@ public void offer(RewriteFileGroup group) {
       commitReadyCommitGroups();
     }
 
+    /**
+     * Places a failed file group in the queue to be asynchronously added to {@link
+     * #committerService} when the service has been closed.
+     *
+     * @param group file group had failed
+     */
+    public void failedRewrite(RewriteFileGroup group) {
+      LOG.debug("Offered to failed service: {}", group);
+      Preconditions.checkState(
+          running.get(), "Cannot add rewrites to a service which has already been closed");
+      failedRewrites.add(group);
+    }
+
     /** Returns all File groups which have been committed */
     public List<RewriteFileGroup> results() {
+      List<RewriteFileGroup> results = Lists.newArrayList();
       Preconditions.checkState(
           committerService.isShutdown(),
           "Cannot get results from a service which has not been closed");
-      return committedRewrites;
+      results.addAll(committedRewrites);
+      results.addAll(failedRewrites);

Review Comment:
   Not sure we should mix these in the same list? Maybe  we should split this into just two separate lists (or methods)?
   
   Results() ++
   Failures() ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1174186347


##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -197,6 +197,11 @@ default int rewrittenDataFilesCount() {
     default long rewrittenBytesCount() {
       return rewriteResults().stream().mapToLong(FileGroupRewriteResult::rewrittenBytesCount).sum();
     }
+
+    @Value.Default

Review Comment:
   Instead of re-using the FileGroupRewriteResult, would it make more sense to have a FailedFileGroupRewriteResult, with just failure?    
   
   I dont think the bytesCount/fileCount make too much sense on failure, and we can also put the exception there? cc @aokolnychyi @RussellSpitzer for thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1176960270


##########
core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java:
##########
@@ -199,12 +201,28 @@ public void offer(RewriteFileGroup group) {
       commitReadyCommitGroups();
     }
 
+    /**

Review Comment:
   This method confuses me a bit,
   
   1. It's called failedRewrite but sounds like its only for a very specific case of failure
   2. It will never actually succeed since if it is called when service is closed it will fail the precondition and throw an exception



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1189973642


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -756,6 +756,8 @@ public void testPartialProgressWithRewriteFailure() {
     RewriteDataFiles.Result result = spyRewrite.execute();
 
     Assert.assertEquals("Should have 7 fileGroups", result.rewriteResults().size(), 7);
+    Assert.assertEquals("Should have 3 failed fileGroups", result.rewriteFailures().size(), 3);

Review Comment:
   Thank you for your patience in explaining, I got it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1191713621


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -337,14 +337,21 @@ private Result doExecuteWithPartialProgress(
         commitManager.service(groupsPerCommit);
     commitService.start();
 
+    List<FileGroupFailureResult> rewriteFailures = Lists.newCopyOnWriteArrayList();

Review Comment:
   Just to note, I don't think we actually would have performance issues with copy on write array list, it just seems a bit wasteful to me in this context.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1174514707


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -343,8 +344,12 @@ private Result doExecuteWithPartialProgress(
         .executeWith(rewriteService)
         .noRetry()
         .onFailure(
-            (fileGroup, exception) ->
-                LOG.error("Failure during rewrite group {}", fileGroup.info(), exception))
+            (fileGroup, exception) -> {

Review Comment:
   Yes, it is ok that keep a list internally, According to the comments above , it should be like this :
   `List< FailedFileGroupRewriteResult > failures;
   ...
   onFailure(
      (fileGroup, exception) -> {
         LOG.error("...")
         failures.addFailure(fileGroup, exception)
     }
   ...
    return ImmutableRewriteDataFiles.Result.builder()
                            .rewriteResults(rewriteResults)
                            failedRewriteREsults(failures)
                           .build();`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1176956340


##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -197,6 +197,11 @@ default int rewrittenDataFilesCount() {
     default long rewrittenBytesCount() {
       return rewriteResults().stream().mapToLong(FileGroupRewriteResult::rewrittenBytesCount).sum();
     }
+
+    @Value.Default

Review Comment:
   Not sure why we would need a "boolean" wouldn't it be this class because it failed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1190079268


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -337,14 +337,21 @@ private Result doExecuteWithPartialProgress(
         commitManager.service(groupsPerCommit);
     commitService.start();
 
+    List<FileGroupFailureResult> rewriteFailures = Lists.newArrayList();

Review Comment:
   Race Condition here, need to use a concurrent structure.
   
   The code below adds to this arraylist in a concurrent manner, if multiple groups fail at the same time we will end up call "add" simultaneously which will have some unexpected behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1189564974


##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -197,6 +199,10 @@ default int rewrittenDataFilesCount() {
     default long rewrittenBytesCount() {
       return rewriteResults().stream().mapToLong(FileGroupRewriteResult::rewrittenBytesCount).sum();
     }
+
+    default int failedDataFilesCount() {

Review Comment:
   this should have `@Value.Default`



##########
core/src/main/java/org/apache/iceberg/actions/BaseRewriteDataFilesResult.java:
##########
@@ -30,12 +31,21 @@
 public class BaseRewriteDataFilesResult implements Result {
   private final List<FileGroupRewriteResult> rewriteResults;
 
-  public BaseRewriteDataFilesResult(List<FileGroupRewriteResult> rewriteResults) {
+  private final List<FileGroupFailureResult> rewriteFailures;

Review Comment:
   this class is deprecated and not used anymore, so no need to add changes to it. Not modifying this class also ensures that no API-breaking changes are introduced (as indicated by RevAPI)



##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -181,6 +181,8 @@ default RewriteDataFiles zOrder(String... columns) {
   interface Result {
     List<FileGroupRewriteResult> rewriteResults();
 
+    List<FileGroupFailureResult> rewriteFailures();

Review Comment:
   to avoid introducing API-breaking changes, what you'd rather want to do is 
   ```
       @Value.Default
       default List<FileGroupFailureResult> rewriteFailures() {
         return ImmutableList.of();
       }
   ```



##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -756,6 +756,8 @@ public void testPartialProgressWithRewriteFailure() {
     RewriteDataFiles.Result result = spyRewrite.execute();
 
     Assert.assertEquals("Should have 7 fileGroups", result.rewriteResults().size(), 7);
+    Assert.assertEquals("Should have 3 failed fileGroups", result.rewriteFailures().size(), 3);

Review Comment:
   I think it's better to change this to 
   ```
       assertThat(result.rewriteResults()).hasSize(7);
       assertThat(result.rewriteFailures()).hasSize(3);
       assertThat(result.failedDataFilesCount()).isEqualTo(6);
   ```
   
   the advantage here is that the content of `result.rewriteResults()` / `result.rewriteFailures()` will be shown if the assertion ever fails, which makes debugging a lot easier.
   
   Could you please do the same changes further below in L801ff



##########
.palantir/revapi.yml:
##########
@@ -426,6 +426,12 @@ acceptedBreaks:
     - code: "java.field.removedWithConstant"
       old: "field org.apache.iceberg.TableProperties.HMS_TABLE_OWNER"
       justification: "Removing deprecations for 1.3.0"
+    - code: "java.method.numberOfParametersChanged"

Review Comment:
   @waltczhang can you please revert all changes to the `revapi.yml`? Those shouldn't be necessary anymore once my other comments are applied



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1189969219


##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -181,6 +181,8 @@ default RewriteDataFiles zOrder(String... columns) {
   interface Result {
     List<FileGroupRewriteResult> rewriteResults();
 
+    List<FileGroupFailureResult> rewriteFailures();

Review Comment:
   Thanks for your tips, I know how to do and modify.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#issuecomment-1549012077

   Merging, since we have 2 approving reviews and CI passed. Thanks @waltczhang.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1174188399


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -343,8 +344,12 @@ private Result doExecuteWithPartialProgress(
         .executeWith(rewriteService)
         .noRetry()
         .onFailure(
-            (fileGroup, exception) ->
-                LOG.error("Failure during rewrite group {}", fileGroup.info(), exception))
+            (fileGroup, exception) -> {

Review Comment:
   I'm wondering if we can't avoid touching the commitService.  It doesnt seem like its really the responsibility of the commit service to handle the failed ones.  Can't this method just keep a list internally?  ie,
   
   ```
   
   List<Failure> failures;
   ...
   onFailure(
      (fileGroup, exception) -> {
         LOG.error("...")
         failures.addFailure(fileGroup, exception)
     }
   ```



##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -197,6 +197,11 @@ default int rewrittenDataFilesCount() {
     default long rewrittenBytesCount() {
       return rewriteResults().stream().mapToLong(FileGroupRewriteResult::rewrittenBytesCount).sum();
     }
+
+    @Value.Default

Review Comment:
   Would it make more sense to have a FailedFileGroupRewriteResult, with just failure?  Can also put exception message there then?  cc @aokolnychyi @RussellSpitzer for thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1174514707


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -343,8 +344,12 @@ private Result doExecuteWithPartialProgress(
         .executeWith(rewriteService)
         .noRetry()
         .onFailure(
-            (fileGroup, exception) ->
-                LOG.error("Failure during rewrite group {}", fileGroup.info(), exception))
+            (fileGroup, exception) -> {

Review Comment:
   Yes, it is ok that keep a list internally, According to the comments above , it should be like this :
   ```
   List< FailedFileGroupRewriteResult > failures;
   ...
   onFailure(
      (fileGroup, exception) -> {
         LOG.error("...")
         failures.addFailure(fileGroup, exception)
     }
   ...
    return ImmutableRewriteDataFiles.Result.builder()
                            .rewriteResults(rewriteResults)
                            failedRewriteREsults(failures)
                           .build();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1176970337


##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -197,6 +197,11 @@ default int rewrittenDataFilesCount() {
     default long rewrittenBytesCount() {
       return rewriteResults().stream().mapToLong(FileGroupRewriteResult::rewrittenBytesCount).sum();
     }
+
+    @Value.Default

Review Comment:
   I think we should probably have result have two new interfaces, something like
   ```java
   List<FileGroupFailureResult> rewriteFailures();
   
   default long failedGroupCount() {
         return rewriteFailures().size();
   }
   
   default long failedDataFilesRewrites() {
       return rewriteFailures().stream().mapToInt(FailedFileGroup::filescount).sum()
   }
   ```
   Then we won't mix failures and successes in the same list



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1186611595


##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -197,6 +197,11 @@ default int rewrittenDataFilesCount() {
     default long rewrittenBytesCount() {
       return rewriteResults().stream().mapToLong(FileGroupRewriteResult::rewrittenBytesCount).sum();
     }
+
+    @Value.Default

Review Comment:
   Thanks @RussellSpitzer reply,
   Yes, "boolean" is redundant, They are great advice, I wiil fix them according to your advice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1191845490


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -337,14 +337,21 @@ private Result doExecuteWithPartialProgress(
         commitManager.service(groupsPerCommit);
     commitService.start();
 
+    List<FileGroupFailureResult> rewriteFailures = Lists.newCopyOnWriteArrayList();

Review Comment:
   After reading the introduction of newCopyOnWriteArrayList and ConcurrentLinkedQueue, it is true that the latter is more efficient, thank you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1169694391


##########
.palantir/revapi.yml:
##########
@@ -426,6 +426,12 @@ acceptedBreaks:
     - code: "java.field.removedWithConstant"
       old: "field org.apache.iceberg.TableProperties.HMS_TABLE_OWNER"
       justification: "Removing deprecations for 1.3.0"
+    - code: "java.method.numberOfParametersChanged"

Review Comment:
   we typically don't want to introduce breaking API changes that haven't gone through a deprecation cycle first. See https://iceberg.apache.org/contribute/#minor-version-deprecations-required for some details.



##########
core/src/main/java/org/apache/iceberg/actions/RewriteFileGroup.java:
##########
@@ -35,26 +35,32 @@
 public class RewriteFileGroup {
   private final FileGroupInfo info;
   private final List<FileScanTask> fileScanTasks;
+  private boolean isFailed;
 
   private Set<DataFile> addedFiles = Collections.emptySet();
 
-  public RewriteFileGroup(FileGroupInfo info, List<FileScanTask> fileScanTasks) {
+  public RewriteFileGroup(FileGroupInfo info, List<FileScanTask> fileScanTasks, boolean isFailed) {

Review Comment:
   do we really need to pass a boolean here or could we derive that info maybe in a different way?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1174514233


##########
api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java:
##########
@@ -197,6 +197,11 @@ default int rewrittenDataFilesCount() {
     default long rewrittenBytesCount() {
       return rewriteResults().stream().mapToLong(FileGroupRewriteResult::rewrittenBytesCount).sum();
     }
+
+    @Value.Default

Review Comment:
   I think it is ok that have a FailedFileGroupRewriteResult.  What about the following data structure?
    FailedFileGroupRewriteResult {
       FileGroupInfo info();
       boolean isFailedGroup()
   }
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add failed group info in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1186611832


##########
core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java:
##########
@@ -199,12 +201,28 @@ public void offer(RewriteFileGroup group) {
       commitReadyCommitGroups();
     }
 
+    /**

Review Comment:
   Thanks RussellSpitzer,
   It has been modified here, using szehon-ho's suggestion above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1190057009


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -756,6 +756,8 @@ public void testPartialProgressWithRewriteFailure() {
     RewriteDataFiles.Result result = spyRewrite.execute();
 
     Assert.assertEquals("Should have 7 fileGroups", result.rewriteResults().size(), 7);
+    assertThat(result.rewriteFailures()).hasSize(3);

Review Comment:
   nit: could you also update the line above please?



##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java:
##########
@@ -797,6 +799,8 @@ public void testParallelPartialProgressWithRewriteFailure() {
     RewriteDataFiles.Result result = spyRewrite.execute();
 
     Assert.assertEquals("Should have 7 fileGroups", result.rewriteResults().size(), 7);
+    assertThat(result.rewriteFailures()).hasSize(3);

Review Comment:
   nit: could you also update the line above please?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on a diff in pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on code in PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#discussion_r1190566676


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java:
##########
@@ -337,14 +337,21 @@ private Result doExecuteWithPartialProgress(
         commitManager.service(groupsPerCommit);
     commitService.start();
 
+    List<FileGroupFailureResult> rewriteFailures = Lists.newArrayList();

Review Comment:
   Thanks RussellSpitzer for pointing out this problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waltczhang commented on pull request #7361: API, Core, Spark:Add file groups failure in rewrite result

Posted by "waltczhang (via GitHub)" <gi...@apache.org>.
waltczhang commented on PR #7361:
URL: https://github.com/apache/iceberg/pull/7361#issuecomment-1548862400

   I noticed that the unit tests have all passed. Could you please merge them when you have time? Thanks.
   @RussellSpitzer @nastra @aokolnychyi @szehon-ho 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org