Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/21 01:17:28 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue, #5098: Stack Overflow due to repeated Set Unions - RewriteDataFilesCommitManager

RussellSpitzer opened a new issue, #5098:
URL: https://github.com/apache/iceberg/issues/5098

   Guava's `Sets.union` returns a view rather than a copy on every call. In RewriteDataFilesCommitManager, where we merge the added and deleted data files of all file groups, this builds a deeply nested set: with a huge number of groups we end up with a set which is a view of a set which is a view of a set ..., and iterating it recurses once per layer, creating a very large stack. This should probably be changed to a plain `addAll`.
   
   ```diff
   diff --git a/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java b/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
   index 9a5cc4c94..9c9b23988 100644
   --- a/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
   +++ b/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java
   @@ -74,8 +74,8 @@ public class RewriteDataFilesCommitManager {
        Set<DataFile> rewrittenDataFiles = Sets.newHashSet();
        Set<DataFile> addedDataFiles = Sets.newHashSet();
        for (RewriteFileGroup group : fileGroups) {
   -      rewrittenDataFiles = Sets.union(rewrittenDataFiles, group.rewrittenFiles());
   -      addedDataFiles = Sets.union(addedDataFiles, group.addedFiles());
   +      rewrittenDataFiles.addAll(group.rewrittenFiles());
   +      addedDataFiles.addAll(group.addedFiles());
        }
   
        RewriteFiles rewrite = table.newRewrite().validateFromSnapshot(startingSnapshotId);
    ```
    
    
    Error message after creating a large union:
    
    ```
    Caused by: java.lang.StackOverflowError
   	at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:729)
   	at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
   	at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:710)
   	at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:730)
   	at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
   	at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:710)
   	at org.apache.iceberg.relocated.com.google.common.collect.Sets$1$1.<init>(Sets.java:730)
   	at org.apache.iceberg.relocated.com.google.common.collect.Sets$1.iterator(Sets.java:729)
   ```
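
To make the nesting concrete without depending on Guava, here is a minimal, self-contained sketch. `UnionView` is a hypothetical stand-in for a lazy union view and is not Guava's actual implementation; the class, method, and file names are made up for illustration. It only shows that re-assigning a view in a loop adds one layer per file group, while `addAll` keeps a single flat set with the same contents.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

class UnionViewDemo {

  // Hypothetical stand-in for a lazy union view (NOT Guava's code):
  // it copies nothing and delegates every call to the two wrapped sets.
  static final class UnionView<E> extends AbstractSet<E> {
    private final Set<E> left;
    private final Set<E> right;
    final int depth; // number of view layers stacked along the left operand

    UnionView(Set<E> left, Set<E> right) {
      this.left = left;
      this.right = right;
      this.depth = (left instanceof UnionView) ? ((UnionView<?>) left).depth + 1 : 1;
    }

    @Override
    public Iterator<E> iterator() {
      // Asking a nested view for its iterator recurses one level per layer.
      Iterator<E> l = left.iterator();
      Iterator<E> r = right.iterator();
      return new Iterator<E>() {
        private E pending;
        private boolean hasPending;

        @Override
        public boolean hasNext() {
          if (hasPending) return true;
          if (l.hasNext()) { pending = l.next(); hasPending = true; return true; }
          while (r.hasNext()) {
            E candidate = r.next();
            // Skip elements already produced by the left side.
            if (!left.contains(candidate)) { pending = candidate; hasPending = true; return true; }
          }
          return false;
        }

        @Override
        public E next() {
          if (!hasNext()) throw new NoSuchElementException();
          hasPending = false;
          return pending;
        }
      };
    }

    @Override
    public int size() {
      int n = 0;
      for (Iterator<E> it = iterator(); it.hasNext(); it.next()) n++;
      return n;
    }

    @Override
    public boolean contains(Object o) { return left.contains(o) || right.contains(o); }
  }

  // Made-up file groups: two distinct file names per group.
  static List<Set<String>> sampleGroups(int count) {
    List<Set<String>> groups = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      Set<String> group = new HashSet<>();
      group.add("file-" + i + "-a");
      group.add("file-" + i + "-b");
      groups.add(group);
    }
    return groups;
  }

  // Pattern from the removed lines: each iteration wraps the previous view.
  static Set<String> unionMerge(List<Set<String>> groups) {
    Set<String> all = new HashSet<>();
    for (Set<String> group : groups) all = new UnionView<>(all, group);
    return all;
  }

  // Pattern from the added lines: one flat HashSet, no wrapping.
  static Set<String> flatMerge(List<Set<String>> groups) {
    Set<String> all = new HashSet<>();
    for (Set<String> group : groups) all.addAll(group);
    return all;
  }

  static int nestedDepth(List<Set<String>> groups) {
    Set<String> merged = unionMerge(groups);
    return merged instanceof UnionView ? ((UnionView<?>) merged).depth : 0;
  }

  public static void main(String[] args) {
    List<Set<String>> groups = sampleGroups(50); // small on purpose, no overflow
    System.out.println("view layers: " + nestedDepth(groups));
    System.out.println("flat size:   " + flatMerge(groups).size());
  }
}
```

The group count is kept small so the sketch runs without overflowing; the point is that the layer count grows linearly with the number of groups, which is what eventually exhausts the stack in the trace above.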


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] commented on issue #5098: Stack Overflow due to repeated Set Unions - RewriteDataFilesCommitManager

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #5098:
URL: https://github.com/apache/iceberg/issues/5098#issuecomment-1356911714

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] github-actions[bot] commented on issue #5098: Stack Overflow due to repeated Set Unions - RewriteDataFilesCommitManager

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #5098:
URL: https://github.com/apache/iceberg/issues/5098#issuecomment-1368572621

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'




[GitHub] [iceberg] github-actions[bot] closed issue #5098: Stack Overflow due to repeated Set Unions - RewriteDataFilesCommitManager

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #5098: Stack Overflow due to repeated Set Unions - RewriteDataFilesCommitManager
URL: https://github.com/apache/iceberg/issues/5098

