Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/06 19:50:35 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue, #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

RussellSpitzer opened a new issue, #6367:
URL: https://github.com/apache/iceberg/issues/6367

   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Partial progress currently works as described in the following pseudo-code:
   
   
   ```
   Rewrite Job Thread Pool In parallel {
      rewriteFiles for a partition/fileGroup // Datafiles generated here
      add result of rewrite to commit queue 
   }
   
   Commit Thread {
      when enough fileGroups have been rewritten perform a commit // Manifests generated at this point in time
   }
   
   Once in parallel has completed {
      Await termination of the single-threaded commit service (10 minutes or die)
   }
   ```
   
   See
   https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java#L350-L357
   https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java#L179-L188
   And 
   https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java#L228-L240
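
The flow in the pseudo-code above can be modelled in a few lines of plain Java. This is only a rough sketch under stated assumptions, not the Iceberg implementation: the class `PartialProgressSketch`, its `run` method and all parameters are hypothetical, rewrites are instantaneous, and a commit is simulated with a sleep.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the flow above; none of these names come from Iceberg.
final class PartialProgressSketch {

  // Returns how many simulated commits finished before the commit thread stopped.
  static int run(int fileGroups, int groupsPerCommit, long commitMillis, long timeoutMillis) {
    ConcurrentLinkedQueue<Integer> rewritten = new ConcurrentLinkedQueue<>();
    AtomicBoolean rewriting = new AtomicBoolean(true);
    AtomicInteger commits = new AtomicInteger();

    // Commit thread: commits whenever enough file groups have been queued.
    ExecutorService commitService = Executors.newSingleThreadExecutor();
    commitService.execute(() -> {
      try {
        while (rewriting.get() || !rewritten.isEmpty()) {
          boolean flushRemainder = !rewriting.get() && !rewritten.isEmpty();
          if (rewritten.size() >= groupsPerCommit || flushRemainder) {
            for (int i = 0; i < groupsPerCommit && rewritten.poll() != null; i++) {
              // drain up to one commit's worth of file groups
            }
            Thread.sleep(commitMillis); // stands in for writing manifests and committing
            commits.incrementAndGet();
          } else {
            Thread.sleep(1); // nothing to commit yet
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stopped by shutdownNow()
      }
    });

    // Rewrite pool: each task stands in for rewriting one file group.
    ExecutorService rewritePool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < fileGroups; i++) {
      int group = i;
      rewritePool.execute(() -> rewritten.add(group));
    }
    rewritePool.shutdown();

    try {
      rewritePool.awaitTermination(1, TimeUnit.MINUTES);
      rewriting.set(false);
      commitService.shutdown();
      // The fixed wait after the parallel phase ("10 minutes or die").
      if (!commitService.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
        commitService.shutdownNow();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      commitService.shutdownNow();
    }
    return commits.get();
  }
}
```

With fast commits, for example `PartialProgressSketch.run(10, 2, 1, 5000)`, all five commits complete. With commits that are slow relative to the wait, for example `run(10, 2, 500, 50)`, the commit thread is stopped before it can finish them.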
   
   The original assumption here is that the commit phase is relatively fast and the rewrite phase is long, so all commits should be finished within 10 minutes of the rewrites completing. This assumption does not always hold. Some users may run the "parallel" phase on a very large cluster, which lets them complete the rewrites quickly, but the new files require a huge amount of new metadata, which in turn requires a large number of new manifest files.
   
   In one of our internal examples we have a very large partial progress rewrite split into 10 parts. The rewrites all finish at around the same time, so the commits are simply enqueued and then run in sequence. The timeline looks roughly like this (imagine there are only five commit groups):
   
   ```
   All Rewrites Begin
   1/5 of file groups rewritten
   1st Commit Begins
   2/5 of file groups rewritten
   3/5 of file groups rewritten
   4/5 of file groups rewritten
   1st Commit Finishes
   2nd Commit Begins
   5/5 of files groups rewritten
   10 Minute Timer Begins to Finish Commits
   2nd Commit Finishes
   3rd Commit Begins
   // Timeout! 
   ```
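
The arithmetic behind this timeline can be checked with a small helper. It is a sketch with made-up numbers, not measurements from the issue: `CommitTimeline` and its parameters are hypothetical, times are in minutes, every commit is assumed to take the same time, and each commit is assumed to start as soon as the previous one finishes.

```java
// Hypothetical helper; the names and numbers are illustrative only.
final class CommitTimeline {

  // Counts the sequential commits that finish before rewritesDoneAt + timeout.
  static int commitsFinishedBeforeDeadline(
      long firstCommitStart, long commitDuration, int totalCommits,
      long rewritesDoneAt, long timeout) {
    long deadline = rewritesDoneAt + timeout; // the timer starts when rewrites end
    long clock = firstCommitStart;
    int finished = 0;
    for (int i = 0; i < totalCommits; i++) {
      clock += commitDuration; // commits run strictly one after another
      if (clock > deadline) {
        break; // this commit is still running when the timer expires
      }
      finished++;
    }
    return finished;
  }
}
```

For instance, with the first commit starting at minute 2, 8-minute commits, 5 commits in total, rewrites done at minute 12 and a 10-minute timer, `commitsFinishedBeforeDeadline(2, 8, 5, 12, 10)` returns 2: the third commit is still running at the minute-22 deadline, which matches the shape of the timeline above.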
   
   I think the best way to improve this, and to increase the throughput of the operation, is to move the actual writing of manifests into the parallel portion of the operation. We could probably do this by building our commit groups in the Service's offer method rather than in the service thread itself; the service thread would then only check for completed commit groups.
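
One possible shape for this idea is sketched below. It only shows the batching part: `BatchingOfferSketch`, `offer`, `flush` and `pollReadyBatch` are hypothetical names, not the Iceberg commit service API, and the manifest writing itself is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: commit groups are assembled by the caller of offer(),
// so the service thread only has to pick up batches that are already complete.
final class BatchingOfferSketch<T> {
  private final int groupsPerCommit;
  private final ConcurrentLinkedQueue<List<T>> readyBatches = new ConcurrentLinkedQueue<>();
  private List<T> current = new ArrayList<>();

  BatchingOfferSketch(int groupsPerCommit) {
    this.groupsPerCommit = groupsPerCommit;
  }

  // Called from the rewrite threads once a file group has been rewritten.
  synchronized void offer(T rewrittenGroup) {
    current.add(rewrittenGroup);
    if (current.size() >= groupsPerCommit) {
      readyBatches.add(current); // batch is complete, hand it to the service thread
      current = new ArrayList<>();
    }
  }

  // Called once after all rewrites are done to release a partial last batch.
  synchronized void flush() {
    if (!current.isEmpty()) {
      readyBatches.add(current);
      current = new ArrayList<>();
    }
  }

  // Called from the service thread; returns null when no batch is ready.
  List<T> pollReadyBatch() {
    return readyBatches.poll();
  }
}
```

In a real change the expensive work for a completed batch would need to run outside the `synchronized` section, otherwise the rewrite threads would simply queue up behind each other.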
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] closed issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits 
URL: https://github.com/apache/iceberg/issues/6367



[GitHub] [iceberg] RussellSpitzer commented on issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #6367:
URL: https://github.com/apache/iceberg/issues/6367#issuecomment-1341500255

   Filed a quick PR to just extend the timeout



[GitHub] [iceberg] RussellSpitzer commented on issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #6367:
URL: https://github.com/apache/iceberg/issues/6367#issuecomment-1339951359

   The first thing to do is to make it so that we don't terminate the commit service while progress is still being made.
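
A minimal sketch of what "don't terminate while progress is still being made" could look like, assuming the commit thread bumps a counter after every commit. `ProgressAwareAwait`, `awaitWhileProgressing` and `demo` are hypothetical names, and this is not necessarily how it was eventually fixed in Iceberg.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: give up only after a full idle window without progress.
final class ProgressAwareAwait {

  // Returns true if the service terminated, false if it stalled for idleTimeoutMillis.
  static boolean awaitWhileProgressing(
      ExecutorService service, AtomicInteger progress, long idleTimeoutMillis) {
    try {
      int lastSeen = progress.get();
      while (!service.awaitTermination(idleTimeoutMillis, TimeUnit.MILLISECONDS)) {
        int now = progress.get();
        if (now == lastSeen) {
          return false; // no commit finished during the whole window
        }
        lastSeen = now; // progress was made, so wait for another window
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  // Small driver: runs `commits` simulated commits of `commitMillis` each.
  static boolean demo(int commits, long commitMillis, long idleTimeoutMillis) {
    AtomicInteger progress = new AtomicInteger();
    ExecutorService service = Executors.newSingleThreadExecutor();
    service.execute(() -> {
      try {
        for (int i = 0; i < commits; i++) {
          Thread.sleep(commitMillis); // stands in for one commit
          progress.incrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    service.shutdown();
    boolean finished = awaitWhileProgressing(service, progress, idleTimeoutMillis);
    if (!finished) {
      service.shutdownNow(); // stalled: stop the commit thread
    }
    return finished;
  }
}
```

Here `demo(5, 50, 200)` completes even though the total commit time exceeds a single 200 ms window, because each window sees new commits. `demo(1, 2000, 50)` gives up, since nothing completes within a window.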



[GitHub] [iceberg] nastra commented on issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on issue #6367:
URL: https://github.com/apache/iceberg/issues/6367#issuecomment-1600252549

   @RussellSpitzer should we re-open this? Seems like something that users could run into. Or was this fixed by #6539?



[GitHub] [iceberg] RussellSpitzer commented on issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #6367:
URL: https://github.com/apache/iceberg/issues/6367#issuecomment-1600708291

   This should be more or less fixed. Even the very large commits were not taking that long, and now that the commit process is async we should be safe.



[GitHub] [iceberg] github-actions[bot] commented on issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6367:
URL: https://github.com/apache/iceberg/issues/6367#issuecomment-1577715380

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.



[GitHub] [iceberg] ConeyLiu commented on issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on issue #6367:
URL: https://github.com/apache/iceberg/issues/6367#issuecomment-1651107411

   Hi @RussellSpitzer @nastra @singhpk234, to address the above problem, #6539 changed committing from a single thread to multiple threads. However, in production we have noticed that this increases the probability of commits failing due to concurrent committing. I have submitted a follow-up to address the problem. Please take a look when you are free. Thanks a lot.



[GitHub] [iceberg] github-actions[bot] commented on issue #6367: Partial Progress Compaction can Timeout on Very Large Manfiest Commits

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6367:
URL: https://github.com/apache/iceberg/issues/6367#issuecomment-1599760524

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

