Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/13 21:24:39 UTC

[GitHub] [iceberg] e-gat opened a new issue, #6421: Running rewriteDataFiles on multiple executors in Spark

e-gat opened a new issue, #6421:
URL: https://github.com/apache/iceberg/issues/6421

   ### Query engine
   
   Spark/EMR
   
   ### Question
   
   Does Spark (3.2.1) / EMR 6.7 with Iceberg 1.1 support running rewriteDataFiles across multiple executors, or only on one?
   If so, what is the recommended way to determine how many vCPUs and how much memory the machine will need, and what value to use for max-concurrent-file-group-rewrites?
   
   The same question applies to the rewriteManifests action.
   
   Thanks 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] e-gat closed issue #6421: Running rewriteDataFiles on multiple executors in Spark

Posted by GitBox <gi...@apache.org>.
e-gat closed issue #6421: Running rewriteDataFiles on multiple executors in Spark
URL: https://github.com/apache/iceberg/issues/6421




[GitHub] [iceberg] e-gat commented on issue #6421: Running rewriteDataFiles on multiple executors in Spark

Posted by GitBox <gi...@apache.org>.
e-gat commented on issue #6421:
URL: https://github.com/apache/iceberg/issues/6421#issuecomment-1353571205

   After investigation, we found that the latest Iceberg versions support running rewriteDataFiles across multiple executors in Spark.
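
For readers who land here with the same question, below is a minimal sketch of how the compaction might be triggered with the option mentioned in the question. It only builds the Spark SQL `CALL` statement for Iceberg's `rewrite_data_files` stored procedure and passes `max-concurrent-file-group-rewrites` through the `options` map. The catalog name `my_catalog`, the table `db.events`, the value `4`, and the helper `rewrite_data_files_call` are all hypothetical and not from this thread. It is a sketch under those assumptions, not a sizing recommendation.

```python
def rewrite_data_files_call(catalog: str, table: str, max_concurrent: int) -> str:
    """Build a Spark SQL CALL statement for Iceberg's rewrite_data_files procedure.

    catalog, table and max_concurrent are illustrative inputs (assumptions),
    not values taken from the issue.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"options => map('max-concurrent-file-group-rewrites', '{max_concurrent}'))"
    )


# Hypothetical catalog and table names, for illustration only.
statement = rewrite_data_files_call("my_catalog", "db.events", 4)
print(statement)

# With a SparkSession already configured for an Iceberg catalog, the statement
# would be submitted like any other SQL, e.g. spark.sql(statement).show()
```

The option caps how many file groups are rewritten at the same time. Each file-group rewrite runs as an ordinary Spark job, so its tasks are scheduled on whatever executors the application has available.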

