
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/13 18:54:22 UTC

[GitHub] [iceberg] Samrose-Ahmed commented on issue #5997: Iceberg table maintenance/compaction within AWS

Samrose-Ahmed commented on issue #5997:
URL: https://github.com/apache/iceberg/issues/5997#issuecomment-1312796748

   I would recommend running a Spark job. An AWS Glue job is the easiest way to get started, but since you're running this only once, it will likely be cheaper to run it on EMR (Serverless or provisioned). Also, Spark/EMR doesn't run on a single instance; it parallelizes the work across nodes.
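   As a rough sketch of such a one-off Spark job, compaction can be triggered through Iceberg's `rewrite_data_files` stored procedure. This assumes the Glue or EMR job already has an Iceberg-enabled SparkSession (Iceberg runtime plus SQL extensions). The catalog name `glue_catalog`, the table `db.events` and the 512 MB target file size are placeholders, not values from the issue.

```python
def compaction_sql(catalog: str, table: str,
                   target_file_size_bytes: int = 512 * 1024 * 1024) -> str:
    """Build a CALL statement for Iceberg's rewrite_data_files procedure.

    Uses the bin-pack strategy, which merges small data files into files
    close to the target size without re-sorting the data.
    """
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"strategy => 'binpack', "
        f"options => map('target-file-size-bytes', '{target_file_size_bytes}'))"
    )


def run_compaction(spark, catalog: str, table: str):
    """Run the compaction on an existing SparkSession and return the result rows.

    `spark` is assumed to be a pyspark.sql.SparkSession configured with the
    Iceberg SQL extensions and a catalog named `catalog`.
    """
    return spark.sql(compaction_sql(catalog, table)).collect()
```

   On a configured session this would be invoked as `run_compaction(spark, "glue_catalog", "db.events")`; the procedure reports how many data files were rewritten and added.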
   
   In the future, since you're doing streaming appends/inserts, I would recommend running regular table maintenance so you don't end up in this situation. See this blog post, [Automated Iceberg table maintenance on AWS](https://www.matano.dev/blog/2022/11/04/automated-iceberg-table-maintenance), for how we do it in [Matano](https://github.com/matanolabs/matano). It's fairly simple: you need to run compaction regularly.
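   A minimal sketch of one recurring maintenance pass follows. It assumes the pass runs on a schedule (for example, a scheduled Glue job) with an Iceberg-enabled SparkSession. Pairing compaction with `expire_snapshots` is an assumption about a typical routine, not necessarily what Matano does. Compaction alone leaves the replaced small files referenced by older snapshots until those snapshots are expired. The catalog and table names and the 7-day retention are placeholders.

```python
from datetime import datetime, timedelta
from typing import List


def maintenance_sql(catalog: str, table: str, now: datetime,
                    retain_days: int = 7) -> List[str]:
    """Statements for one periodic maintenance pass over an Iceberg table.

    1. Compact the small files produced by streaming appends (bin-pack).
    2. Expire snapshots older than `retain_days`, always keeping the latest
       one, so files replaced by compaction are no longer referenced.
    """
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', strategy => 'binpack')",
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}', retain_last => 1)",
    ]


def run_maintenance(spark, catalog: str, table: str, now: datetime) -> None:
    """Run each statement in order on an existing SparkSession (hypothetical helper)."""
    for stmt in maintenance_sql(catalog, table, now):
        spark.sql(stmt).collect()
```

   Running compaction before snapshot expiry means each pass can clean up the files that the previous pass replaced once they fall outside the retention window.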


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

