Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/09 17:49:39 UTC

[GitHub] [iceberg] pan3793 commented on issue #2221: Spark: Extend expire_snapshots procedure with an optional arg for snapshot ids

pan3793 commented on issue #2221:
URL: https://github.com/apache/iceberg/issues/2221#issuecomment-776120425


   > I don't know if it's a common use case, but our use case is:
   > 
   > 1. We have a CDC pipeline for an online RDBMS table. The writer consumes the CDC log and writes to Iceberg every 15min.
   > 2. The users can query/time travel hot data via 15-min snapshots.
   > 3. Once the data becomes cold (usually after 2 weeks in our environment), we want to downsample the snapshots to reduce the data size: keep only the `hourly` snapshots and remove those in between.
   > 4. (Haven't done it yet.) We may want to downsample to daily after a longer period. (Our previous batch ingestion pipeline was based on daily Sqoop jobs.)
   
   This coincides with my idea. I am also planning to manage snapshots in a similar way:
   1. keep hourly snapshots for the current day;
   2. keep daily snapshots for the current week;
   3. keep weekly snapshots for the current month;
   4. etc.
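
To make the tiered policy concrete, here is a minimal sketch of how the snapshot ids to expire could be selected. It is plain Python and does not call any Iceberg API. The function name `snapshots_to_expire`, the `(snapshot_id, committed_at)` input shape, and the window sizes (24 hours, 7 days, 30 days standing in for "current day/week/month") are all assumptions for illustration. Within each bucket the newest snapshot is kept, and snapshots older than the last tier are left untouched because the policy above does not say what happens to them.

```python
from datetime import datetime, timedelta, timezone

# Each tier is (maximum snapshot age, function mapping a timestamp to its bucket).
# The window sizes are assumptions, not values taken from the issue.
TIERS = [
    (timedelta(days=1), lambda ts: (ts.year, ts.month, ts.day, ts.hour)),  # hourly
    (timedelta(days=7), lambda ts: ts.date()),                             # daily
    (timedelta(days=30), lambda ts: tuple(ts.isocalendar())[:2]),          # weekly (ISO year, week)
]


def snapshots_to_expire(snapshots, now):
    """Return the sorted ids that the tiered policy would drop.

    snapshots: list of (snapshot_id, committed_at) with timezone-aware datetimes.
    now: timezone-aware datetime used as the reference point for snapshot age.
    """
    newest = {}        # (tier index, bucket) -> (committed_at, snapshot_id)
    untouched = set()  # older than every tier: policy unspecified, so keep
    for snap_id, ts in snapshots:
        age = now - ts
        for tier, (max_age, bucket) in enumerate(TIERS):
            if age <= max_age:
                key = (tier, bucket(ts))
                # Remember only the newest snapshot seen in this bucket.
                if key not in newest or ts > newest[key][0]:
                    newest[key] = (ts, snap_id)
                break
        else:
            untouched.add(snap_id)
    kept = {snap_id for _, snap_id in newest.values()} | untouched
    return sorted(snap_id for snap_id, _ in snapshots if snap_id not in kept)


if __name__ == "__main__":
    now = datetime(2021, 2, 9, 18, 0, tzinfo=timezone.utc)
    start = datetime(2021, 2, 9, 15, 0, tzinfo=timezone.utc)
    # Twelve 15-minute snapshots covering three hours.
    snaps = [(i, start + timedelta(minutes=15 * (i - 1))) for i in range(1, 13)]
    print(snapshots_to_expire(snaps, now))
```

The returned ids are the kind of input the optional snapshot-id argument proposed in this issue would take. Because the most recent snapshot is always the newest in its hourly bucket, it is never selected for expiry.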

