Posted to dev@iceberg.apache.org by Ryan Blue <rb...@netflix.com.INVALID> on 2019/05/06 19:14:04 UTC

Re: Table metadata versions expiration

Arina,

So far, we’ve kept these around to help troubleshoot format problems. It
has been a fairly cheap way to see exactly what happened to the table. But
we’re also getting to the point where we no longer need to refer back to
them, and we should think about adding a way to remove them. Technically,
you don’t need to keep them around once you’ve committed the new version,
but an easy way to roll back is to point the database back at an older
metadata file, so it is nice to keep a few of them.

I think we can probably build a way to expire old metadata versions by
looking for a naming pattern, like v(num)-(uuid).metadata.json[.gz]. Would
you like to add an issue and maybe a PR for this?
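A minimal sketch of the naming-pattern idea, assuming metadata files are named exactly `v(num)-(uuid).metadata.json` with an optional `.gz` suffix, and that the caller has already listed the file names in the table's metadata directory. The class and method names (`MetadataVersionExpiry`, `versionOf`, `filesToExpire`) are hypothetical and not part of Iceberg. The `retain` parameter keeps the newest few versions so a rollback by moving the pointer still has somewhere to go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Iceberg: finds metadata files that could be expired,
// assuming names follow the v(num)-(uuid).metadata.json[.gz] pattern.
final class MetadataVersionExpiry {
  // Group 1 is the version number; the optional ".gz" suffix covers compressed metadata.
  private static final Pattern METADATA_FILE =
      Pattern.compile("v(\\d+)-([0-9a-fA-F-]+)\\.metadata\\.json(\\.gz)?");

  /** Returns the version encoded in a metadata file name, or -1 if the name does not match. */
  static long versionOf(String fileName) {
    Matcher m = METADATA_FILE.matcher(fileName);
    return m.matches() ? Long.parseLong(m.group(1)) : -1L;
  }

  /**
   * Returns the metadata files older than the newest {@code retain} versions. Names that do not
   * match the pattern are never returned, so unrelated files in the directory are left alone.
   */
  static List<String> filesToExpire(List<String> fileNames, int retain) {
    if (retain < 1) {
      // Always keep at least the newest version.
      throw new IllegalArgumentException("retain must be at least 1");
    }
    List<String> versioned = new ArrayList<>();
    for (String name : fileNames) {
      if (versionOf(name) >= 0) {
        versioned.add(name);
      }
    }
    // Sort numerically, newest first, so the first `retain` entries are the ones kept.
    versioned.sort((a, b) -> Long.compare(versionOf(b), versionOf(a)));
    if (versioned.size() <= retain) {
      return new ArrayList<>();
    }
    return new ArrayList<>(versioned.subList(retain, versioned.size()));
  }

  public static void main(String[] args) {
    List<String> names = List.of(
        "v1-0a1b.metadata.json",
        "v2-1c2d.metadata.json.gz",
        "v3-3e4f.metadata.json",
        "snap-123.avro");
    // Keep the two newest versions; prints [v1-0a1b.metadata.json]
    System.out.println(filesToExpire(names, 2));
  }
}
```

Sorting by the parsed number rather than by name matters once versions pass 9, since `v10` sorts before `v9` lexically. A real implementation would also have to make sure the file the catalog currently points to is never selected, which this sketch only approximates by assuming the newest version is the current one.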

rb

On Sat, May 4, 2019 at 7:43 AM Arina Yelchiyeva <ar...@gmail.com>
wrote:

> Hi all,
>
> Iceberg tables have a snapshot expiration operation, which deletes snapshots
> that are no longer needed along with unneeded data files, manifests, and
> manifest lists:
>
>         // clean up the expired snapshots:
>         // 1. Get a list of the snapshots that were removed
>         // 2. Delete any data files that were deleted by those snapshots and are not in the table
>         // 3. Delete any manifests that are no longer used by current snapshots
>         // 4. Delete the manifest lists
>
> But we also have table metadata, which is stored as JSON. A new metadata
> version is created for each metadata change.
> I assumed that the snapshot expiration operation would also delete
> unneeded metadata files, but it does not.
>
> My concern is that keeping a JSON file for every metadata change may
> consume a lot of space over time (setting `iceberg.compress.metadata` to
> true helps, but not for long).
> Is there an option to expire table metadata versions as well?
>
> Kind regards,
> Arina
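One way to picture the four cleanup steps quoted above is as a reachability check: collect everything the expired snapshots reference, then subtract everything the retained snapshots still use. This is a simplified model, not Iceberg's actual implementation; the `Snapshot` class and the file names below are hypothetical, and a real table reads these references from manifest lists and manifests rather than holding them in memory. Note that the `vN` metadata JSON files never appear in this model, which is the gap raised in the message.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified reachability model of snapshot expiration; illustrative only, not Iceberg's code.
final class ExpireSnapshotsSketch {
  /** Hypothetical snapshot: its manifest list plus the manifests and data files it references. */
  static final class Snapshot {
    final String manifestList;
    final Set<String> manifests;
    final Set<String> dataFiles;

    Snapshot(String manifestList, Set<String> manifests, Set<String> dataFiles) {
      this.manifestList = manifestList;
      this.manifests = manifests;
      this.dataFiles = dataFiles;
    }
  }

  /** Returns every file referenced by an expired snapshot that no retained snapshot still uses. */
  static Set<String> filesToDelete(List<Snapshot> expired, List<Snapshot> retained) {
    // Everything still reachable from the snapshots that remain in the table.
    Set<String> live = new HashSet<>();
    for (Snapshot s : retained) {
      live.add(s.manifestList);
      live.addAll(s.manifests);
      live.addAll(s.dataFiles);
    }
    // Step 1 is the `expired` list itself; the loop gathers candidates for steps 2-4.
    Set<String> candidates = new HashSet<>();
    for (Snapshot s : expired) {
      candidates.addAll(s.dataFiles);   // step 2: data files
      candidates.addAll(s.manifests);   // step 3: manifests
      candidates.add(s.manifestList);   // step 4: manifest lists
    }
    // Only files that are no longer reachable may be deleted.
    candidates.removeAll(live);
    return candidates;
  }

  public static void main(String[] args) {
    Snapshot s1 = new Snapshot("snap-1.avro", Set.of("m1.avro"), Set.of("a.parquet", "b.parquet"));
    Snapshot s2 = new Snapshot("snap-2.avro", Set.of("m2.avro"), Set.of("b.parquet", "c.parquet"));
    // Expire s1 and keep s2; prints [a.parquet, m1.avro, snap-1.avro]
    System.out.println(new TreeSet<>(filesToDelete(List.of(s1), List.of(s2))));
  }
}
```

`b.parquet` survives because the retained snapshot still references it, which is the "and are not in the table" condition from step 2.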



-- 
Ryan Blue
Software Engineer
Netflix

Re: Table metadata versions expiration

Posted by Arina Yelchiyeva <ar...@gmail.com>.
Ryan, thanks for the reply.
I have created an issue (https://github.com/apache/incubator-iceberg/issues/181) and will try to come up with a PR.

Kind regards,
Arina
