Posted to dev@iceberg.apache.org by Ryan Blue <rb...@netflix.com.INVALID> on 2018/11/20 17:47:12 UTC

Re: Table commit latency seems to be proportional to the number of maintained snapshots

+dev

Hi Filip,

You’re right that commit times get worse as the number of snapshots
increases. We’ve been hitting this with tables that we are using to stress
the current implementation by doing things like committing every couple of
minutes from multiple AWS regions. We see the same behavior and have a few
solutions:

   1. We have a service that expires snapshots in our tables after 3 days.
   It calls the same API that you’re using. We’ve also considered adding
   expiration to regular table commits, depending on table properties. That
   way tables would keep themselves cleaned up.
   2. We’ve added the ability to compress the metadata JSON files by
   setting iceberg.compress.metadata=true in the Hadoop Configuration passed
   to the Tables API.
   3. We are working on separating the manifest list out of the metadata
   JSON file and adding extra information for each manifest to help speed up
   operations.

That last one is the most important thing we want to change in the spec
right now because we think it will make a lot of operations faster.
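
For reference, here is a minimal sketch of how the cutoff for the 3-day
expiration in item 1 could be computed. The class and method names
(SnapshotRetention, expirationCutoffMillis) are hypothetical and only for
illustration; the Iceberg calls are left in a comment and are the same ones
used in the quoted message below, so this is not the actual code of the
service.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

public class SnapshotRetention {

    /**
     * Returns the cutoff as epoch milliseconds: "now" on the given clock
     * minus the retention period. Snapshots older than this timestamp are
     * candidates for expiration.
     */
    public static long expirationCutoffMillis(Clock clock, Duration retention) {
        return Instant.now(clock).minus(retention).toEpochMilli();
    }

    public static void main(String[] args) {
        // Assumption: a 3-day retention period, as in item 1 above.
        Duration retention = Duration.ofDays(3);
        long cutoff = expirationCutoffMillis(Clock.systemUTC(), retention);

        // With an Iceberg Table instance available, the cutoff would be
        // passed to the API shown in the quoted message:
        //   ExpireSnapshots expireSnapshots = table.expireSnapshots();
        //   expireSnapshots.expireOlderThan(cutoff);
        //   expireSnapshots.commit();
        System.out.println("Expire snapshots older than " + Instant.ofEpochMilli(cutoff));
    }
}
```

Passing a Clock in makes the cutoff easy to check against a fixed point in
time; a scheduled job would just use Clock.systemUTC().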

rb

On Tue, Nov 20, 2018 at 7:00 AM Filip Bocse <fi...@gmail.com> wrote:

> Hi folks,
>
> It seems that table commit latency increases as the number of snapshots
> maintained in the metadata file goes up. How do we go about mitigating
> this?
> Is snapshot expiration one way to recover from the increasing latency of
> table operations?
>
>     ExpireSnapshots expireSnapshots = table.expireSnapshots();
>     // Cutoff: 10 days ago (an Instant is always UTC-based)
>     Instant tenDaysAgo = Instant.now().minus(10, ChronoUnit.DAYS);
>     // Expire every snapshot older than the cutoff
>     expireSnapshots.expireOlderThan(tenDaysAgo.toEpochMilli());
>     expireSnapshots.commit();
>
> Are there other ways of containing table operation latency?
>
> Anybody else here interested or involved with doing capacity testing or
> performance testing?
> I'm looking for more details on current challenges and pain points.
>


-- 
Ryan Blue
Software Engineer
Netflix