Posted to dev@iceberg.apache.org by Piotr Findeisen <pi...@starburstdata.com> on 2022/03/16 10:04:02 UTC

Iceberg NDV stats

Hi,

We at Starburst are looking into adding number of distinct values (NDV)
statistics to Iceberg tables, so that, e.g., the Trino cost-based query
optimizer can produce better plans when working with Iceberg tables.

The initial approach targets table-level statistics and may be extended in
the future.
I would appreciate feedback on the design doc:
https://docs.google.com/document/d/1we0BuQbbdqiJS2eUFC_-6TPSuO57GXivzKmcTzApivY


This stats topic is related to Secondary Indexes, but the two need slightly
different terminology and mechanics. For example, indexes need to be exact
and properly invalidated, whereas statistics may be outdated and still be
useful. So these two things need to be coherent but separate.

Best
PF