Posted to dev@iceberg.apache.org by Sam Redai <sa...@tabular.io> on 2022/05/06 18:00:27 UTC

Meeting Minutes from 05/04 Iceberg Sync

Hey Iceberg Community,

Here are the minutes and recording from our Iceberg Sync that took place on
*May 4th, 9am-10am PT*. Also, thank you, Russell, for sharing the ApacheCon
Call For Papers <https://www.apachecon.com/acna2022/cfp.html>, which is
accepting proposals until May 23rd for anyone interested in submitting a talk!

Always remember, anyone can join the discussion, so feel free to share the
Iceberg-Sync <https://groups.google.com/g/iceberg-sync> Google group with
anyone who is seeking an invite. All notes and agendas are posted in the live
doc
<https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit?usp=drive_web>,
which is also attached to the meeting invitation. It's a good place to add
items as you see fit so we can discuss them in the next community sync.

4 May 2022

Meeting Recording ⭕
<https://drive.google.com/file/d/1BfU7o9m87iWivYzmeukDtBi3v8HFdX7O/view>

Top of the Meeting Highlights

   - Added a Z-order strategy for rewriting data files (Thanks, Russell!)
   - Added support for Flink 1.15 and removed support for 1.12 (Thanks, Kyle!)
   - Added default value handling to the spec (Thanks, Raymond!)
   - Added a RangeReadable interface for FileIO (Thanks, Dan!)
   - Added an expression framework in Python (Thanks, Sam and Nick!)

Releases

   - 0.13.2 patch release
      - This release seems overdue and should probably go out despite the
        minor issues that have been blocking it, namely failing Flink tests
      - Eduard has volunteered to be the release manager for this release
      - #4687 <https://github.com/apache/iceberg/pull/4687> is one pending PR
        that should go out in this release
   - 0.14.0 status update
      - Snapshot expiration in the branching/tagging context: PR #4578
        <https://github.com/apache/iceberg/pull/4578>
      - PR for LICENSE updates

Agenda

   - Snapshot Expiration
      - Needs to go through the reachability code path, which is only
        implemented in Spark today and is somewhat complex
      - The current proposal is to assume a linear history for tables and
        fail if unreferenced snapshots are found. (This is stricter than the
        current behavior and applies only when the flag for cleaning up data
        files incrementally is on.)
      - Should we eventually have an in-memory reference comparison set
        that enables a reachability analysis?
         - This would not work for very large tables whose metadata won’t
           all fit into memory.
         - It would work for most cases, and it’s safe to assume that very
           large tables have engines such as Spark or Trino available to
           perform snapshot expiration.
   - Remaining Branching and Tagging Work
      - Anyone interested in contributing here should reach out to Amogh or
        Ryan. Examples of remaining work include:
         - Referencing branches/tags in engines
         - Committing directly to a branch/tag
   - Documentation Content Redesign (link
     <https://docs.google.com/document/d/1Y_PRv6p5oJaxg_68AUia_JHw8P4-AZIu3hP5IH2Cpsw/edit>)
      - New top-level sections (part of the common site as opposed to the
        version-based site)
         - Quick-start: fully runnable, Docker-based quickstarts for every
           engine with an Iceberg integration
         - Concepts: “no-code” overviews of core Iceberg concepts such as
           catalogs, tables, FileIO, etc.
      - Provide a master “Configuration” page in the versioned docs site
         - This is a one-stop shop for all configuration tables, which
           include parameters, descriptions, and value types. Currently,
           configuration tables are spread across multiple sections.
      - Format of “Docs” section pages
         - With configurations, quick-starts, and concepts moved to
           dedicated sections, the version-based “Docs” section pages will
           follow a typical format of:
            - Feature Name
            - Feature Description
            - Code Snippet
   - Reading change streams
      - General draft: PR #4539 <https://github.com/apache/iceberg/pull/4539>
      - PR #4683 adds the IS_DELETED metadata column
        <https://github.com/apache/iceberg/pull/4683>
      - An MVP is expected within the next 1-2 months and will not include
        changes to the Java API
   - Idempotent Sort (design doc
     <https://docs.google.com/document/d/1rZUHljNsLn8JqsO5lYElst3F800T8OBnQ5fasUB-fp8/edit>)
   - ApacheCon Call For Papers closes May 23rd!
     <https://www.apachecon.com/acna2022/cfp.html>
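
To make the Snapshot Expiration discussion above a bit more concrete, here is
a minimal Python sketch of one reading of the two options, assuming a toy
model in which each snapshot id maps to its parent id. The names
`ancestors`, `check_linear_history`, and `reachable_from_refs` are
hypothetical and are not part of Iceberg's API. The linear-history check
fails as soon as any snapshot lies outside the lineage of the current head;
the in-memory alternative instead unions the lineages of every branch and
tag, which is why memory becomes a concern for tables with very large
metadata.

```python
from typing import Dict, Optional, Set

# Hypothetical, simplified model of table metadata (NOT Iceberg's API):
# each snapshot id maps to its parent snapshot id, or None for the first one.
Snapshots = Dict[int, Optional[int]]


def ancestors(snapshots: Snapshots, head: int) -> Set[int]:
    """Return the ids reachable from `head` by following parent pointers."""
    seen: Set[int] = set()
    current: Optional[int] = head
    # Stop at the root (None) or if a malformed history loops back on itself.
    while current is not None and current not in seen:
        seen.add(current)
        current = snapshots.get(current)
    return seen


def check_linear_history(snapshots: Snapshots, head: int) -> Set[int]:
    """Assume a single linear lineage; fail if any snapshot is off of it."""
    lineage = ancestors(snapshots, head)
    unreferenced = set(snapshots) - lineage
    if unreferenced:
        raise ValueError(
            f"non-linear history, unreferenced snapshots: {sorted(unreferenced)}"
        )
    return lineage


def reachable_from_refs(snapshots: Snapshots, refs: Dict[str, int]) -> Set[int]:
    """In-memory reachability: union of the lineages of every branch and tag."""
    reachable: Set[int] = set()
    for head in refs.values():
        reachable |= ancestors(snapshots, head)
    return reachable
```

For example, with snapshots `{1: None, 2: 1, 3: 2, 4: 2}` and the head at 3,
`check_linear_history` raises because snapshot 4 sits on a separate line of
history, while `reachable_from_refs` with refs for `main` (3) and a
hypothetical branch `audit` (4) keeps all four snapshots.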


Thanks, everyone!
-- 

Sam Redai <sa...@tabular.io>

Developer Advocate  |  Tabular <https://tabular.io/>