Posted to dev@iceberg.apache.org by OpenInx <op...@gmail.com> on 2021/04/02 09:49:02 UTC

Re: When is the next release of Iceberg ?

Hi Himanshu

If you want to try Flink + Iceberg for syncing the MySQL binlog to an Iceberg
table, you might be interested in these PRs:

1. https://github.com/apache/iceberg/pull/2410
2. https://github.com/apache/iceberg/pull/2303

On Wed, Mar 24, 2021 at 10:34 AM OpenInx <op...@gmail.com> wrote:

> Hi Himanshu
>
> Thanks for the email. Currently, Flink + Iceberg supports writing CDC
> events into an Apache Iceberg table through the Flink DataStream API, and
> Spark/Presto/Hive can read those events in batch jobs.
>
> But there are still some issues that we have not finished yet:
>
> 1.  Expose Iceberg format v2 to end users.  The row-level delete feature is
> actually built on Iceberg format v2, and there are still some blockers
> that we need to fix (please see the document
> https://docs.google.com/document/d/1FyLJyvzcZbfbjwDMEZd6Dj-LYCfrzK1zC-Bkb3OiICc/edit),
> so the Iceberg team will need some resources to resolve them.
> 2.  As we know, CDC events depend on Iceberg primary key
> identification (then we could define a mysql_cdc SQL table using a primary
> key clause). I saw that Jack Ye has published a PR for this,
> https://github.com/apache/iceberg/pull/2354, and I will review it today.
> 3.  The CDC writers will inevitably produce many small files as the
> periodic checkpoints go on, so for a real production environment we must
> provide the ability to rewrite small files into larger files (a compaction
> action).  There are a few PRs that need to be reviewed:
>        a.  https://github.com/apache/iceberg/pull/2303/files
>        b.  https://github.com/apache/iceberg/pull/2294
>        c.  https://github.com/apache/iceberg/pull/2216
>
> I think it's better to resolve all those issues before we put
> production data into Iceberg (syncing the MySQL binlog via Debezium).  I saw
> the last sync notes saying the next release, 0.12.0, would ideally be
> released at the end of this month (
> https://lists.apache.org/x/thread.html/rdb7d1ab221295adec33cf93dcbcac2b9b7b80708b2efd903b7105511@%3Cdev.iceberg.apache.org%3E).
> I think that deadline is too tight.  In my mind, if release
> 0.12.0 won't expose format v2 to end users, then what are the core
> features that we want to release?  If the features that we plan to release
> are not major ones, then how about releasing 0.11.2 instead?
>
> According to my understanding of the needs of community users, the vast
> majority of iceberg users have high expectations for format v2. I think we
> may need to raise the v2 exposure to a higher priority so that our users
> can do the whole PoC tests earlier.
>
>
>
> On Wed, Mar 24, 2021 at 3:49 AM Himanshu Rathore
> <hi...@zomato.com.invalid> wrote:
>
>> We are planning to use Flink + Iceberg for syncing MySQL binlogs via
>> Debezium, and it seems some things depend on the next release.
>>
>