Posted to dev@hudi.apache.org by Danny Chan <da...@apache.org> on 2021/11/19 08:11:59 UTC

[DISCUSS] Hudi 0.10.0 Release

Hi Community,

As we draw close to the Hudi 0.10.0 release, I am happy to share a
summary of the key features/improvements going into the release, along
with the current blockers, for everyone's visibility.

*Highlights*

   - [HUDI-1290] Implement Debezium avro source for Delta Streamer
   - [HUDI-1491] Support partition pruning for MOR snapshot query
   - [HUDI-1763] DefaultHoodieRecordPayload does not honor ordering value
   when records within multiple log files are merged
   - [HUDI-1827] Add ORC support in Bootstrap Op
   - [HUDI-1869] Upgrading Spark3 To 3.1
   - [HUDI-2101] support z-order for hudi
   - [HUDI-2276] Enable Metadata Table by default for both writers and
   readers
   - [HUDI-2581] Analyze metadata size estimate in hudi with Hfile for col
   stats partition
   - [HUDI-2634] Improve bootstrap performance for very large tables
   - [HUDI-2086] redo the logic of mor_incremental_view for hive
   - [HUDI-2191] Bump flink version to 1.13.1
   - [HUDI-2285] Metadata Table Synchronous Design
   - [HUDI-2316] Support Flink batch upsert
   - [HUDI-2371] Improve flink streaming reader
   - [HUDI-2394] [Kafka Connect Milestone 1] Implement kafka connect for
   immutable data
   - [HUDI-2449] Incremental read for Flink
   - [HUDI-2562] Embedded timeline server on JobManager

*Current Blockers*

   - [HUDI-1856] Upstream changes made in PrestoDB to eliminate file
   listing to Trino (Owner: Sagar Sumit)
   - [HUDI-1912] Presto defaults to GenericHiveRecordCursor for all Hudi
   tables (Owner: Sagar Sumit)
   - [HUDI-1932] Hive Sync should not always update last_commit_time_sync
   (Owner: Raymond Xu)
   - [HUDI-1937] When clustering fail, generating unfinished replacecommit
   timeline. (Owner: Sagar Sumit)
   - [HUDI-2077] Flaky test: TestHoodieDeltaStreamer (Owner: Sagar Sumit)
   - [HUDI-2314] Add DynamoDb based lock provider (Owner: Wenning Ding)
   - [HUDI-2325] Implement and test Hive Sync support for Kafka Connect
   (Owner: Rajesh Mahindra)
   - [HUDI-2332] Implement scheduling of compaction/ clustering for Kafka
   Connect (Owner: Ethan Guo)
   - [HUDI-2362] Hudi external configuration file support (Owner: Wenning
   Ding)
   - [HUDI-2409] Using HBase shaded jars in Hudi presto bundle (Owner:
   Sagar Sumit)
   - [HUDI-2443] KVComparator in HFile for metadata table is tied to HBase
   version and shading (Owner: Sagar Sumit)
   - [HUDI-2472] Tests failure follow up when metadata is enabled by
   default (Owner: Manoj Govindassamy)
   - [HUDI-2475] Rolling Upgrade downgrade story for 0.10 & enabling
   metadata (Owner: Manoj Govindassamy)
   - [HUDI-2478] Handle failure mid-way during init buckets (Owner: Vinoth
   Chandar)
   - [HUDI-2480] FileSlice after pending compaction-requested instant-time
   is ignored by MOR snapshot reader (Owner: Danny Chan)
   - [HUDI-2488] Support bootstrapping a single or more partitions in
   metadata table while regular writers and table services are in progress
   (Owner: Vinoth Chandar)
   - [HUDI-2527] Flaky test:
   TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
   (Owner: sivabalan narayanan)
   - [HUDI-2559] Ensure unique timestamps are generated for commit times
   with concurrent writers (Owner: sivabalan narayanan)
   - [HUDI-2593] Virtual keys support for metadata table (Owner: Manoj
   Govindassamy)
   - [HUDI-2599] [Performance] Lower parallelism with snapshot query on COW
   tables in Presto (Owner: Sagar Sumit)
   - [HUDI-2628] Fix Chinese Docs (Owner: Kyle Weller)
   - [HUDI-2636] Make release notes discoverable (Owner: Kyle Weller)
   - [HUDI-2637] Triage all bugs around Multi-writer and certify the tested
   flows (Owner: sivabalan narayanan)
   - [HUDI-2641] One inflight commit rolling back other concurrent inflight
   commits causing them to fail (Owner: Udit Mehrotra)
   - [HUDI-2649] Kick off all the Hive query issues for 0.10.0 (Owner:
   Sagar Sumit)
   - [HUDI-2666] async compaction failing with timeline mismatches between
   server and client when metadata is enabled (Owner: Manoj Govindassamy)
   - [HUDI-2667] Avoid fs.exists() and fs.mkdirs() call to partitions in
   AbstractTablefileSystemView (Owner: Sagar Sumit)
   - [HUDI-2671] Fix record offset handling in Kafka connect transaction
   participant (Owner: Rajesh Mahindra)
   - [HUDI-2672] Avoid empty commits and rollbacks when there is no event
   from the topic (Owner: Rajesh Mahindra)
   - [HUDI-2716] Fix InLineFS path conversions for S3FS paths (Owner: Manoj
   Govindassamy)
   - [HUDI-2725] Add precommit validators doc (Owner: Kyle Weller)
   - [HUDI-2731] Clustering should work regardless of whether there are
   base files (Owner: Sagar Sumit)
   - [HUDI-2734] Disable metadata by default for flink and java (Owner:
   sivabalan narayanan)
   - [HUDI-2735] Fix archival of commits in Java client for Kafka Connect
   (Owner: Ethan Guo)
   - [HUDI-2737] Use earliest instant by default for compaction and
   clustering job (Owner: Ethan Guo)
   - [HUDI-2741] Validate metadata config for all readers (Owner: Sagar
   Sumit)
   - [HUDI-2745] Record count does not match input after compaction is
   scheduled when running Hudi Kafka Connect sink (Owner: Ethan Guo)
   - [HUDI-2762] Ensure hive can query insert only logs in MOR (Owner: Sagar
   Sumit)
   - [HUDI-2763] Avoid persisting redundant key field in the Metadata table
   record payload (Owner: Manoj Govindassamy)
   - [HUDI-2764] Address test failures after enabling virtual keys support
   for the metadata table (Owner: Manoj Govindassamy)
   - [HUDI-2766] Enable marker based rollback by default (Owner: sivabalan
   narayanan)
   - [HUDI-2767] Enable timeline server based marker type as default
   (Owner: sivabalan narayanan)
   - [HUDI-2770] Update docs for HoodieCompactor (compaction) and
   HoodieClusteringJob (clustering) (Owner: Kyle Weller)


Please respond to the thread if you think that I have missed any of the
highlights or blockers for the Hudi 0.10.0 release. For the owners of
these release blockers, can you please provide a specific timeline you are
willing to commit to for finishing these so we can cut an RC?

Thanks,
Danny

Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Danny Chan <da...@apache.org>.
Hi Community,

Glad to see that all the blockers are resolved and we can cut an RC now!

If you have any other blockers that you would like to
surface for Hudi 0.10.0, feel free to reach out.

Thanks,
Danny

Manoj Govindassamy <ma...@gmail.com> wrote on Sat, Nov 27, 2021 at 3:44 PM:

> Hi Danny,
>
> All the planned tickets have landed in master and we are good for cutting
> 0.10 RC. Please let us know if you see any CI issues with the latest master
> and we can jump in to do the needful. Thanks for your patience.
>
> thanks,
> Manoj
>
>
>
>
> On Fri, Nov 26, 2021 at 8:07 PM Manoj Govindassamy <
> manoj.govindassamy@gmail.com> wrote:
>
> > Hi Danny,
> >
> > We have one last PR https://github.com/apache/hudi/pull/4114 to land to
> > master. We are noticing one test flakiness with this last pending PR. The
> > same test is consistently passing in the local setup though. We are waiting
> > for the CI to finish before the merge to master. After this PR we are good
> > for cutting the 0.10 RC. Will keep you posted on the status.
> >
> > thanks,
> > Manoj
> >
> >
> >
> >
> > On Sat, Nov 20, 2021 at 2:10 PM Raymond Xu <xu...@gmail.com>
> > wrote:
> >
> >> Hi Danny, I'm good with the timeline.
> >>
> >> Cheers,
> >> Raymond
> >>
> >> On Fri, Nov 19, 2021 at 7:34 PM sagar sumit <sa...@gmail.com>
> >> wrote:
> >>
> >> > Hi Danny,
> >> >
> >> > I've added one more blocker: HUDI-2742
> >> > <https://issues.apache.org/jira/browse/HUDI-2742>
> >> > I am also good with the timelines.
> >> >
> >> > Regards,
> >> > Sagar
> >> >
> >> > On Sat, Nov 20, 2021 at 8:14 AM Sivabalan <n....@gmail.com> wrote:
> >> >
> >> > > Hi Danny,
> >> > >      I am good with the timelines. All my jiras should be completed by
> >> > > then.
> >> > >
> >> > >
> >> > > On Fri, Nov 19, 2021 at 8:41 PM Y Ethan Guo <ethan.guoyihua@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi Danny,
> >> > > >
> >> > > > Thanks for summarizing the current progress towards the 0.10.0 release.
> >> > > > I'm good with Nov 26th cutoff.
> >> > > >
> >> > > > Regarding my blockers:
> >> > > > - [HUDI-2332] Implement scheduling of compaction/ clustering for Kafka
> >> > > >    Connect (Owner: Ethan Guo)
> >> > > > PR is up.  I'm addressing comments.
> >> > > >
> >> > > > - [HUDI-2737] Use earliest instant by default for compaction and
> >> > > >    clustering job (Owner: Ethan Guo)
> >> > > > PR is up and approved.  It's near-landing after fixing CI failures.
> >> > > >
> >> > > > - [HUDI-2745] Record count does not match input after compaction is
> >> > > >    scheduled when running Hudi Kafka Connect sink (Owner: Ethan Guo)
> >> > > > HUDI-2745 is going to be blocked on HUDI-2480, which is going to resolve
> >> > > > this issue once done.
> >> > > >
> >> > > > - [HUDI-2735] Fix archival of commits in Java client for Kafka Connect
> >> > > >    (Owner: Ethan Guo)
> >> > > > This is pending and requires investigation into the archival logic which is
> >> > > > not Kafka-connect specific.
> >> > > >
> >> > > > Best,
> >> > > > - Ethan
> >> > > >
> >> > > >
> >> > > > On Fri, Nov 19, 2021 at 4:41 PM Rajesh Mahindra <rmahindra@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi Danny,
> >> > > > >
> >> > > > > I have the following blockers that have a PR up. I am working on a PR for
> >> > > > > the Debezium Source. I am fine with Nov 26th as cut off.
> >> > > > >
> >> > > > >    - [HUDI-2325] Implement and test Hive Sync support for Kafka Connect
> >> > > > >    (Owner: Rajesh Mahindra)
> >> > > > >    - [HUDI-2671] Fix record offset handling in Kafka connect transaction
> >> > > > >    participant (Owner: Rajesh Mahindra)
> >> > > > >    - [HUDI-2672] Avoid empty commits and rollbacks when there is no event
> >> > > > >    from the topic (Owner: Rajesh Mahindra)
> >> > > > >
> >> > > > > ** Pending
> >> > > > >    - [HUDI-1290] Implement Debezium avro source for Delta Streamer
> >> > > > >
> >> > > > > Thanks
> >> > > > > Rajesh
> >> > > > >
> >> > > > >
> >> > > > > On Fri, Nov 19, 2021 at 4:01 PM Udit Mehrotra <uditme@apache.org> wrote:
> >> > > > >
> >> > > > > > Hi Danny,
> >> > > > > >
> >> > > > > > I have a blocker as well
> >> > > > > > https://issues.apache.org/jira/browse/HUDI-2802. Nov 26th cut off date
> >> > > > > > works fine for me.
> >> > > > > >
> >> > > > > > Also, just an update on the above list: HUDI-2641, HUDI-2314,
> >> > > > > > HUDI-2362 are unblocked/merged. HUDI-2314 and HUDI-2362 can be marked
> >> > > > > > in the highlights section. We will work on getting some doc updates
> >> > > > > > for the same by next week.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Udit
> >> > > > > >
> >> > > > > > On Fri, Nov 19, 2021 at 3:49 PM Vinoth Chandar <vinoth@apache.org> wrote:
> >> > > > > > >
> >> > > > > > > Hi Danny,
> >> > > > > > >
> >> > > > > > > I have one blocker. I plan to complete it by end of next week. I am good
> >> > > > > > > with the prior Nov 26 cutoff.
> >> > > > > > > Does that work for everyone?
> >> > > > > > >
> >> > > > > > > Thanks
> >> > > > > > > Vinoth
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > Take Care,
> >> > > > > Rajesh Mahindra
> >> > > > >
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > Regards,
> >> > > -Sivabalan
> >> > >
> >> >
> >>
> >
>

Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Manoj Govindassamy <ma...@gmail.com>.
Hi Danny,

All the planned tickets have landed in master and we are good for cutting
0.10 RC. Please let us know if you see any CI issues with the latest master
and we can jump in to do the needful. Thanks for your patience.

thanks,
Manoj




>> > > > > > > >    - [HUDI-2735] Fix archival of commits in Java client for
>> > Kafka
>> > > > > > Connect
>> > > > > > > >    (Owner: Ethan Guo)
>> > > > > > > >    - [HUDI-2737] Use earliest instant by default for
>> compaction
>> > > and
>> > > > > > > >    clustering job (Owner: Ethan Guo)
>> > > > > > > >    - [HUDI-2741] Validate metadata config for all readers
>> > (Owner:
>> > > > > Sagar
>> > > > > > > >    Sumit)
>> > > > > > > >    - [HUDI-2745] Record count does not match input after
>> > > compaction
>> > > > > is
>> > > > > > > >    scheduled when running Hudi Kafka Connect sink (Owner:
>> Ethan
>> > > > Guo)
>> > > > > > > >    - [HUDI-2762] Ensure hive can query insert only logs in
>> MOR
>> > > > > (Owner:
>> > > > > > agar
>> > > > > > > >    Sumit)
>> > > > > > > >    - [HUDI-2763] Avoid persisting redundant key field in the
>> > > > Metadata
>> > > > > > table
>> > > > > > > >    record payload (Owner: Manoj Govindassamy)
>> > > > > > > >    - [HUDI-2764] Address test failures after enabling
>> virtual
>> > > keys
>> > > > > > support
>> > > > > > > >    for the metadata table (Owner: Manoj Govindassamy)
>> > > > > > > >    - [HUDI-2766] Enable marker based rollback by default
>> > (Owner:
>> > > > > > sivabalan
>> > > > > > > >    narayanan)
>> > > > > > > >    - [HUDI-2767] Enable timeline server based marker type as
>> > > > default
>> > > > > > > >    (Owner: sivabalan narayanan)
>> > > > > > > >    - [HUDI-2770] Update docs for HoodieCompactor
>> (compaction)
>> > and
>> > > > > > > >    HoodieClusteringJob (clustering) (Owner: Kyle Weller)
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Please respond to the thread if you think that I have missed
>> > > > > capturing
>> > > > > > any
>> > > > > > > > of the highlights or blockers for Hudi 0.10.0 release. For
>> the
>> > > > owners
>> > > > > > of
>> > > > > > > > these release blockers, can you please provide a specific
>> > > timeline
>> > > > > you
>> > > > > > are
>> > > > > > > > willing to commit to for finishing these so we can cut an
>> RC ?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Danny
>> > > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Take Care,
>> > > > > Rajesh Mahindra
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > > -Sivabalan
>> > >
>> >
>>
>

Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Manoj Govindassamy <ma...@gmail.com>.
Hi Danny,

We have one last PR, https://github.com/apache/hudi/pull/4114, to land on
master. We are seeing one flaky test with this PR, though the same test
passes consistently in our local setup. We are waiting for the CI to finish
before merging to master. Once this PR lands, we are good to cut the
0.10 RC. Will keep you posted on the status.

thanks,
Manoj





Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Raymond Xu <xu...@gmail.com>.
Hi Danny, I'm good with the timeline.

Cheers,
Raymond


Re: [DISCUSS] Hudi 0.10.0 Release

Posted by sagar sumit <sa...@gmail.com>.
Hi Danny,

I've added one more blocker: HUDI-2742
<https://issues.apache.org/jira/browse/HUDI-2742>
I am also good with the timelines.

Regards,
Sagar

On Sat, Nov 20, 2021 at 8:14 AM Sivabalan <n....@gmail.com> wrote:

> Hi Danny,
>      I am good with the timelines. All my jiras should be completed by
> then.
>
>
> On Fri, Nov 19, 2021 at 8:41 PM Y Ethan Guo <et...@gmail.com>
> wrote:
>
> > Hi Danny,
> >
> > Thanks for summarizing the current progress towards the 0.10.0 release.
> > I'm good with Nov 26th cutoff.
> >
> > Regarding my blockers:
> > - [HUDI-2332] Implement scheduling of compaction/ clustering for Kafka
> >    Connect (Owner: Ethan Guo)
> > PR is up.  I'm addressing comments.
> >
> > - [HUDI-2737] Use earliest instant by default for compaction and
> >    clustering job (Owner: Ethan Guo)
> > PR is up and approved.  It's near-landing after fixing CI failures.
> >
> > - [HUDI-2745] Record count does not match input after compaction is
> >    scheduled when running Hudi Kafka Connect sink (Owner: Ethan Guo)
> > HUDI-2745 is going to be blocked on HUDI-2480, which is going to resolve
> > this issue once done.
> >
> > - [HUDI-2735] Fix archival of commits in Java client for Kafka Connect
> >    (Owner: Ethan Guo)
> > This is pending and requires investigation into the archival logic which
> > is not Kafka-connect specific.
> >
> > Best,
> > - Ethan
> >
> >
> > On Fri, Nov 19, 2021 at 4:41 PM Rajesh Mahindra <rm...@gmail.com>
> > wrote:
> >
> > > Hi Danny,
> > >
> > > I have the following blockers that have a PR up. I am working on a PR
> > > for the Debezium Source. I am fine with Nov 26th as cut off.
> > >
> > >    - [HUDI-2325] Implement and test Hive Sync support for Kafka Connect
> > >    (Owner: Rajesh Mahindra)
> > >    - [HUDI-2671] Fix record offset handling in Kafka connect transaction
> > >    participant (Owner: Rajesh Mahindra)
> > >    - [HUDI-2672] Avoid empty commits and rollbacks when there is no event
> > >    from the topic (Owner: Rajesh Mahindra)
> > >
> > > ** Pending
> > >    - [HUDI-1290] Implement Debezium avro source for Delta Streamer
> > >
> > > Thanks
> > > Rajesh
> > >
> > >
> > > On Fri, Nov 19, 2021 at 4:01 PM Udit Mehrotra <ud...@apache.org>
> wrote:
> > >
> > > > Hi Danny,
> > > >
> > > > I have a blocker as well
> > > > https://issues.apache.org/jira/browse/HUDI-2802. Nov 26th cut off
> date
> > > > works fine for me.
> > > >
> > > > Also, just an update on the above list: HUDI-2641, HUDI-2314,
> > > > HUDI-2362 are unblocked/merged. HUDI-2314 and HUDI-2362 can be marked
> > > > in the highlights section. We will work on getting some doc updates
> > > > for the same by next week.
> > > >
> > > > Thanks,
> > > > Udit
> > > >
> > > > On Fri, Nov 19, 2021 at 3:49 PM Vinoth Chandar <vi...@apache.org>
> > > wrote:
> > > > >
> > > > > Hi Danny,
> > > > >
> > > > > I have one blocker. I plan to complete it by end of next week. I am
> > > good
> > > > > with the prior Nov 26 cutoff.
> > > > > Does that work for everyone?
> > > > >
> > > > > Thanks
> > > > > Vinoth
> > > > >
> > > > > On Fri, Nov 19, 2021 at 12:12 AM Danny Chan <da...@apache.org>
> > > > wrote:
> > > > >
> > > > > > Hi Community,
> > > > > >
> > > > > > As we draw close to doing Hudi 0.10.0 release, I am happy to
> share
> > a
> > > > > > summary of the key features/improvements that would be going in
> the
> > > > release
> > > > > > and the current blockers for everyone's visibility.
> > > > > >
> > > > > > *Highlights*
> > > > > >
> > > > > >    - [HUDI-1290] Implement Debezium avro source for Delta
> Streamer
> > > > > >    - [HUDI-1491] Support partition pruning for MOR snapshot query
> > > > > >    - [HUDI-1763] DefaultHoodieRecordPayload does not honor
> ordering
> > > > value
> > > > > >    when records within multiple log files are merged
> > > > > >    - [HUDI-1827] Add ORC support in Bootstrap Op
> > > > > >    - [HUDI-1869] Upgrading Spark3 To 3.1
> > > > > >    - [HUDI-2101] support z-order for hudi
> > > > > >    - [HUDI-2276] Enable Metadata Table by default for both
> writers
> > > and
> > > > > >    readers
> > > > > >    - [HUDI-2581] Analyze metadata size estimate in hudi with
> Hfile
> > > for
> > > > col
> > > > > >    stats partition
> > > > > >    - [HUDI-2634] Improve bootstrap performance for very large
> > tables
> > > > > >    - [HUDI-2086] redo the logical of mor_incremental_view for
> hive
> > > > > >    - [HUDI-2191] Bump flink version to 1.13.1
> > > > > >    - [HUDI-2285] Metadata Table Synchronous Design
> > > > > >    - [HUDI-2316] Support Flink batch upsert
> > > > > >    - [HUDI-2371] Improve flink streaming reader
> > > > > >    - [HUDI-2394] [Kafka Connect Mileston 1] Implement kafka
> connect
> > > for
> > > > > >    immutable data
> > > > > >    - [HUDI-2449] Incremental read for Flink
> > > > > >    - [HUDI-2562] Embedded timeline server on JobManager
> > > > > >
> > > > > > *Current Blockers*
> > > > > >
> > > > > >    - [HUDI-1856] Upstream changes made in PrestoDB to eliminate
> > file
> > > > > >    listing to Trino (Owner: Sagar Sumit)
> > > > > >    - [HUDI-1912] Presto defaults to GenericHiveRecordCursor for
> all
> > > > Hudi
> > > > > >    tables (Owner: Sagar Sumit)
> > > > > >    - [HUDI-1932] Hive Sync should not always update
> > > > last_commit_time_sync
> > > > > >    (Owner: Raymond Xu)
> > > > > >    - [HUDI-1937] When clustering fail, generating unfinished
> > > > replacecommit
> > > > > >    timeline. (Owner: Sagar Sumit)
> > > > > >    - [HUDI-2077] Flaky test: TestHoodieDeltaStreamer (Owner:
> Sagar
> > > > Sumit)
> > > > > >    - [HUDI-2314] Add DynamoDb based lock provider (Owner: Wenning
> > > Ding)
> > > > > >    - [HUDI-2325] Implement and test Hive Sync support for Kafka
> > > Connect
> > > > > >    (Owner: Rajesh Mahindra)
> > > > > >    - [HUDI-2332] Implement scheduling of compaction/ clustering
> for
> > > > Kafka
> > > > > >    Connect (Owner: Ethan Guo)
> > > > > >    - [HUDI-2362] Hudi external configuration file support (Owner:
> > > > Wenning
> > > > > >    Ding)
> > > > > >    - [HUDI-2409] Using HBase shaded jars in Hudi presto bundle
> > > (Owner:
> > > > > >    Sagar Sumit)
> > > > > >    - [HUDI-2443] KVComparator in HFile for metadata table is tied
> > to
> > > > HBase
> > > > > >    version and shading (Owner: Sagar Sumit)
> > > > > >    - [HUDI-2472] Tests failure follow up when metadata is enabled
> > by
> > > > > >    default (Owner: Manoj Govindassamy)
> > > > > >    - [HUDI-2475] Rolling Upgrade downgrade story for 0.10 &
> > enabling
> > > > > >    metadata (Owner: Manoj Govindassamy)
> > > > > >    - [HUDI-2478] Handle failure mid-way during init buckets
> (Owner:
> > > > Vinoth
> > > > > >    Chandar)
> > > > > >    - [HUDI-2480] FileSlice after pending compaction-requested
> > > > instant-time
> > > > > >    is ignored by MOR snapshot reader (Owner: Danny Chen)
> > > > > >    - [HUDI-2488] Support bootstrapping a single or more
> partitions
> > in
> > > > > >    metadata table while regular writers and table services are in
> > > > progress
> > > > > >    (Owner: Vinoth Chandar)
> > > > > >    - [HUDI-2527] Flaky test:
> > > > > >
> > > > > >
> > > >
> > >
> >
> TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
> > > > > >    (Owner: sivabalan narayanan)
> > > > > >    - [HUDI-2559] Ensure unique timestamps are generated for
> commit
> > > > times
> > > > > >    with concurrent writers (Owner: sivabalan narayanan)
> > > > > >    - [HUDI-2593] Virtual keys support for metadata table (Owner:
> > > Manoj
> > > > > >    Govindassamy)
> > > > > >    - [HUDI-2599] [Performance] Lower parallelism with snapshot
> > query
> > > > on COW
> > > > > >    tables in Presto (Owner: Sagar Sumit)
> > > > > >    - [HUDI-2628] Fix Chinese Docs (Owner: Kyle Weller)
> > > > > >    - [HUDI-2636] Make release notes discoverable (Owner: Kyle
> > Weller)
> > > > > >    - [HUDI-2637] Triage all bugs around Multi-writer and certify
> > the
> > > > tested
> > > > > >    flows (Owner: sivabalan narayanan)
> > > > > >    - [HUDI-2641] One inflight commit rolling back other
> concurrent
> > > > inflight
> > > > > >    commits causing them to fail (Owner: Udit Mehrotra)
> > > > > >    - [HUDI-2649] Kick off all the Hive query issues for 0.10.0
> > > (Owner:
> > > > > >    Sagar Sumit)
> > > > > >    - [HUDI-2666] async compaction failing with timeline
> mismatches
> > > > between
> > > > > >    server and client when metadata is enabled (Owner: Manoj
> > > > Govindassamy)
> > > > > >    - [HUDI-2667] Avoid fs.exists() and fs.mkdirs() call to
> > partitions
> > > > in
> > > > > >    AbstractTablefileSystemView (Owner: Sagar Sumit)
> > > > > >    - [HUDI-2671] Fix record offset handling in Kafka connect
> > > > transaction
> > > > > >    participant (Owner: Rajesh Mahindra)
> > > > > >    - [HUDI-2672] Avoid empty commits and rollbacks when there is
> no
> > > > event
> > > > > >    from the topic (Owner: Rajesh Mahindra)
> > > > > >    - [HUDI-2716] Fix InLineFS path conversions for S3FS paths
> > (Owner:
> > > > Manoj
> > > > > >    Govindassamy)
> > > > > >    - [HUDI-2725] Add precommit validators doc (Owner: Kyle
> Weller)
> > > > > >    - [HUDI-2731] Clustering should work regardless of whether
> there
> > > are
> > > > > >    base files (Owner: Sagar Sumit)
> > > > > >    - [HUDI-2734] Disable metadata by default for flink and java
> > > (Owner:
> > > > > >    sivabalan narayanan)
> > > > > >    - [HUDI-2735] Fix archival of commits in Java client for Kafka
> > > > Connect
> > > > > >    (Owner: Ethan Guo)
> > > > > >    - [HUDI-2737] Use earliest instant by default for compaction
> and
> > > > > >    clustering job (Owner: Ethan Guo)
> > > > > >    - [HUDI-2741] Validate metadata config for all readers (Owner:
> > > Sagar
> > > > > >    Sumit)
> > > > > >    - [HUDI-2745] Record count does not match input after
> compaction
> > > is
> > > > > >    scheduled when running Hudi Kafka Connect sink (Owner: Ethan
> > Guo)
> > > > > >    - [HUDI-2762] Ensure hive can query insert only logs in MOR
> > > (Owner:
> > > > agar
> > > > > >    Sumit)
> > > > > >    - [HUDI-2763] Avoid persisting redundant key field in the
> > Metadata
> > > > table
> > > > > >    record payload (Owner: Manoj Govindassamy)
> > > > > >    - [HUDI-2764] Address test failures after enabling virtual
> keys
> > > > support
> > > > > >    for the metadata table (Owner: Manoj Govindassamy)
> > > > > >    - [HUDI-2766] Enable marker based rollback by default (Owner:
> > > > sivabalan
> > > > > >    narayanan)
> > > > > >    - [HUDI-2767] Enable timeline server based marker type as
> > default
> > > > > >    (Owner: sivabalan narayanan)
> > > > > >    - [HUDI-2770] Update docs for HoodieCompactor (compaction) and
> > > > > >    HoodieClusteringJob (clustering) (Owner: Kyle Weller)
> > > > > >
> > > > > >
> > > > > > Please respond to the thread if you think that I have missed
> > > capturing
> > > > any
> > > > > > of the highlights or blockers for Hudi 0.10.0 release. For the
> > owners
> > > > of
> > > > > > these release blockers, can you please provide a specific
> timeline
> > > you
> > > > are
> > > > > > willing to commit to for finishing these so we can cut an RC ?
> > > > > >
> > > > > > Thanks,
> > > > > > Danny
> > > > > >
> > > >
> > >
> > >
> > > --
> > > Take Care,
> > > Rajesh Mahindra
> > >
> >
>
>
> --
> Regards,
> -Sivabalan
>

Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Sivabalan <n....@gmail.com>.
Hi Danny,

I am good with the timelines. All my JIRAs should be completed by then.


-- 
Regards,
-Sivabalan

Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Y Ethan Guo <et...@gmail.com>.
Hi Danny,

Thanks for summarizing the current progress towards the 0.10.0 release.
I'm good with Nov 26th cutoff.

Regarding my blockers:
- [HUDI-2332] Implement scheduling of compaction/ clustering for Kafka
   Connect (Owner: Ethan Guo)
PR is up.  I'm addressing comments.

- [HUDI-2737] Use earliest instant by default for compaction and
   clustering job (Owner: Ethan Guo)
PR is up and approved.  It should land once the CI failures are fixed.

- [HUDI-2745] Record count does not match input after compaction is
   scheduled when running Hudi Kafka Connect sink (Owner: Ethan Guo)
HUDI-2745 is blocked on HUDI-2480; once HUDI-2480 is done, it should
resolve this issue as well.

- [HUDI-2735] Fix archival of commits in Java client for Kafka Connect
   (Owner: Ethan Guo)
This is pending and requires investigation into the archival logic, which
is not specific to Kafka Connect.

Best,
- Ethan



Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Rajesh Mahindra <rm...@gmail.com>.
Hi Danny,

The following blockers of mine each have a PR up. I am still working on a
PR for the Debezium source. I am fine with Nov 26th as the cutoff.

   - [HUDI-2325] Implement and test Hive Sync support for Kafka Connect
   (Owner: Rajesh Mahindra)
   - [HUDI-2671] Fix record offset handling in Kafka connect transaction
   participant (Owner: Rajesh Mahindra)
   - [HUDI-2672] Avoid empty commits and rollbacks when there is no event
   from the topic (Owner: Rajesh Mahindra)

** Pending
   - [HUDI-1290] Implement Debezium avro source for Delta Streamer

Thanks
Rajesh


On Fri, Nov 19, 2021 at 4:01 PM Udit Mehrotra <ud...@apache.org> wrote:

> Hi Danny,
>
> I have a blocker as well
> https://issues.apache.org/jira/browse/HUDI-2802. Nov 26th cut off date
> works fine for me.
>
> Also, just an update on the above list: HUDI-2641, HUDI-2314,
> HUDI-2362 are unblocked/merged. HUDI-2314 and HUDI-2362 can be marked
> in the highlights section. We will work on getting some doc updates
> for the same by next week.
>
> Thanks,
> Udit
>
> On Fri, Nov 19, 2021 at 3:49 PM Vinoth Chandar <vi...@apache.org> wrote:
> >
> > Hi Danny,
> >
> > I have one blocker. I plan to complete it by end of next week. I am good
> > with the prior Nov 26 cutoff.
> > Does that work for everyone?
> >
> > Thanks
> > Vinoth
> >
> > On Fri, Nov 19, 2021 at 12:12 AM Danny Chan <da...@apache.org> wrote:
> >
> > > Hi Community,
> > >
> > > As we draw close to doing Hudi 0.10.0 release, I am happy to share a
> > > summary of the key features/improvements that would be going in the
> > > release and the current blockers for everyone's visibility.
> > >
> > > *Highlights*
> > >
> > >    - [HUDI-1290] Implement Debezium avro source for Delta Streamer
> > >    - [HUDI-1491] Support partition pruning for MOR snapshot query
> > >    - [HUDI-1763] DefaultHoodieRecordPayload does not honor ordering
> value
> > >    when records within multiple log files are merged
> > >    - [HUDI-1827] Add ORC support in Bootstrap Op
> > >    - [HUDI-1869] Upgrading Spark3 To 3.1
> > >    - [HUDI-2101] support z-order for hudi
> > >    - [HUDI-2276] Enable Metadata Table by default for both writers and
> > >    readers
> > >    - [HUDI-2581] Analyze metadata size estimate in hudi with Hfile for
> col
> > >    stats partition
> > >    - [HUDI-2634] Improve bootstrap performance for very large tables
> > >    - [HUDI-2086] redo the logical of mor_incremental_view for hive
> > >    - [HUDI-2191] Bump flink version to 1.13.1
> > >    - [HUDI-2285] Metadata Table Synchronous Design
> > >    - [HUDI-2316] Support Flink batch upsert
> > >    - [HUDI-2371] Improve flink streaming reader
> > >    - [HUDI-2394] [Kafka Connect Mileston 1] Implement kafka connect for
> > >    immutable data
> > >    - [HUDI-2449] Incremental read for Flink
> > >    - [HUDI-2562] Embedded timeline server on JobManager
> > >
> > > *Current Blockers*
> > >
> > >    - [HUDI-1856] Upstream changes made in PrestoDB to eliminate file
> > >    listing to Trino (Owner: Sagar Sumit)
> > >    - [HUDI-1912] Presto defaults to GenericHiveRecordCursor for all
> Hudi
> > >    tables (Owner: Sagar Sumit)
> > >    - [HUDI-1932] Hive Sync should not always update
> last_commit_time_sync
> > >    (Owner: Raymond Xu)
> > >    - [HUDI-1937] When clustering fail, generating unfinished
> replacecommit
> > >    timeline. (Owner: Sagar Sumit)
> > >    - [HUDI-2077] Flaky test: TestHoodieDeltaStreamer (Owner: Sagar
> Sumit)
> > >    - [HUDI-2314] Add DynamoDb based lock provider (Owner: Wenning Ding)
> > >    - [HUDI-2325] Implement and test Hive Sync support for Kafka Connect
> > >    (Owner: Rajesh Mahindra)
> > >    - [HUDI-2332] Implement scheduling of compaction/ clustering for
> Kafka
> > >    Connect (Owner: Ethan Guo)
> > >    - [HUDI-2362] Hudi external configuration file support (Owner:
> Wenning
> > >    Ding)
> > >    - [HUDI-2409] Using HBase shaded jars in Hudi presto bundle (Owner:
> > >    Sagar Sumit)
> > >    - [HUDI-2443] KVComparator in HFile for metadata table is tied to
> HBase
> > >    version and shading (Owner: Sagar Sumit)
> > >    - [HUDI-2472] Tests failure follow up when metadata is enabled by
> > >    default (Owner: Manoj Govindassamy)
> > >    - [HUDI-2475] Rolling Upgrade downgrade story for 0.10 & enabling
> > >    metadata (Owner: Manoj Govindassamy)
> > >    - [HUDI-2478] Handle failure mid-way during init buckets (Owner:
> Vinoth
> > >    Chandar)
> > >    - [HUDI-2480] FileSlice after pending compaction-requested
> instant-time
> > >    is ignored by MOR snapshot reader (Owner: Danny Chen)
> > >    - [HUDI-2488] Support bootstrapping a single or more partitions in
> > >    metadata table while regular writers and table services are in
> progress
> > >    (Owner: Vinoth Chandar)
> > >    - [HUDI-2527] Flaky test:
> > >
> > >
> TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
> > >    (Owner: sivabalan narayanan)
> > >    - [HUDI-2559] Ensure unique timestamps are generated for commit
> times
> > >    with concurrent writers (Owner: sivabalan narayanan)
> > >    - [HUDI-2593] Virtual keys support for metadata table (Owner: Manoj
> > >    Govindassamy)
> > >    - [HUDI-2599] [Performance] Lower parallelism with snapshot query
> on COW
> > >    tables in Presto (Owner: Sagar Sumit)
> > >    - [HUDI-2628] Fix Chinese Docs (Owner: Kyle Weller)
> > >    - [HUDI-2636] Make release notes discoverable (Owner: Kyle Weller)
> > >    - [HUDI-2637] Triage all bugs around Multi-writer and certify the
> tested
> > >    flows (Owner: sivabalan narayanan)
> > >    - [HUDI-2641] One inflight commit rolling back other concurrent
> inflight
> > >    commits causing them to fail (Owner: Udit Mehrotra)
> > >    - [HUDI-2649] Kick off all the Hive query issues for 0.10.0 (Owner:
> > >    Sagar Sumit)
> > >    - [HUDI-2666] async compaction failing with timeline mismatches
> between
> > >    server and client when metadata is enabled (Owner: Manoj
> Govindassamy)
> > >    - [HUDI-2667] Avoid fs.exists() and fs.mkdirs() call to partitions
> in
> > >    AbstractTablefileSystemView (Owner: Sagar Sumit)
> > >    - [HUDI-2671] Fix record offset handling in Kafka connect
> transaction
> > >    participant (Owner: Rajesh Mahindra)
> > >    - [HUDI-2672] Avoid empty commits and rollbacks when there is no
> event
> > >    from the topic (Owner: Rajesh Mahindra)
> > >    - [HUDI-2716] Fix InLineFS path conversions for S3FS paths (Owner:
> Manoj
> > >    Govindassamy)
> > >    - [HUDI-2725] Add precommit validators doc (Owner: Kyle Weller)
> > >    - [HUDI-2731] Clustering should work regardless of whether there are
> > >    base files (Owner: Sagar Sumit)
> > >    - [HUDI-2734] Disable metadata by default for flink and java (Owner:
> > >    sivabalan narayanan)
> > >    - [HUDI-2735] Fix archival of commits in Java client for Kafka
> Connect
> > >    (Owner: Ethan Guo)
> > >    - [HUDI-2737] Use earliest instant by default for compaction and
> > >    clustering job (Owner: Ethan Guo)
> > >    - [HUDI-2741] Validate metadata config for all readers (Owner: Sagar
> > >    Sumit)
> > >    - [HUDI-2745] Record count does not match input after compaction is
> > >    scheduled when running Hudi Kafka Connect sink (Owner: Ethan Guo)
> > >    - [HUDI-2762] Ensure hive can query insert only logs in MOR (Owner:
> agar
> > >    Sumit)
> > >    - [HUDI-2763] Avoid persisting redundant key field in the Metadata
> table
> > >    record payload (Owner: Manoj Govindassamy)
> > >    - [HUDI-2764] Address test failures after enabling virtual keys
> support
> > >    for the metadata table (Owner: Manoj Govindassamy)
> > >    - [HUDI-2766] Enable marker based rollback by default (Owner:
> sivabalan
> > >    narayanan)
> > >    - [HUDI-2767] Enable timeline server based marker type as default
> > >    (Owner: sivabalan narayanan)
> > >    - [HUDI-2770] Update docs for HoodieCompactor (compaction) and
> > >    HoodieClusteringJob (clustering) (Owner: Kyle Weller)
> > >
> > >
> > > Please respond to the thread if you think that I have missed capturing
> any
> > > of the highlights or blockers for Hudi 0.10.0 release. For the owners
> of
> > > these release blockers, can you please provide a specific timeline you
> are
> > > willing to commit to for finishing these so we can cut an RC ?
> > >
> > > Thanks,
> > > Danny
> > >
>


-- 
Take Care,
Rajesh Mahindra

Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Udit Mehrotra <ud...@apache.org>.
Hi Danny,

I have a blocker as well:
https://issues.apache.org/jira/browse/HUDI-2802. The Nov 26th cutoff date
works fine for me.

Also, an update on the above list: HUDI-2641, HUDI-2314, and
HUDI-2362 are unblocked/merged. HUDI-2314 and HUDI-2362 can be listed
in the highlights section. We will work on getting doc updates
for them out by next week.

Thanks,
Udit

On Fri, Nov 19, 2021 at 3:49 PM Vinoth Chandar <vi...@apache.org> wrote:
>
> Hi Danny,
>
> I have one blocker. I plan to complete it by end of next week. I am good
> with the prior Nov 26 cutoff.
> Does that work for everyone?
>
> Thanks
> Vinoth
>

Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Vinoth Chandar <vi...@apache.org>.
Hi Danny,

I have one blocker. I plan to complete it by the end of next week. I am good
with the prior Nov 26 cutoff.
Does that work for everyone?

Thanks
Vinoth


Re: [DISCUSS] Hudi 0.10.0 Release

Posted by Manoj Govindassamy <ma...@gmail.com>.
Hi Danny,

I am good with the Nov 26th cutoff as well. I am working on the in-progress
items below and have one other pending. For all the rest on the list, PRs
are out or have landed. Thanks for compiling the list.

*In Progress:*
   - [HUDI-2763] Avoid persisting redundant key field in the Metadata table
   record payload (Owner: Manoj Govindassamy)
   - [HUDI-2475] Rolling Upgrade downgrade story for 0.10 & enabling
   metadata (Owner: Manoj Govindassamy)

*Pending:*
   - [HUDI-2590] Validate Diff key gen w/ and w/o glob path with and w/o
   metadata enabled

*Completed:*
   - [HUDI-2716] Fix InLineFS path conversions for S3FS paths (Owner: Manoj
   Govindassamy)
   - [HUDI-2593] Virtual keys support for metadata table (Owner: Manoj
   Govindassamy)
   - [HUDI-2472] Tests failure follow up when metadata is enabled by
   default (Owner: Manoj Govindassamy)
   - [HUDI-2666] async compaction failing with timeline mismatches between
   server and client when metadata is enabled (Owner: Manoj Govindassamy)
   - [HUDI-2764] Address test failures after enabling virtual keys support
   for the metadata table (Owner: Manoj Govindassamy)
