Posted to dev@hudi.apache.org by Udit Mehrotra <ud...@apache.org> on 2021/08/04 00:12:45 UTC

[DISCUSS] Hudi 0.9.0 Release

Hi Community,

As we draw close to the Hudi 0.9.0 release, I am happy to share a summary
of the key features/improvements going into the release and the current
blockers, for everyone's visibility.

*Highlights*

   - [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink
   writer
   - [HUDI-1738] Detect and emit deleted records for Flink MOR table
   streaming read
   - [HUDI-1867] Support streaming reads for Flink COW table
   - [HUDI-1908] Global index for flink writer
   - [HUDI-1788] Support Insert Overwrite with Flink Writer
   - [HUDI-2209] Bulk insert for flink writer
   - [HUDI-1591] Support querying using non-globbed paths for Hudi Spark
   DataSource queries
   - [HUDI-1591] Partition pruning support for read optimized queries via
   Hudi Spark DataSource
   - [HUDI-1415] Register Hudi Table as a Spark DataSource Table with
   metastore. Queries via Spark SQL will be routed through Hudi DataSource
   (instead of InputFormat), thus making it more performant due to Spark's
   native/optimized readers
   - [HUDI-1591] Partition pruning support for snapshot queries via Hudi
   Spark DataSource
   - [HUDI-1658] DML and DDL support via Spark SQL
   - [HUDI-1790] Add SqlSource for DeltaStreamer to support backfill use
   cases
   - [HUDI-251] Add JDBC Source support for DeltaStreamer
   - [HUDI-1910] Support Kafka based checkpointing for HoodieDeltaStreamer
   - [HUDI-1371] Support metadata based listing for Spark DataSource and
   Spark SQL
   - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements to
   Metadata based listing
   - [HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to bring
   all configs under one roof
   - [HUDI-2124] Grafana dashboard for Hudi
   - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk Insert via
   row writing
   - [HUDI-1483] Async clustering for Delta Streamer
   - [HUDI-2235] Add virtual key support to Hudi
   - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
   - In addition, there have been significant improvements and bug fixes to
   improve the overall stability of Flink Hudi integration

*Current Blockers*

   - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
   - [HUDI-1256] Follow on improvements to HFile tables for metadata based
   listing (Owner: None)
   - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With Hudi
   (Owner: pengzhiwei)
   - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
   pengzhiwei)
   - [HUDI-1138] Re-implement marker files via timeline server (Owner:
   Ethan Guo)
   - [HUDI-1985] Website redesign implementation (Owner: Vinoth
   Govindarajan)
   - [HUDI-2232] MERGE INTO fails with table having nested struct (Owner:
   pengzhiwei)
   - [HUDI-1468] incremental read support with clustering (Owner: Liwei)
   - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner: None)
   - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar Sumit)
   - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner: Sagar
   Sumit)
   - [HUDI-1887] Setting default value to false for enabling schema post
   processor (Owner: Sivabalan)
   - [HUDI-1850] Fixing read of an empty table but with failed write (Owner:
   Sivabalan)
   - [HUDI-2151] Enable defaults for out of box performance (Owner: Udit
   Mehrotra)
   - [HUDI-2119] Ensure the rolled-back instance was previously synced to
   the Metadata Table when syncing a Rollback Instant (Owner: Prashant Wason)
   - [HUDI-1458] Support custom clustering strategies and preserve commit
   time to support incremental read (Owner: Satish Kotha)
   - [HUDI-1763] Fixing honoring of Ordering val in
   DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
   - [HUDI-1129] Improving schema evolution support in hudi (Owner:
   Sivabalan)
   - [HUDI-2120] [DOC] Update docs about schema in flink sql configuration
   (Owner: Xianghu Wang)
   - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
   pengzhiwei)

Please respond to the thread if you think that I have missed capturing any
of the highlights or blockers for the Hudi 0.9.0 release. For the owners of
these release blockers, can you please provide a specific timeline you are
willing to commit to for finishing these so we can cut an RC?

Thanks,
Udit

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Sivabalan <n....@gmail.com>.

Status update: all release blockers have landed. We are good to go ahead
with the RC work.

On Fri, Aug 13, 2021 at 5:46 PM Udit Mehrotra <ud...@apache.org> wrote:

> Hi Community,
>
> Here is a quick update on 0.9.0 release status. Over the last 10 days we
> made significant progress on the release blockers previously mentioned in
> the thread, thanks to all the owners. Here are the remaining blockers that
> we are currently tracking:
>
>    - [HUDI-2305] Add MARKERS.type and fix marker-based rollback
>    - [HUDI-2268] Add upgrade and downgrade to and from 0.9.0
>    release-blockers
>    - [HUDI-2307] When using delete_partition with ds should not rely on the
>    primary key
>    - [HUDI-2151] Flipping defaults
>    - [HUDI-1897] Deltastreamer source for AWS S3
>    - [HUDI-2120] [DOC] Update docs about schema in flink sql configuration
>    - [HUDI-2119] Ensure the rolled-back instance was previously synced to
>    the Metadata Table when syncing a Rollback Instant.
>
> We plan to resolve these soon and cut an RC by *tomorrow (August 14th, 2021)
> end of day PST*. If you have any other blockers that you would like to
> surface for Hudi 0.9.0, feel free to reach out.
>
> Thanks,
> Udit
-- 
Regards,
-Sivabalan

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Udit Mehrotra <ud...@apache.org>.

Hi Community,

Here is a quick update on 0.9.0 release status. Over the last 10 days we
made significant progress on the release blockers previously mentioned in
the thread, thanks to all the owners. Here are the remaining blockers that
we are currently tracking:

   - [HUDI-2305] Add MARKERS.type and fix marker-based rollback
   - [HUDI-2268] Add upgrade and downgrade to and from 0.9.0
   release-blockers
   - [HUDI-2307] When using delete_partition with ds should not rely on the
   primary key
   - [HUDI-2151] Flipping defaults
   - [HUDI-1897] Deltastreamer source for AWS S3
   - [HUDI-2120] [DOC] Update docs about schema in flink sql configuration
   - [HUDI-2119] Ensure the rolled-back instance was previously synced to
   the Metadata Table when syncing a Rollback Instant.

We plan to resolve these soon and cut an RC by *tomorrow (August 14th, 2021)
end of day PST*. If you have any other blockers that you would like to
surface for Hudi 0.9.0, feel free to reach out.

Thanks,
Udit

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by sagar sumit <sa...@gmail.com>.

Hi Udit, Vinoth

End of next week sounds good. Apart from the issues listed, there is one more that we can take in this release: 
[HUDI-1897] DeltaStreamer Source for AWS S3

It's under review and should be closed by early next week.

Regards,
Sagar

On 2021/08/06 00:55:19, Raymond Xu <xu...@gmail.com> wrote: 
> +1 End of next week
> 
> On Thu, Aug 5, 2021 at 3:06 PM Sivabalan <n....@gmail.com> wrote:
> 
> > Yeah, end of next week sounds good.
> >
> > Here are the status updates wrt the patches I am involved in.
> >
> >   Plan to get these in by early next week.
> >    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
> >    - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
> >    Sivabalan)
> >    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
> >    pengzhiwei)
> >    - [HUDI-1138] Re-implement marker files via timeline server (Owner:
> >    Ethan Guo)
> >    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
> >    Sivabalan)
> >
> >    Mid next week:
> >    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With Hudi
> >    (Owner: pengzhiwei)
> >
> >   Waiting for reviews. Will try to get it in by early next week. If we
> > can't get this in, it will probably skip this release.
> >    - [HUDI-1763] Fixing honoring of Ordering val in
> >    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
> >
> >    Removed from release blockers:
> >    - [HUDI-1887] Setting default value to false for enabling schema post
> >    processor (Owner: Sivabalan)
> >    - [HUDI-1850] Fixing read of an empty table but with failed write (Owner:
> >    Sivabalan)
> >
> >
> > On Thu, Aug 5, 2021 at 11:17 AM Vinoth Chandar <vi...@apache.org> wrote:
> >
> > > Any other thoughts? Love to lock this date down sooner rather than later.
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Tue, Aug 3, 2021 at 11:35 PM Udit Mehrotra <ud...@apache.org> wrote:
> > >
> > > > Agreed Vinoth. End of next week seems reasonable as a hard deadline for
> > > > cutting the RC.
> > > >
> > > > If anyone thinks otherwise or needs more time, feel free to chime in.
> > > >
> > > > On Tue, Aug 3, 2021 at 8:10 PM Vinoth Chandar <vi...@apache.org> wrote:
> > > >
> > > > > Thanks Udit! I propose we set end of next week as a hard deadline for
> > > > > cutting the RC. Any thoughts?
> > > > >
> > > > > A good amount of progress is being made on these blockers, I think.
> > > > >
> > > > >
> > > > > On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra <ud...@apache.org>
> > > wrote:
> > > > >
> > > > > > Hi Community,
> > > > > >
> > > > > > As we draw close to doing Hudi 0.9.0 release, I am happy to share a
> > > > > summary
> > > > > > of the key features/improvements that would be going in the release
> > > and
> > > > > the
> > > > > > current blockers for everyone's visibility.
> > > > > >
> > > > > > *Highlights*
> > > > > >
> > > > > >    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for
> > > Flink
> > > > > >    writer
> > > > > >    - [HUDI-1738] Detect and emit deleted records for Flink MOR
> > table
> > > > > >    streaming read
> > > > > >    - [HUDI-1867] Support streaming reads for Flink COW table
> > > > > >    - [HUDI-1908] Global index for flink writer
> > > > > >    - [HUDI-1788] Support Insert Overwrite with Flink Writer
> > > > > >    - [HUDI-2209] Bulk insert for flink writer
> > > > > >    - [HUDI-1591] Support querying using non-globbed paths for Hudi
> > > > Spark
> > > > > >    DataSource queries
> > > > > >    - [HUDI-1591] Partition pruning support for read optimized
> > queries
> > > > via
> > > > > >    Hudi Spark DataSource
> > > > > >    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table
> > with
> > > > > >    metastore. Queries via Spark SQL will be routed through Hudi
> > > > > DataSource
> > > > > >    (instead of InputFormat), thus making it more performant due to
> > > > > Spark's
> > > > > >    native/optimized readers
> > > > > >    - [HUDI-1591] Partition pruning support for snapshot queries via
> > > > Hudi
> > > > > >    Spark DataSource
> > > > > >    - [HUDI-1658] DML and DDL support via Spark SQL
> > > > > >    - [HUDI-1790] Add SqlSource for DeltaStreamer to support
> > backfill
> > > > use
> > > > > >    cases:
> > > > > >    - [HUDI-251] Add JDBC Source support for DeltaStreamer
> > > > > >    - [HUDI-1910] Support Kafka based checkpointing for
> > > > > HoodieDeltaStreamer
> > > > > >    - [HUDI-1371] Support metadata based listing for Spark
> > DataSource
> > > > and
> > > > > >    Spark SQL
> > > > > >    - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements
> > to
> > > > > >    Metadata based listing
> > > > > >    - HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to
> > > > bring
> > > > > >    all configs under one roof
> > > > > >    - [HUDI-2124] Grafana dashboard for Hudi
> > > > > >    - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk
> > Insert
> > > > via
> > > > > >    row writing
> > > > > >    - [HUDI-1483] Async clustering for Delta Streamer
> > > > > >    - [HUDI-2235] Add virtual key support to Hudi
> > > > > >    - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
> > > > > >    - In addition, there have been significant improvements and bug
> > > > fixes
> > > > > to
> > > > > >    improve the overall stability of Flink Hudi integration
> > > > > >
> > > > > > *Current Blockers*
> > > > > >
> > > > > >    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner:
> > > pengzhiwei)
> > > > > >    - [HUDI-1256] Follow on improvements to HFile tables for
> > metadata
> > > > > based
> > > > > >    listing (Owner: None)
> > > > > >    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration
> > With
> > > > > Hudi
> > > > > >    (Owner: pengzhiwei)
> > > > > >    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table
> > > (Owner:
> > > > > >    pengzhiwei)
> > > > > >    - [HUDI-1138] Re-implement marker files via timeline server
> > > (Owner:
> > > > > >    Ethan Guo)
> > > > > >    - [HUDI-1985] Website redesign implementation (Owner: Vinoth
> > > > > >    Govindarajan)
> > > > > >    - [HUDI-2232] MERGE INTO fails with table having nested struct
> > > > (Owner:
> > > > > >    pengzhiwei)
> > > > > >    - [HUDI-1468] incremental read support with clustering (Owner:
> > > > Liwei)
> > > > > >    - [HUDI-2250] Bulk insert support for tables w/ primary key
> > > (Owner:
> > > > > > None)
> > > > > >    - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar
> > Sumit)
> > > > > >    - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner:
> > > Sagar
> > > > > >    Sumit)
> > > > > >    - [HUDI-1887] Setting default value to false for enabling schema
> > > > post
> > > > > >    processor (Owner: Sivabalan)
> > > > > >    - [HUDI-1850] Fixing read of a empty table but with failed write
> > > > > (Owner:
> > > > > >    Sivabalan)
> > > > > >    - [HUDI-2151] Enable defaults for out of box performance (Owner:
> > > > Udit
> > > > > >    Mehrotra)
> > > > > >    - [HUDI-2119] Ensure the rolled-back instance was previously
> > > synced
> > > > to
> > > > > >    the Metadata Table when syncing a Rollback Instant (Owner:
> > > Prashant
> > > > > > Wason)
> > > > > >    - [HUDI-1458] Support custom clustering strategies and preserve
> > > > commit
> > > > > >    time to support incremental read (Owner: Satish Kotha)
> > > > > >    - [HUDI-1763] Fixing honoring of Ordering val in
> > > > > >    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
> > > > > >    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
> > > > > >    Sivabalan)
> > > > > >    - [HUDI-2120] [DOC] Update docs about schema in flink sql
> > > > > configuration
> > > > > >    (Owner: Xianghu Wang)
> > > > > >    - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
> > > > > >    pengzhiwei)
> > > > > >
> > > > > > Please respond to the thread if you think that I have missed
> > > capturing
> > > > > any
> > > > > > of the highlights or blockers for Hudi 0.9.0 release. For the
> > owners
> > > of
> > > > > > these release blockers, can you please provide a specific timeline
> > > you
> > > > > are
> > > > > > > willing to commit to for finishing these so we can cut an RC?
> > > > > >
> > > > > > Thanks,
> > > > > > Udit
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Regards,
> > -Sivabalan
> >
>

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Raymond Xu <xu...@gmail.com>.

+1 End of next week

On Thu, Aug 5, 2021 at 3:06 PM Sivabalan <n....@gmail.com> wrote:

> Yeah, end of next week sounds good.
>
> Here are the status updates for the patches I am involved in.
>
>   Plan to get these in by early next week.
>    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
>    - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
> Sivabalan)
>    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
>    pengzhiwei)
>    - [HUDI-1138] Re-implement marker files via timeline server (Owner:
>    Ethan Guo)
>    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
>    Sivabalan)
>
>    Mid next week:
>    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With Hudi
>    (Owner: pengzhiwei)
>
>   Waiting for reviews. Will try to get it in by early next week. If we
> can't get this in, it will probably skip this release.
>    - [HUDI-1763] Fixing honoring of Ordering val in
>    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
>
>    Removed from release blockers:
>    - [HUDI-1887] Setting default value to false for enabling schema post
>    processor (Owner: Sivabalan)
>    - [HUDI-1850] Fixing read of a empty table but with failed write (Owner:
>    Sivabalan)
>
>
> On Thu, Aug 5, 2021 at 11:17 AM Vinoth Chandar <vi...@apache.org> wrote:
>
> > Any other thoughts? I'd love to lock this date down sooner rather than later.
> >
> > Thanks
> > Vinoth
> >
> > On Tue, Aug 3, 2021 at 11:35 PM Udit Mehrotra <ud...@apache.org> wrote:
> >
> > > Agreed Vinoth. End of next week seems reasonable as a hard deadline for
> > > cutting the RC.
> > >
> > > If anyone thinks otherwise or needs more time, feel free to chime in.
> > >
> > > On Tue, Aug 3, 2021 at 8:10 PM Vinoth Chandar <vi...@apache.org>
> wrote:
> > >
> > > > Thanks Udit! I propose we set end of next week as a hard deadline for
> > > > cutting the RC. Any thoughts?
> > > >
> > > > A good amount of progress is being made on these blockers, I think.
> > > >
> > > >
> > > > On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra <ud...@apache.org>
> > wrote:
> > > >
> > > > > Hi Community,
> > > > >
> > > > > As we draw close to doing Hudi 0.9.0 release, I am happy to share a
> > > > summary
> > > > > of the key features/improvements that would be going in the release
> > and
> > > > the
> > > > > current blockers for everyone's visibility.
> > > > >
> > > > > *Highlights*
> > > > >
> > > > >    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for
> > Flink
> > > > >    writer
> > > > >    - [HUDI-1738] Detect and emit deleted records for Flink MOR
> table
> > > > >    streaming read
> > > > >    - [HUDI-1867] Support streaming reads for Flink COW table
> > > > >    - [HUDI-1908] Global index for flink writer
> > > > >    - [HUDI-1788] Support Insert Overwrite with Flink Writer
> > > > >    - [HUDI-2209] Bulk insert for flink writer
> > > > >    - [HUDI-1591] Support querying using non-globbed paths for Hudi
> > > Spark
> > > > >    DataSource queries
> > > > >    - [HUDI-1591] Partition pruning support for read optimized
> queries
> > > via
> > > > >    Hudi Spark DataSource
> > > > >    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table
> with
> > > > >    metastore. Queries via Spark SQL will be routed through Hudi
> > > > DataSource
> > > > >    (instead of InputFormat), thus making it more performant due to
> > > > Spark's
> > > > >    native/optimized readers
> > > > >    - [HUDI-1591] Partition pruning support for snapshot queries via
> > > Hudi
> > > > >    Spark DataSource
> > > > >    - [HUDI-1658] DML and DDL support via Spark SQL
> > > > >    - [HUDI-1790] Add SqlSource for DeltaStreamer to support
> backfill
> > > use
> > > > > >    cases
> > > > >    - [HUDI-251] Add JDBC Source support for DeltaStreamer
> > > > >    - [HUDI-1910] Support Kafka based checkpointing for
> > > > HoodieDeltaStreamer
> > > > >    - [HUDI-1371] Support metadata based listing for Spark
> DataSource
> > > and
> > > > >    Spark SQL
> > > > >    - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements
> to
> > > > >    Metadata based listing
> > > > > >    - [HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to
> > > bring
> > > > >    all configs under one roof
> > > > >    - [HUDI-2124] Grafana dashboard for Hudi
> > > > >    - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk
> Insert
> > > via
> > > > >    row writing
> > > > >    - [HUDI-1483] Async clustering for Delta Streamer
> > > > >    - [HUDI-2235] Add virtual key support to Hudi
> > > > >    - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
> > > > >    - In addition, there have been significant improvements and bug
> > > fixes
> > > > to
> > > > >    improve the overall stability of Flink Hudi integration
> > > > >
> > > > > *Current Blockers*
> > > > >
> > > > >    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner:
> > pengzhiwei)
> > > > >    - [HUDI-1256] Follow on improvements to HFile tables for
> metadata
> > > > based
> > > > >    listing (Owner: None)
> > > > >    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration
> With
> > > > Hudi
> > > > >    (Owner: pengzhiwei)
> > > > >    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table
> > (Owner:
> > > > >    pengzhiwei)
> > > > >    - [HUDI-1138] Re-implement marker files via timeline server
> > (Owner:
> > > > >    Ethan Guo)
> > > > >    - [HUDI-1985] Website redesign implementation (Owner: Vinoth
> > > > >    Govindarajan)
> > > > >    - [HUDI-2232] MERGE INTO fails with table having nested struct
> > > (Owner:
> > > > >    pengzhiwei)
> > > > >    - [HUDI-1468] incremental read support with clustering (Owner:
> > > Liwei)
> > > > >    - [HUDI-2250] Bulk insert support for tables w/ primary key
> > (Owner:
> > > > > None)
> > > > >    - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar
> Sumit)
> > > > >    - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner:
> > Sagar
> > > > >    Sumit)
> > > > >    - [HUDI-1887] Setting default value to false for enabling schema
> > > post
> > > > >    processor (Owner: Sivabalan)
> > > > >    - [HUDI-1850] Fixing read of a empty table but with failed write
> > > > (Owner:
> > > > >    Sivabalan)
> > > > >    - [HUDI-2151] Enable defaults for out of box performance (Owner:
> > > Udit
> > > > >    Mehrotra)
> > > > >    - [HUDI-2119] Ensure the rolled-back instance was previously
> > synced
> > > to
> > > > >    the Metadata Table when syncing a Rollback Instant (Owner:
> > Prashant
> > > > > Wason)
> > > > >    - [HUDI-1458] Support custom clustering strategies and preserve
> > > commit
> > > > >    time to support incremental read (Owner: Satish Kotha)
> > > > >    - [HUDI-1763] Fixing honoring of Ordering val in
> > > > >    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
> > > > >    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
> > > > >    Sivabalan)
> > > > >    - [HUDI-2120] [DOC] Update docs about schema in flink sql
> > > > configuration
> > > > >    (Owner: Xianghu Wang)
> > > > >    - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
> > > > >    pengzhiwei)
> > > > >
> > > > > Please respond to the thread if you think that I have missed
> > capturing
> > > > any
> > > > > of the highlights or blockers for Hudi 0.9.0 release. For the
> owners
> > of
> > > > > these release blockers, can you please provide a specific timeline
> > you
> > > > are
> > > > > > willing to commit to for finishing these so we can cut an RC?
> > > > >
> > > > > Thanks,
> > > > > Udit
> > > > >
> > > >
> > >
> >
>
>
> --
> Regards,
> -Sivabalan
>

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Sivabalan <n....@gmail.com>.

Yeah, end of next week sounds good.

Here are the status updates for the patches I am involved in.

  Plan to get these in by early next week.
   - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
   - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
Sivabalan)
   - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
   pengzhiwei)
   - [HUDI-1138] Re-implement marker files via timeline server (Owner:
   Ethan Guo)
   - [HUDI-1129] Improving schema evolution support in hudi (Owner:
   Sivabalan)

   Mid next week:
   - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With Hudi
   (Owner: pengzhiwei)

  Waiting for reviews. Will try to get it in by early next week. If we
can't get this in, it will probably skip this release.
   - [HUDI-1763] Fixing honoring of Ordering val in
   DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)

   Removed from release blockers:
   - [HUDI-1887] Setting default value to false for enabling schema post
   processor (Owner: Sivabalan)
   - [HUDI-1850] Fixing read of a empty table but with failed write (Owner:
   Sivabalan)


On Thu, Aug 5, 2021 at 11:17 AM Vinoth Chandar <vi...@apache.org> wrote:

> Any other thoughts? I'd love to lock this date down sooner rather than later.
>
> Thanks
> Vinoth
>
> On Tue, Aug 3, 2021 at 11:35 PM Udit Mehrotra <ud...@apache.org> wrote:
>
> > Agreed Vinoth. End of next week seems reasonable as a hard deadline for
> > cutting the RC.
> >
> > If anyone thinks otherwise or needs more time, feel free to chime in.
> >
> > On Tue, Aug 3, 2021 at 8:10 PM Vinoth Chandar <vi...@apache.org> wrote:
> >
> > > Thanks Udit! I propose we set end of next week as a hard deadline for
> > > cutting the RC. Any thoughts?
> > >
> > > A good amount of progress is being made on these blockers, I think.
> > >
> > >
> > > On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra <ud...@apache.org>
> wrote:
> > >
> > > > Hi Community,
> > > >
> > > > As we draw close to doing Hudi 0.9.0 release, I am happy to share a
> > > summary
> > > > of the key features/improvements that would be going in the release
> and
> > > the
> > > > current blockers for everyone's visibility.
> > > >
> > > > *Highlights*
> > > >
> > > >    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for
> Flink
> > > >    writer
> > > >    - [HUDI-1738] Detect and emit deleted records for Flink MOR table
> > > >    streaming read
> > > >    - [HUDI-1867] Support streaming reads for Flink COW table
> > > >    - [HUDI-1908] Global index for flink writer
> > > >    - [HUDI-1788] Support Insert Overwrite with Flink Writer
> > > >    - [HUDI-2209] Bulk insert for flink writer
> > > >    - [HUDI-1591] Support querying using non-globbed paths for Hudi
> > Spark
> > > >    DataSource queries
> > > >    - [HUDI-1591] Partition pruning support for read optimized queries
> > via
> > > >    Hudi Spark DataSource
> > > >    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table with
> > > >    metastore. Queries via Spark SQL will be routed through Hudi
> > > DataSource
> > > >    (instead of InputFormat), thus making it more performant due to
> > > Spark's
> > > >    native/optimized readers
> > > >    - [HUDI-1591] Partition pruning support for snapshot queries via
> > Hudi
> > > >    Spark DataSource
> > > >    - [HUDI-1658] DML and DDL support via Spark SQL
> > > >    - [HUDI-1790] Add SqlSource for DeltaStreamer to support backfill
> > use
> > > >    cases
> > > >    - [HUDI-251] Add JDBC Source support for DeltaStreamer
> > > >    - [HUDI-1910] Support Kafka based checkpointing for
> > > HoodieDeltaStreamer
> > > >    - [HUDI-1371] Support metadata based listing for Spark DataSource
> > and
> > > >    Spark SQL
> > > >    - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements to
> > > >    Metadata based listing
> > > >    - [HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to
> > bring
> > > >    all configs under one roof
> > > >    - [HUDI-2124] Grafana dashboard for Hudi
> > > >    - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk Insert
> > via
> > > >    row writing
> > > >    - [HUDI-1483] Async clustering for Delta Streamer
> > > >    - [HUDI-2235] Add virtual key support to Hudi
> > > >    - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
> > > >    - In addition, there have been significant improvements and bug
> > fixes
> > > to
> > > >    improve the overall stability of Flink Hudi integration
> > > >
> > > > *Current Blockers*
> > > >
> > > >    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner:
> pengzhiwei)
> > > >    - [HUDI-1256] Follow on improvements to HFile tables for metadata
> > > based
> > > >    listing (Owner: None)
> > > >    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With
> > > Hudi
> > > >    (Owner: pengzhiwei)
> > > >    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table
> (Owner:
> > > >    pengzhiwei)
> > > >    - [HUDI-1138] Re-implement marker files via timeline server
> (Owner:
> > > >    Ethan Guo)
> > > >    - [HUDI-1985] Website redesign implementation (Owner: Vinoth
> > > >    Govindarajan)
> > > >    - [HUDI-2232] MERGE INTO fails with table having nested struct
> > (Owner:
> > > >    pengzhiwei)
> > > >    - [HUDI-1468] incremental read support with clustering (Owner:
> > Liwei)
> > > >    - [HUDI-2250] Bulk insert support for tables w/ primary key
> (Owner:
> > > > None)
> > > >    - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar Sumit)
> > > >    - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner:
> Sagar
> > > >    Sumit)
> > > >    - [HUDI-1887] Setting default value to false for enabling schema
> > post
> > > >    processor (Owner: Sivabalan)
> > > >    - [HUDI-1850] Fixing read of a empty table but with failed write
> > > (Owner:
> > > >    Sivabalan)
> > > >    - [HUDI-2151] Enable defaults for out of box performance (Owner:
> > Udit
> > > >    Mehrotra)
> > > >    - [HUDI-2119] Ensure the rolled-back instance was previously
> synced
> > to
> > > >    the Metadata Table when syncing a Rollback Instant (Owner:
> Prashant
> > > > Wason)
> > > >    - [HUDI-1458] Support custom clustering strategies and preserve
> > commit
> > > >    time to support incremental read (Owner: Satish Kotha)
> > > >    - [HUDI-1763] Fixing honoring of Ordering val in
> > > >    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
> > > >    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
> > > >    Sivabalan)
> > > >    - [HUDI-2120] [DOC] Update docs about schema in flink sql
> > > configuration
> > > >    (Owner: Xianghu Wang)
> > > >    - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
> > > >    pengzhiwei)
> > > >
> > > > Please respond to the thread if you think that I have missed
> capturing
> > > any
> > > > of the highlights or blockers for Hudi 0.9.0 release. For the owners
> of
> > > > these release blockers, can you please provide a specific timeline
> you
> > > are
> > > > willing to commit to for finishing these so we can cut an RC?
> > > >
> > > > Thanks,
> > > > Udit
> > > >
> > >
> >
>


-- 
Regards,
-Sivabalan

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Vinoth Chandar <vi...@apache.org>.

Any other thoughts? I'd love to lock this date down sooner rather than later.

Thanks
Vinoth

On Tue, Aug 3, 2021 at 11:35 PM Udit Mehrotra <ud...@apache.org> wrote:

> Agreed Vinoth. End of next week seems reasonable as a hard deadline for
> cutting the RC.
>
> If anyone thinks otherwise or needs more time, feel free to chime in.
>
> On Tue, Aug 3, 2021 at 8:10 PM Vinoth Chandar <vi...@apache.org> wrote:
>
> > Thanks Udit! I propose we set end of next week as a hard deadline for
> > cutting the RC. Any thoughts?
> >
> > A good amount of progress is being made on these blockers, I think.
> >
> >
> > On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra <ud...@apache.org> wrote:
> >
> > > Hi Community,
> > >
> > > As we draw close to doing Hudi 0.9.0 release, I am happy to share a
> > > summary of the key features/improvements that would be going in the
> > > release and the current blockers for everyone's visibility.
> > >
> > > *Highlights*
> > >
> > >    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink
> > >    writer
> > >    - [HUDI-1738] Detect and emit deleted records for Flink MOR table
> > >    streaming read
> > >    - [HUDI-1867] Support streaming reads for Flink COW table
> > >    - [HUDI-1908] Global index for flink writer
> > >    - [HUDI-1788] Support Insert Overwrite with Flink Writer
> > >    - [HUDI-2209] Bulk insert for flink writer
> > >    - [HUDI-1591] Support querying using non-globbed paths for Hudi
> > >    Spark DataSource queries
> > >    - [HUDI-1591] Partition pruning support for read optimized queries
> > >    via Hudi Spark DataSource
> > >    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table with
> > >    metastore. Queries via Spark SQL will be routed through Hudi
> > >    DataSource (instead of InputFormat), thus making it more performant
> > >    due to Spark's native/optimized readers
> > >    - [HUDI-1591] Partition pruning support for snapshot queries via
> > >    Hudi Spark DataSource
> > >    - [HUDI-1658] DML and DDL support via Spark SQL
> > >    - [HUDI-1790] Add SqlSource for DeltaStreamer to support backfill
> > >    use cases
> > >    - [HUDI-251] Add JDBC Source support for DeltaStreamer
> > >    - [HUDI-1910] Support Kafka based checkpointing for
> > >    HoodieDeltaStreamer
> > >    - [HUDI-1371] Support metadata based listing for Spark DataSource
> > >    and Spark SQL
> > >    - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements to
> > >    Metadata based listing
> > >    - [HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to
> > >    bring all configs under one roof
> > >    - [HUDI-2124] Grafana dashboard for Hudi
> > >    - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk Insert
> > >    via row writing
> > >    - [HUDI-1483] Async clustering for Delta Streamer
> > >    - [HUDI-2235] Add virtual key support to Hudi
> > >    - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
> > >    - In addition, there have been significant improvements and bug
> > >    fixes to improve the overall stability of Flink Hudi integration
> > >
> > > *Current Blockers*
> > >
> > >    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
> > >    - [HUDI-1256] Follow on improvements to HFile tables for metadata
> > >    based listing (Owner: None)
> > >    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With
> > >    Hudi (Owner: pengzhiwei)
> > >    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
> > >    pengzhiwei)
> > >    - [HUDI-1138] Re-implement marker files via timeline server (Owner:
> > >    Ethan Guo)
> > >    - [HUDI-1985] Website redesign implementation (Owner: Vinoth
> > >    Govindarajan)
> > >    - [HUDI-2232] MERGE INTO fails with table having nested struct
> > >    (Owner: pengzhiwei)
> > >    - [HUDI-1468] incremental read support with clustering (Owner:
> > >    Liwei)
> > >    - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
> > >    None)
> > >    - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar Sumit)
> > >    - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner: Sagar
> > >    Sumit)
> > >    - [HUDI-1887] Setting default value to false for enabling schema
> > >    post processor (Owner: Sivabalan)
> > >    - [HUDI-1850] Fixing read of an empty table but with failed write
> > >    (Owner: Sivabalan)
> > >    - [HUDI-2151] Enable defaults for out of box performance (Owner:
> > >    Udit Mehrotra)
> > >    - [HUDI-2119] Ensure the rolled-back instance was previously synced
> > >    to the Metadata Table when syncing a Rollback Instant (Owner:
> > >    Prashant Wason)
> > >    - [HUDI-1458] Support custom clustering strategies and preserve
> > >    commit time to support incremental read (Owner: Satish Kotha)
> > >    - [HUDI-1763] Fixing honoring of Ordering val in
> > >    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
> > >    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
> > >    Sivabalan)
> > >    - [HUDI-2120] [DOC] Update docs about schema in flink sql
> > >    configuration (Owner: Xianghu Wang)
> > >    - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
> > >    pengzhiwei)
> > >
> > > Please respond to the thread if you think that I have missed capturing
> > > any of the highlights or blockers for Hudi 0.9.0 release. For the
> > > owners of these release blockers, can you please provide a specific
> > > timeline you are willing to commit to for finishing these so we can
> > > cut an RC?
> > >
> > > Thanks,
> > > Udit
> > >
> >
>

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Danny Chan <da...@apache.org>.

HUDI-2170 needs to be included; it fixes the problem that the preCombine
field is ignored when merging in the COW write and MOR read code paths.

HUDI-1771: we will try to get a rough version in so that we can gather more
feedback from users; this is also a strong request from Chinese users.

Best,
Danny Chan

On Wed, Aug 4, 2021 at 2:35 PM Udit Mehrotra <ud...@apache.org> wrote:

> Agreed Vinoth. End of next week seems reasonable as a hard deadline for
> cutting the RC.
>
> If anyone thinks otherwise or needs more time, feel free to chime in.
>
> On Tue, Aug 3, 2021 at 8:10 PM Vinoth Chandar <vi...@apache.org> wrote:
>
>> Thanks Udit! I propose we set end of next week as a hard deadline for
>> cutting the RC. Any thoughts?
>>
>> A good amount of progress is being made on these blockers, I think.
>>
>>
>> On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra <ud...@apache.org> wrote:
>>
>> > Hi Community,
>> >
>> > As we draw close to doing Hudi 0.9.0 release, I am happy to share a
>> > summary of the key features/improvements that would be going in the
>> > release and the current blockers for everyone's visibility.
>> >
>> > *Highlights*
>> >
>> >    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink
>> >    writer
>> >    - [HUDI-1738] Detect and emit deleted records for Flink MOR table
>> >    streaming read
>> >    - [HUDI-1867] Support streaming reads for Flink COW table
>> >    - [HUDI-1908] Global index for flink writer
>> >    - [HUDI-1788] Support Insert Overwrite with Flink Writer
>> >    - [HUDI-2209] Bulk insert for flink writer
>> >    - [HUDI-1591] Support querying using non-globbed paths for Hudi
>> >    Spark DataSource queries
>> >    - [HUDI-1591] Partition pruning support for read optimized queries
>> >    via Hudi Spark DataSource
>> >    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table with
>> >    metastore. Queries via Spark SQL will be routed through Hudi
>> >    DataSource (instead of InputFormat), thus making it more performant
>> >    due to Spark's native/optimized readers
>> >    - [HUDI-1591] Partition pruning support for snapshot queries via
>> >    Hudi Spark DataSource
>> >    - [HUDI-1658] DML and DDL support via Spark SQL
>> >    - [HUDI-1790] Add SqlSource for DeltaStreamer to support backfill
>> >    use cases
>> >    - [HUDI-251] Add JDBC Source support for DeltaStreamer
>> >    - [HUDI-1910] Support Kafka based checkpointing for
>> >    HoodieDeltaStreamer
>> >    - [HUDI-1371] Support metadata based listing for Spark DataSource
>> >    and Spark SQL
>> >    - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements to
>> >    Metadata based listing
>> >    - [HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to
>> >    bring all configs under one roof
>> >    - [HUDI-2124] Grafana dashboard for Hudi
>> >    - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk Insert
>> >    via row writing
>> >    - [HUDI-1483] Async clustering for Delta Streamer
>> >    - [HUDI-2235] Add virtual key support to Hudi
>> >    - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
>> >    - In addition, there have been significant improvements and bug
>> >    fixes to improve the overall stability of Flink Hudi integration
>> >
>> > *Current Blockers*
>> >
>> >    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
>> >    - [HUDI-1256] Follow on improvements to HFile tables for metadata
>> >    based listing (Owner: None)
>> >    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With
>> >    Hudi (Owner: pengzhiwei)
>> >    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
>> >    pengzhiwei)
>> >    - [HUDI-1138] Re-implement marker files via timeline server (Owner:
>> >    Ethan Guo)
>> >    - [HUDI-1985] Website redesign implementation (Owner: Vinoth
>> >    Govindarajan)
>> >    - [HUDI-2232] MERGE INTO fails with table having nested struct
>> >    (Owner: pengzhiwei)
>> >    - [HUDI-1468] incremental read support with clustering (Owner:
>> >    Liwei)
>> >    - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
>> >    None)
>> >    - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar Sumit)
>> >    - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner: Sagar
>> >    Sumit)
>> >    - [HUDI-1887] Setting default value to false for enabling schema
>> >    post processor (Owner: Sivabalan)
>> >    - [HUDI-1850] Fixing read of an empty table but with failed write
>> >    (Owner: Sivabalan)
>> >    - [HUDI-2151] Enable defaults for out of box performance (Owner:
>> >    Udit Mehrotra)
>> >    - [HUDI-2119] Ensure the rolled-back instance was previously synced
>> >    to the Metadata Table when syncing a Rollback Instant (Owner:
>> >    Prashant Wason)
>> >    - [HUDI-1458] Support custom clustering strategies and preserve
>> >    commit time to support incremental read (Owner: Satish Kotha)
>> >    - [HUDI-1763] Fixing honoring of Ordering val in
>> >    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
>> >    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
>> >    Sivabalan)
>> >    - [HUDI-2120] [DOC] Update docs about schema in flink sql
>> >    configuration (Owner: Xianghu Wang)
>> >    - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
>> >    pengzhiwei)
>> >
>> > Please respond to the thread if you think that I have missed capturing
>> > any of the highlights or blockers for Hudi 0.9.0 release. For the
>> > owners of these release blockers, can you please provide a specific
>> > timeline you are willing to commit to for finishing these so we can
>> > cut an RC?
>> >
>> > Thanks,
>> > Udit
>> >
>>
>

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Udit Mehrotra <ud...@apache.org>.

Agreed Vinoth. End of next week seems reasonable as a hard deadline for
cutting the RC.

If anyone thinks otherwise or needs more time, feel free to chime in.

On Tue, Aug 3, 2021 at 8:10 PM Vinoth Chandar <vi...@apache.org> wrote:

> Thanks Udit! I propose we set end of next week as a hard deadline for
> cutting the RC. Any thoughts?
>
> A good amount of progress is being made on these blockers, I think.
>
>
> On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra <ud...@apache.org> wrote:
>
> > Hi Community,
> >
> > As we draw close to doing Hudi 0.9.0 release, I am happy to share a
> > summary of the key features/improvements that would be going in the
> > release and the current blockers for everyone's visibility.
> >
> > *Highlights*
> >
> >    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink
> >    writer
> >    - [HUDI-1738] Detect and emit deleted records for Flink MOR table
> >    streaming read
> >    - [HUDI-1867] Support streaming reads for Flink COW table
> >    - [HUDI-1908] Global index for flink writer
> >    - [HUDI-1788] Support Insert Overwrite with Flink Writer
> >    - [HUDI-2209] Bulk insert for flink writer
> >    - [HUDI-1591] Support querying using non-globbed paths for Hudi
> >    Spark DataSource queries
> >    - [HUDI-1591] Partition pruning support for read optimized queries
> >    via Hudi Spark DataSource
> >    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table with
> >    metastore. Queries via Spark SQL will be routed through Hudi
> >    DataSource (instead of InputFormat), thus making it more performant
> >    due to Spark's native/optimized readers
> >    - [HUDI-1591] Partition pruning support for snapshot queries via
> >    Hudi Spark DataSource
> >    - [HUDI-1658] DML and DDL support via Spark SQL
> >    - [HUDI-1790] Add SqlSource for DeltaStreamer to support backfill
> >    use cases
> >    - [HUDI-251] Add JDBC Source support for DeltaStreamer
> >    - [HUDI-1910] Support Kafka based checkpointing for
> >    HoodieDeltaStreamer
> >    - [HUDI-1371] Support metadata based listing for Spark DataSource
> >    and Spark SQL
> >    - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements to
> >    Metadata based listing
> >    - [HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to
> >    bring all configs under one roof
> >    - [HUDI-2124] Grafana dashboard for Hudi
> >    - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk Insert
> >    via row writing
> >    - [HUDI-1483] Async clustering for Delta Streamer
> >    - [HUDI-2235] Add virtual key support to Hudi
> >    - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
> >    - In addition, there have been significant improvements and bug
> >    fixes to improve the overall stability of Flink Hudi integration
> >
> > *Current Blockers*
> >
> >    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
> >    - [HUDI-1256] Follow on improvements to HFile tables for metadata
> >    based listing (Owner: None)
> >    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With
> >    Hudi (Owner: pengzhiwei)
> >    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
> >    pengzhiwei)
> >    - [HUDI-1138] Re-implement marker files via timeline server (Owner:
> >    Ethan Guo)
> >    - [HUDI-1985] Website redesign implementation (Owner: Vinoth
> >    Govindarajan)
> >    - [HUDI-2232] MERGE INTO fails with table having nested struct
> >    (Owner: pengzhiwei)
> >    - [HUDI-1468] incremental read support with clustering (Owner:
> >    Liwei)
> >    - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
> >    None)
> >    - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar Sumit)
> >    - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner: Sagar
> >    Sumit)
> >    - [HUDI-1887] Setting default value to false for enabling schema
> >    post processor (Owner: Sivabalan)
> >    - [HUDI-1850] Fixing read of an empty table but with failed write
> >    (Owner: Sivabalan)
> >    - [HUDI-2151] Enable defaults for out of box performance (Owner:
> >    Udit Mehrotra)
> >    - [HUDI-2119] Ensure the rolled-back instance was previously synced
> >    to the Metadata Table when syncing a Rollback Instant (Owner:
> >    Prashant Wason)
> >    - [HUDI-1458] Support custom clustering strategies and preserve
> >    commit time to support incremental read (Owner: Satish Kotha)
> >    - [HUDI-1763] Fixing honoring of Ordering val in
> >    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
> >    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
> >    Sivabalan)
> >    - [HUDI-2120] [DOC] Update docs about schema in flink sql
> >    configuration (Owner: Xianghu Wang)
> >    - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
> >    pengzhiwei)
> >
> > Please respond to the thread if you think that I have missed capturing
> > any of the highlights or blockers for Hudi 0.9.0 release. For the
> > owners of these release blockers, can you please provide a specific
> > timeline you are willing to commit to for finishing these so we can
> > cut an RC?
> >
> > Thanks,
> > Udit
> >
>

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Vinoth Chandar <vi...@apache.org>.

Thanks Udit! I propose we set end of next week as a hard deadline for
cutting the RC. Any thoughts?

A good amount of progress is being made on these blockers, I think.


On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra <ud...@apache.org> wrote:

> Hi Community,
>
> As we draw close to doing Hudi 0.9.0 release, I am happy to share a summary
> of the key features/improvements that would be going in the release and the
> current blockers for everyone's visibility.
>
> *Highlights*
>
>    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink
>    writer
>    - [HUDI-1738] Detect and emit deleted records for Flink MOR table
>    streaming read
>    - [HUDI-1867] Support streaming reads for Flink COW table
>    - [HUDI-1908] Global index for flink writer
>    - [HUDI-1788] Support Insert Overwrite with Flink Writer
>    - [HUDI-2209] Bulk insert for flink writer
>    - [HUDI-1591] Support querying using non-globbed paths for Hudi Spark
>    DataSource queries
>    - [HUDI-1591] Partition pruning support for read optimized queries via
>    Hudi Spark DataSource
>    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table with
>    metastore. Queries via Spark SQL will be routed through Hudi DataSource
>    (instead of InputFormat), thus making it more performant due to Spark's
>    native/optimized readers
>    - [HUDI-1591] Partition pruning support for snapshot queries via Hudi
>    Spark DataSource
>    - [HUDI-1658] DML and DDL support via Spark SQL
>    - [HUDI-1790] Add SqlSource for DeltaStreamer to support backfill use
>    cases
>    - [HUDI-251] Add JDBC Source support for DeltaStreamer
>    - [HUDI-1910] Support Kafka based checkpointing for HoodieDeltaStreamer
>    - [HUDI-1371] Support metadata based listing for Spark DataSource and
>    Spark SQL
>    - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements to
>    Metadata based listing
>    - [HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to bring
>    all configs under one roof
>    - [HUDI-2124] Grafana dashboard for Hudi
>    - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk Insert via
>    row writing
>    - [HUDI-1483] Async clustering for Delta Streamer
>    - [HUDI-2235] Add virtual key support to Hudi
>    - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
>    - In addition, there have been significant improvements and bug fixes to
>    improve the overall stability of Flink Hudi integration
>
> *Current Blockers*
>
>    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
>    - [HUDI-1256] Follow on improvements to HFile tables for metadata based
>    listing (Owner: None)
>    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With Hudi
>    (Owner: pengzhiwei)
>    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
>    pengzhiwei)
>    - [HUDI-1138] Re-implement marker files via timeline server (Owner:
>    Ethan Guo)
>    - [HUDI-1985] Website redesign implementation (Owner: Vinoth
>    Govindarajan)
>    - [HUDI-2232] MERGE INTO fails with table having nested struct (Owner:
>    pengzhiwei)
>    - [HUDI-1468] incremental read support with clustering (Owner: Liwei)
>    - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
>    None)
>    - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar Sumit)
>    - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner: Sagar
>    Sumit)
>    - [HUDI-1887] Setting default value to false for enabling schema post
>    processor (Owner: Sivabalan)
>    - [HUDI-1850] Fixing read of an empty table but with failed write (Owner:
>    Sivabalan)
>    - [HUDI-2151] Enable defaults for out of box performance (Owner: Udit
>    Mehrotra)
>    - [HUDI-2119] Ensure the rolled-back instance was previously synced to
>    the Metadata Table when syncing a Rollback Instant (Owner: Prashant
>    Wason)
>    - [HUDI-1458] Support custom clustering strategies and preserve commit
>    time to support incremental read (Owner: Satish Kotha)
>    - [HUDI-1763] Fixing honoring of Ordering val in
>    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
>    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
>    Sivabalan)
>    - [HUDI-2120] [DOC] Update docs about schema in flink sql configuration
>    (Owner: Xianghu Wang)
>    - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
>    pengzhiwei)
>
> Please respond to the thread if you think that I have missed capturing any
> of the highlights or blockers for Hudi 0.9.0 release. For the owners of
> these release blockers, can you please provide a specific timeline you are
> willing to commit to for finishing these so we can cut an RC?
>
> Thanks,
> Udit
>

Re: [DISCUSS] Hudi 0.9.0 Release

Posted by Vinoth Chandar <vi...@apache.org>.

Thanks Udit! I propose we set end of next week as a hard deadline for
cutting the RC. Any thoughts?

A good amount of progress is being made on these blockers, I think.


On Tue, Aug 3, 2021 at 5:13 PM Udit Mehrotra <ud...@apache.org> wrote:

> Hi Community,
>
> As we draw close to doing Hudi 0.9.0 release, I am happy to share a summary
> of the key features/improvements that would be going in the release and the
> current blockers for everyone's visibility.
>
> *Highlights*
>
>    - [HUDI-1729] Asynchronous Hive sync and commits cleaning for Flink
>    writer
>    - [HUDI-1738] Detect and emit deleted records for Flink MOR table
>    streaming read
>    - [HUDI-1867] Support streaming reads for Flink COW table
>    - [HUDI-1908] Global index for flink writer
>    - [HUDI-1788] Support Insert Overwrite with Flink Writer
>    - [HUDI-2209] Bulk insert for flink writer
>    - [HUDI-1591] Support querying using non-globbed paths for Hudi Spark
>    DataSource queries
>    - [HUDI-1591] Partition pruning support for read optimized queries via
>    Hudi Spark DataSource
>    - [HUDI-1415] Register Hudi Table as a Spark DataSource Table with
>    metastore. Queries via Spark SQL will be routed through Hudi DataSource
>    (instead of InputFormat), thus making it more performant due to Spark's
>    native/optimized readers
>    - [HUDI-1591] Partition pruning support for snapshot queries via Hudi
>    Spark DataSource
>    - [HUDI-1658] DML and DDL support via Spark SQL
>    - [HUDI-1790] Add SqlSource for DeltaStreamer to support backfill use
>    cases
>    - [HUDI-251] Add JDBC Source support for DeltaStreamer
>    - [HUDI-1910] Support Kafka based checkpointing for HoodieDeltaStreamer
>    - [HUDI-1371] Support metadata based listing for Spark DataSource and
>    Spark SQL
>    - [HUDI-2013] [HUDI-1717] [HUDI-2089] [HUDI-2016] Improvements to
>    Metadata based listing
>    - [HUDI-89] Introduce a HoodieConfig/ConfigProperty framework to bring
>    all configs under one roof
>    - [HUDI-2124] Grafana dashboard for Hudi
>    - [HUDI-1104] [HUDI-1105] [HUDI-2009] Improvements to Bulk Insert via
>    row writing
>    - [HUDI-1483] Async clustering for Delta Streamer
>    - [HUDI-2235] Add virtual key support to Hudi
>    - [HUDI-1848] Add support for Hive Metastore in Hive-sync-tool
>    - In addition, there have been significant improvements and bug fixes to
>    improve the overall stability of Flink Hudi integration
>
> *Current Blockers*
>
>    - [HUDI-2208] Support Bulk Insert For Spark Sql (Owner: pengzhiwei)
>    - [HUDI-1256] Follow on improvements to HFile tables for metadata based
>    listing (Owner: None)
>    - [HUDI-2063] Add Doc For Spark Sql (DML and DDL) integration With Hudi
>    (Owner: pengzhiwei)
>    - [HUDI-1842] Spark Sql Support For The Exists Hoodie Table (Owner:
>    pengzhiwei)
>    - [HUDI-1138] Re-implement marker files via timeline server (Owner:
>    Ethan Guo)
>    - [HUDI-1985] Website redesign implementation (Owner: Vinoth
>    Govindarajan)
>    - [HUDI-2232] MERGE INTO fails with table having nested struct (Owner:
>    pengzhiwei)
>    - [HUDI-1468] incremental read support with clustering (Owner: Liwei)
>    - [HUDI-2250] Bulk insert support for tables w/ primary key (Owner:
>    None)
>    - [HUDI-2222] [SQL] Test catalog integration (Owner: Sagar Sumit)
>    - [HUDI-2221] [SQL] Functionality testing with Spark 2 (Owner: Sagar
>    Sumit)
>    - [HUDI-1887] Setting default value to false for enabling schema post
>    processor (Owner: Sivabalan)
>    - [HUDI-1850] Fixing read of an empty table but with failed write (Owner:
>    Sivabalan)
>    - [HUDI-2151] Enable defaults for out of box performance (Owner: Udit
>    Mehrotra)
>    - [HUDI-2119] Ensure the rolled-back instance was previously synced to
>    the Metadata Table when syncing a Rollback Instant (Owner: Prashant
>    Wason)
>    - [HUDI-1458] Support custom clustering strategies and preserve commit
>    time to support incremental read (Owner: Satish Kotha)
>    - [HUDI-1763] Fixing honoring of Ordering val in
>    DefaultHoodieRecordPayload.preCombine (Owner: Sivabalan)
>    - [HUDI-1129] Improving schema evolution support in hudi (Owner:
>    Sivabalan)
>    - [HUDI-2120] [DOC] Update docs about schema in flink sql configuration
>    (Owner: Xianghu Wang)
>    - [HUDI-2182] Support Compaction Command For Spark Sql (Owner:
>    pengzhiwei)
>
> Please respond to the thread if you think that I have missed capturing any
> of the highlights or blockers for Hudi 0.9.0 release. For the owners of
> these release blockers, can you please provide a specific timeline you are
> willing to commit to for finishing these so we can cut an RC?
>
> Thanks,
> Udit
>