Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/26 10:40:09 UTC

[GitHub] [hudi] umehrot2 opened a new pull request #3547: Add Hudi 0.9.0 release page with highlights

umehrot2 opened a new pull request #3547:
URL: https://github.com/apache/hudi/pull/3547


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   








[GitHub] [hudi] umehrot2 commented on pull request #3547: Add Hudi 0.9.0 release page with highlights

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on pull request #3547:
URL: https://github.com/apache/hudi/pull/3547#issuecomment-907504212


   @nsivabalan updated the download page and the redirect links to the release 0.9.0 page, which were reverted from the other PR.





[GitHub] [hudi] danny0405 commented on a change in pull request #3547: Add Hudi 0.9.0 release page with highlights

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #3547:
URL: https://github.com/apache/hudi/pull/3547#discussion_r697316833



##########
File path: website/releases/release-0.9.0.md
##########
@@ -0,0 +1,118 @@
+---
+title: "Release 0.9.0"
+sidebar_position: 2
+layout: releases
+toc: true
+last_modified_at: 2021-08-26T08:40:00-07:00
+---
+# [Release 0.9.0](https://github.com/apache/hudi/releases/tag/release-0.9.0) ([docs](/docs/quick-start-guide))
+
+## Download Information
+* Source Release : [Apache Hudi 0.9.0 Source Release](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz.sha512))
+* Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+## Migration Guide for this release
+- If migrating from a release older than 0.5.3, please also check the upgrade instructions for each subsequent release below.
+- Specifically check the upgrade instructions for 0.6.0.
+- With 0.9.0, Hudi adds more table properties to aid in using an existing Hudi table with spark-sql.
+  To smooth this transition, these properties are added to the `hoodie.properties` file. Whenever Hudi is launched with
+  the newer table version, i.e. 2 (or moving from pre-0.9.0 to 0.9.0), an upgrade step is executed automatically.
+  This automatic upgrade step happens just once per Hudi table, as `hoodie.table.version` is updated in the
+  property file after the upgrade is completed.
+- Similarly, a command line tool for downgrading (command: `downgrade`) has been added in case some users want to
+  downgrade Hudi from table version 2 to 1, or move from Hudi 0.9.0 to a pre-0.9.0 release.
+- This release also replaces the string representation of all configs with `ConfigProperty`. The older configs (strings) are deprecated,
+and users are encouraged to use the new `ConfigProperty` equivalents.
+
+## Release Highlights
+
+### Spark SQL DML and DDL Support
+
+0.9.0 brings the ability to perform data definition (create, modify and truncate) and data manipulation (inserts,
+upserts and deletes) on Hudi tables using the familiar Structured Query Language (SQL) via Spark SQL.
+This is a huge step towards making Hudi more easily accessible and operable by all personas (non-engineers, analysts, etc.).
+Users can now use `INSERT`, `UPDATE`, `MERGE INTO` and `DELETE`
+SQL statements to manipulate data. In addition, the `INSERT OVERWRITE` statement can be used to overwrite existing data in the table or partition
+with new values. 
+
+On the data definition side, customers can now use the `CREATE TABLE....USING HUDI` statement, which creates and registers the
+table as a Hudi Spark DataSource table, instead of the earlier approach of having to define the Hudi input format and Parquet
+SerDe. They can also use `CREATE TABLE....AS`, `ALTER TABLE` and `TRUNCATE TABLE` statements to create, modify or truncate table
+definitions.
+
+Please see [RFC-25](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+25%3A+Spark+SQL+Extension+For+Hudi)
+for more implementation details and follow this [page](/docs/quick-start-guide) to get started with DML and DDL!
+
+### Query side improvements
+- Hudi tables can now be registered and read as a DataSource table in Spark. 
+- **Metadata based File Listing Improvements for Spark**: ?? 
+- Added support for time travel queries. Please check the [quick start page](/docs/next/quick-start-guide)
+  for how to invoke them. 
+
+### Writer side improvements 
+- Virtual keys support has been added, where users can avoid adding meta fields to a Hudi table and leverage existing 
+  fields to populate record keys and partition paths. One needs to disable [this](/docs/next/configurations#hoodiepopulatemetafields)
+  config to enable virtual keys. 
+- Clustering improvements:  
+    - Async clustering support has been added to both DeltaStreamer and Spark Streaming. More on this, and how to use it,
+      can be found in this blog post.
+    - Incremental read works for clustered data as well. ? This is more of a fix right? 
+    - HoodieClusteringJob has been added to assist in building and executing a clustering plan as a standalone job. 
+    - Added a config (`hoodie.clustering.plan.strategy.daybased.skipfromlatest.partitions`) to skip the latest N partitions 
+      while creating the clustering plan.  
+- Bulk insert using the row writer path has been enhanced and brought to feature parity with the WriteClient's version of bulk insert. 
+Users can start to use the row writer path for better performance. 
+- Added support for HMS in HiveSyncTool. HMSDDLExecutor is a DDLExecutor implementation based on HMS, which uses HMS 
+  APIs directly for all DDL tasks.
+- A pre-commit validator framework has been added to the Spark engine. Users can leverage it to add validations, such as verifying that all valid
+files are present for a given instant, or that all invalid files are removed, as part of this framework.
+- Users can choose to drop the fields used to generate partition paths if need be (`hoodie.datasource.write.drop.partition.columns`).
+- Added support for the "delete_partition" operation in Spark. Users can leverage it to delete older partitions as and when required.    
+- Disk based map improvements? 
+- Concurrency control: https://issues.apache.org/jira/browse/HUDI-944
+- ORC format support: ?
+- Support for Huawei Cloud Object Storage, BAIDU AFS storage format, Baidu BOS storage in Hudi: Should we expand more on this? 
+- Marker file management has been enhanced to use batch mode and is now centrally coordinated by the timeline server. This benefits 
+cloud stores like S3 when a large number of marker files are created or deleted during commits. You can read more in this blog.
+- The Java engine now supports HoodieBloomIndex.
+
+### Flink Integration Improvements
+- **Insert Overwrite Support**: HUDI-1788
+- **Bulk Insert Support**: HUDI-2209
+- **Non Partitioned Table Support**: HUDI-1814

Review comment:
   - The Flink writer supports the `CDC format` for `MOR` tables: turn on the option `changelog.enabled`, and Hoodie will then persist all the change flags of each record; using Flink's streaming reader, users can do stateful computation based on these change logs. Note that when a commit is compacted by the async compaction service, all the intermediate changes are merged into one (the last record), keeping only `UPSERT` semantics.
   - Bulk insert is supported for efficient loading of an existing table; set `write.operation` to `bulk_insert` to use it.
   - You can now streaming-read the COW table.
   - Delete messages are emitted by default in streaming read mode; when `changelog.enabled` is `false`, the downstream receives the `DELETE` message as a Hoodie record with an empty payload.
   - The Flink writer can now update a historical partition, i.e. delete the old record in the historical partition and then insert the new record in the current partition; turn on `index.global.enabled` to use this.
   - Hive sync has been greatly improved by supporting different Hive versions (1.x, 2.x, 3.x).
   - Flink supports a pure log append mode, in which no records are deduplicated, for both `COW` and `MOR` tables; parquet files are written directly on each flush. Turn off `write.insert.deduplicate` to use this mode.
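   To make these settings concrete, here is a minimal Flink SQL sketch wiring up the options named above. This is an illustration only: the table name, schema and path are hypothetical, and option coverage should be double-checked against the 0.9.0 Flink docs.

   ```sql
   -- Hedged sketch: a hypothetical MOR table using the options discussed above.
   CREATE TABLE hudi_mor_demo (
     uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
     name VARCHAR(10),
     ts TIMESTAMP(3),
     `partition` VARCHAR(20)
   ) PARTITIONED BY (`partition`) WITH (
     'connector' = 'hudi',
     'path' = 'file:///tmp/hudi_mor_demo',   -- hypothetical base path
     'table.type' = 'MERGE_ON_READ',
     'changelog.enabled' = 'true',           -- persist per-record change flags for streaming reads
     'read.streaming.enabled' = 'true',      -- streaming read; 0.9.0 also supports this for COW tables
     'index.global.enabled' = 'true'         -- lets an update move a record across partitions
   );

   -- Pure log append mode (no deduplication) would instead set:
   --   'write.operation' = 'insert', 'write.insert.deduplicate' = 'false'
   -- Bulk loading an existing table would set:
   --   'write.operation' = 'bulk_insert'
   ```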







[GitHub] [hudi] nsivabalan commented on a change in pull request #3547: Add Hudi 0.9.0 release page with highlights

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #3547:
URL: https://github.com/apache/hudi/pull/3547#discussion_r697130444



##########
File path: website/releases/release-0.9.0.md
##########
@@ -0,0 +1,118 @@
+### Flink Integration Improvements
+- **Insert Overwrite Support**: HUDI-1788
+- **Bulk Insert Support**: HUDI-2209
+- **Non Partitioned Table Support**: HUDI-1814
+- **Streaming Read for COW Tables**: HUDI-1867
+- **Emit Deletes for MOR Streaming Read**: HUDI-1738
+- **Global Index Support**: HUDI-1908
+- Propagate the CDC format for Hudi
+- Support Hive-style partitioning for the Flink writer
+- Support Transformer for HoodieFlinkStreamer
+- Metadata table for Flink
+- Support Hive 3 meta sync for the Flink writer
+- Support Hive 1 meta sync for the Flink writer
+- Asynchronous Hive sync and commits cleaning for the Flink writer
+- Support Flink Hive sync in batch mode
+- Support append-only mode in Flink streaming
+- Add marker files for the Flink writer
+
+## Utilities
+### DeltaStreamer
+- A couple of sources have been added to DeltaStreamer. 
+  1. JDBC source can assist in reading data from RDBMS sources.
+  2. SQLSource can assist in backfill use-cases, like adding a new column and backfilling just that one column for the past 6 months.
+  3. S3EventsHoodieIncrSource and S3EventsSource assist in reading data from S3 reliably and efficiently ingesting it into Hudi. 
+     The existing approach of using DFSSource relies on the last modification time of files as the checkpoint to pull in new files; if a large 
+     number of files have the same modification time, it might miss some files to be read from the source. 
+     These two sources (S3EventsHoodieIncrSource and S3EventsSource) together ensure data is reliably ingested from S3 into 
+     Hudi by leveraging AWS SNS and SQS services that subscribe to file events from the source bucket.
+- In addition to pulling data from Kafka sources with DeltaStreamer using the regular offset format 
+  (topic_name,partition_num:offset,partition_num:offset,....), two new formats have been added: timestamp-based and 
+  consumer-group offsets. 
+- Added support to pass basic auth credentials in the schema registry provider URL with the schema provider in DeltaStreamer.   
+- Some improvements have been made to hudi-cli like `SCHEDULE COMPACTION` and `RUN COMPACTION` statements to easily schedule and run

Review comment:
    @umehrot2 : I saw this from your first version of this patch. I guess schedule compaction and run compaction have been there in previous releases of hudi-cli. Did you mean something else here? 










[GitHub] [hudi] umehrot2 commented on a change in pull request #3547: Add Hudi 0.9.0 release page with highlights

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #3547:
URL: https://github.com/apache/hudi/pull/3547#discussion_r697745362



##########
File path: website/releases/release-0.9.0.md
##########
@@ -0,0 +1,118 @@
+- Some improvements have been made to hudi-cli like `SCHEDULE COMPACTION` and `RUN COMPACTION` statements to easily schedule and run

Review comment:
       @nsivabalan I had added this in the SQL DML section, not for the CLI. I think you might have added this by mistake.
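    For thread readers, here is a short hedged sketch of the Spark SQL DDL and DML support that section describes. The table and values are made up for illustration, and the exact `CREATE TABLE` property names should be checked against the 0.9.0 quick start.

    ```sql
    -- Hedged sketch of the 0.9.0 Spark SQL support; hypothetical table and data.
    CREATE TABLE hudi_demo (
      id INT,
      name STRING,
      price DOUBLE,
      ts BIGINT
    ) USING hudi
    OPTIONS (
      primaryKey = 'id',      -- record key field (assumed property name)
      preCombineField = 'ts'  -- precombine field (assumed property name)
    );

    INSERT INTO hudi_demo VALUES (1, 'a1', 10.0, 1000);

    UPDATE hudi_demo SET price = 20.0 WHERE id = 1;

    MERGE INTO hudi_demo AS t
    USING (SELECT 1 AS id, 'a1' AS name, 25.0 AS price, 1001 AS ts) AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;

    DELETE FROM hudi_demo WHERE id = 1;
    ```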







[GitHub] [hudi] nsivabalan commented on pull request #3547: Add Hudi 0.9.0 release page with highlights

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #3547:
URL: https://github.com/apache/hudi/pull/3547#issuecomment-906738392


   Looks like with the new website structure, I could not push changes to others' branches :( 
   So, here are my local updates for the release highlights. 
   https://gist.github.com/nsivabalan/8c82f565c1834768e900bc75fa1ff795
   I don't have good visibility into Flink and hence have added just the titles. We need to fix those. And for some, I have added a line item, but am not sure if we need to highlight them. 
   
   <img width="1049" alt="Screen Shot 2021-08-26 at 4 54 31 PM" src="https://user-images.githubusercontent.com/513218/131035098-c74d156c-9979-47f7-ad55-95fa63b562c6.png">
   <img width="1017" alt="Screen Shot 2021-08-26 at 4 54 39 PM" src="https://user-images.githubusercontent.com/513218/131035100-a0e74c60-d468-456c-ba2b-96ace4b2df3b.png">
   <img width="997" alt="Screen Shot 2021-08-26 at 4 54 48 PM" src="https://user-images.githubusercontent.com/513218/131035103-15c222dd-2520-435c-8c4c-8e3a26422f27.png">
   <img width="1017" alt="Screen Shot 2021-08-26 at 4 54 57 PM" src="https://user-images.githubusercontent.com/513218/131035104-317b7a59-4912-4df5-b278-72309c9b1f30.png">
   <img width="1013" alt="Screen Shot 2021-08-26 at 4 55 06 PM" src="https://user-images.githubusercontent.com/513218/131035105-199d32ea-01b7-4dba-a8a7-5beef3340681.png">
   
   
   





[GitHub] [hudi] vinothchandar commented on a change in pull request #3547: Add Hudi 0.9.0 release page with highlights

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #3547:
URL: https://github.com/apache/hudi/pull/3547#discussion_r697710966



##########
File path: website/releases/release-0.9.0.md
##########
@@ -0,0 +1,118 @@
+---
+title: "Release 0.9.0"
+sidebar_position: 2
+layout: releases
+toc: true
+last_modified_at: 2021-08-26T08:40:00-07:00
+---
+# [Release 0.9.0](https://github.com/apache/hudi/releases/tag/release-0.9.0) ([docs](/docs/quick-start-guide))
+
+## Download Information
+* Source Release : [Apache Hudi 0.9.0 Source Release](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz.sha512))
+* Apache Hudi jars corresponding to this release is available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+## Migration Guide for this release
+- If migrating from release older than 0.5.3, please also check the upgrade instructions for each subsequent release below.
+- Specifically check upgrade instructions for 0.6.0.
+- With 0.9.0 Hudi is adding more table properties to aid in using an existing hudi table with spark-sql. 
+  To smoothly aid this transition these properties added to `hoodie.properties` file. Whenever Hudi is launched with
+  newer table version i.e 2 (or moving from pre 0.9.0 to 0.9.0), an upgrade step will be executed automatically.
+  This automatic upgrade step will happen just once per Hudi table as the `hoodie.table.version` will be updated in 
+  property file after upgrade is completed.
+- Similarly, a command line tool for Downgrading (command - `downgrade`) is added if in case some users want to 
+  downgrade Hudi from table version 2 to 1 or move from Hudi 0.9.0 to pre 0.9.0
+- This release also replaces string representation of all configs to ConfigProperty. Older configs (strings) are deprecated
+and users are encouraged to use the new ConfigProperties. 
+
+## Release Highlights

Review comment:
    The highlights are pretty long IMO. Can we condense this?







[GitHub] [hudi] nsivabalan merged pull request #3547: [HUDI-2382] Add Hudi 0.9.0 release page with highlights

Posted by GitBox <gi...@apache.org>.
nsivabalan merged pull request #3547:
URL: https://github.com/apache/hudi/pull/3547


   

