Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/24 05:35:06 UTC

[GitHub] [hudi] bhasudha opened a new pull request #2016: [WIP] Add release page doc for 0.6.0

bhasudha opened a new pull request #2016:
URL: https://github.com/apache/hudi/pull/2016


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bhasudha commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
bhasudha commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475638113



##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the
+ `hoodie.table.version` is updated in the properties file after the upgrade completes.
+ - Similarly, a command line tool for downgrading is added in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 version.
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.

Review comment:
   Aliyun was already added as part of 0.5.3.







[GitHub] [hudi] nsivabalan commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475579182



##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the

Review comment:
   minor: not sure what's the usual terminology in general, but I have used just "dataset" here. Ensure we use a consistent convention everywhere (or maybe "hudi dataset").
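As an aside for readers, the one-time upgrade gate described in the quoted migration guide can be sketched as follows. This is an illustrative model only, not Hudi's actual implementation; real upgrades rewrite rollback metadata to the marker-based layout, simplified here to a comment.

```python
# Illustrative sketch of the one-time upgrade gate keyed on `hoodie.table.version`.
def maybe_upgrade(props, target_version=1):
    current = int(props.get("hoodie.table.version", "0"))
    if current < target_version:
        # ... perform upgrade steps here (e.g., convert to marker-based rollback) ...
        props["hoodie.table.version"] = str(target_version)
        return True   # upgrade ran
    return False      # already upgraded; no-op on subsequent launches

props = {}                    # a pre-0.6.0 table: no version property yet
first = maybe_upgrade(props)  # runs the upgrade and stamps version 1
second = maybe_upgrade(props) # no-op
```

Because the version is stamped back into hoodie.properties, the upgrade runs exactly once per dataset.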

##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the
+ `hoodie.table.version` is updated in the properties file after the upgrade completes.
+ - Similarly, a command line tool for downgrading is added in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 version.
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.

Review comment:
   do we need to add Aliyun here?

##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release

Review comment:
       don't we need to call out the change in interface for BulkInsertPartitioner? 

##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the
+ `hoodie.table.version` is updated in the properties file after the upgrade completes.
+ - Similarly, a command line tool for downgrading is added in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 version.
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.
+  - Added support for "bulk_insert" without converting to RDD, which performs better than the existing "bulk_insert".
+    This implementation writes to storage via the Datasource and supports key generators that operate on Row
+    (rather than on HoodieRecords, as in the previous "bulk_insert").
+  - # TODO Add more about bulk insert modes. 
+  - # TODO Add more on bootstrap.             
+  - In previous versions, auto clean runs synchronously after ingestion. Starting with 0.6.0, Hudi does cleaning and ingestion in parallel.
+  - Support async compaction for Spark streaming writes to Hudi tables. Previous versions supported only inline compaction.
+  - Implemented rollbacks using marker files instead of relying on commit metadata. Please check the migration guide for more details on this.
+  - A new InlineFileSystem has been added to support embedding any file format as an inline format within a regular file.
+
+#### Query side improvements:
+  - Starting with 0.6.0, snapshot queries are possible via the Spark datasource.
+  - In prior versions we only supported HoodieCombineHiveInputFormat for CopyOnWrite tables, to ensure that there is a limit on the number of mappers spawned for
+    any query. Hudi now supports Merge on Read tables as well, using HoodieCombineHiveInputFormat.
+  - Speed up Spark read queries by caching the metaclient in HoodieROPathFilter. This helps reduce listing-related overheads in S3 when filtering files for read-optimized queries.
+
+#### DeltaStreamer improvements:
+  - HoodieMultiDeltaStreamer adds support for ingesting multiple Kafka streams in a single DeltaStreamer deployment.
+  - Added a new tool, InitialCheckPointProvider, to set checkpoints when migrating to DeltaStreamer after an initial load of the table is complete.
+  - Added CSV source support.
+  - Added a chained transformer that can chain multiple transformers.
+
+#### Indexing improvements:
+  - Added a new index `HoodieSimpleIndex` which joins incoming records with base files to index records.
+  - Added ability to configure user defined indexes.
+
+#### Key generation improvements:
+  - Introduced `CustomTimestampBasedKeyGenerator` to support complex keys as record key and custom partition paths.
+  - Support more time units and date/time formats in `TimestampBasedKeyGenerator`

Review comment:
   I see there are a few more improvements to TimestampBasedKeyGen; probably we can list everything in this line. Feel free to take a call though.
    Add support for multiple date/time formats in TimestampBasedKeyGenerator
    Support for complex record keys with TimestampBasedKeyGenerator
    Support different time units in TimestampBasedKeyGenerator
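The timestamp-based partition-path idea behind the improvements listed above can be illustrated with a language-neutral sketch. This is not Hudi's KeyGenerator API; the format string simply stands in for the configurable output date format, and epoch-millis stands in for the configurable input time unit.

```python
from datetime import datetime, timezone

# Illustrative: derive a partition path from an epoch-millis timestamp field,
# the core idea behind TimestampBasedKeyGenerator.
def partition_path(epoch_millis, fmt="%Y/%m/%d"):
    dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return dt.strftime(fmt)

path = partition_path(0)  # the epoch start lands in 1970/01/01
```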

##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release

Review comment:
   also, @vinothchandar: we have introduced a new interface for KeyGenerator, right (KeyGeneratorInterface)? I understand no changes are required ATM from the users' standpoint for this release. But is there any comms we need to do here wrt that? Something like "in future, users might have to migrate to the new interface rather than the existing one".

##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the
+ `hoodie.table.version` is updated in the properties file after the upgrade completes.
+ - Similarly, a command line tool for downgrading is added in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 version.
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.
+  - Added support for "bulk_insert" without converting to RDD, which performs better than the existing "bulk_insert".
+    This implementation writes to storage via the Datasource and supports key generators that operate on Row
+    (rather than on HoodieRecords, as in the previous "bulk_insert").
+  - # TODO Add more about bulk insert modes. 
+  - # TODO Add more on bootstrap.             

Review comment:
   In the release notes [link](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663), I'm wondering why bootstrap is not listed under "New Features".

##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the
+ `hoodie.table.version` is updated in the properties file after the upgrade completes.
+ - Similarly, a command line tool for downgrading is added in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 version.
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.
+  - Added support for "bulk_insert" without converting to RDD, which performs better than the existing "bulk_insert".
+    This implementation writes to storage via the Datasource and supports key generators that operate on Row
+    (rather than on HoodieRecords, as in the previous "bulk_insert").
+  - # TODO Add more about bulk insert modes. 
+  - # TODO Add more on bootstrap.             
+  - In previous versions, auto clean runs synchronously after ingestion. Starting with 0.6.0, Hudi does cleaning and ingestion in parallel.
+  - Support async compaction for Spark streaming writes to Hudi tables. Previous versions supported only inline compaction.
+  - Implemented rollbacks using marker files instead of relying on commit metadata. Please check the migration guide for more details on this.
+  - A new InlineFileSystem has been added to support embedding any file format as an inline format within a regular file.
+
+#### Query side improvements:
+  - Starting with 0.6.0, snapshot queries are possible via the Spark datasource.
+  - In prior versions we only supported HoodieCombineHiveInputFormat for CopyOnWrite tables, to ensure that there is a limit on the number of mappers spawned for
+    any query. Hudi now supports Merge on Read tables as well, using HoodieCombineHiveInputFormat.
+  - Speed up Spark read queries by caching the metaclient in HoodieROPathFilter. This helps reduce listing-related overheads in S3 when filtering files for read-optimized queries.
+
+#### DeltaStreamer improvements:
+  - HoodieMultiDeltaStreamer adds support for ingesting multiple Kafka streams in a single DeltaStreamer deployment.
+  - Added a new tool, InitialCheckPointProvider, to set checkpoints when migrating to DeltaStreamer after an initial load of the table is complete.
+  - Added CSV source support.
+  - Added a chained transformer that can chain multiple transformers.
+
+#### Indexing improvements:
+  - Added a new index `HoodieSimpleIndex` which joins incoming records with base files to index records.
+  - Added ability to configure user defined indexes.
+
+#### Key generation improvements:
+  - Introduced `CustomTimestampBasedKeyGenerator` to support complex keys as record key and custom partition paths.

Review comment:
   Guess we missed fixing the commit msg or ticket title appropriately. This is actually called "ComplexKeyGenerator" in the code now.
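For context, a "complex" record key simply combines multiple fields into one key. A minimal sketch of the idea follows; it is not the ComplexKeyGenerator API, and the "field:value" comma-joined layout is an assumption for illustration, not Hudi's exact encoding.

```python
# Illustrative: build a composite record key from several fields, the idea
# behind ComplexKeyGenerator. The "field:value" layout is a stand-in only.
def complex_key(record, key_fields):
    return ",".join(f"{f}:{record[f]}" for f in key_fields)

key = complex_key({"id": 7, "region": "us-west"}, ["id", "region"])
```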

##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the
+ `hoodie.table.version` is updated in the properties file after the upgrade completes.
+ - Similarly, a command line tool for downgrading is added in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 version.
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.
+  - Added support for "bulk_insert" without converting to RDD, which performs better than the existing "bulk_insert".
+    This implementation writes to storage via the Datasource and supports key generators that operate on Row
+    (rather than on HoodieRecords, as in the previous "bulk_insert").
+  - # TODO Add more about bulk insert modes. 
+  - # TODO Add more on bootstrap.             
+  - In previous versions, auto clean runs synchronously after ingestion. Starting with 0.6.0, Hudi does cleaning and ingestion in parallel.
+  - Support async compaction for Spark streaming writes to Hudi tables. Previous versions supported only inline compaction.
+  - Implemented rollbacks using marker files instead of relying on commit metadata. Please check the migration guide for more details on this.
+  - A new InlineFileSystem has been added to support embedding any file format as an inline format within a regular file.
+
+#### Query side improvements:
+  - Starting with 0.6.0, snapshot queries are possible via the Spark datasource.
+  - In prior versions we only supported HoodieCombineHiveInputFormat for CopyOnWrite tables, to ensure that there is a limit on the number of mappers spawned for
+    any query. Hudi now supports Merge on Read tables as well, using HoodieCombineHiveInputFormat.
+  - Speed up Spark read queries by caching the metaclient in HoodieROPathFilter. This helps reduce listing-related overheads in S3 when filtering files for read-optimized queries.
+
+#### DeltaStreamer improvements:
+  - HoodieMultiDeltaStreamer adds support for ingesting multiple Kafka streams in a single DeltaStreamer deployment.
+  - Added a new tool, InitialCheckPointProvider, to set checkpoints when migrating to DeltaStreamer after an initial load of the table is complete.
+  - Added CSV source support.
+  - Added a chained transformer that can chain multiple transformers.
+
+#### Indexing improvements:
+  - Added a new index `HoodieSimpleIndex` which joins incoming records with base files to index records.
+  - Added ability to configure user defined indexes.
+
+#### Key generation improvements:
+  - Introduced `CustomTimestampBasedKeyGenerator` to support complex keys as record key and custom partition paths.
+  - Support more time units and date/time formats in `TimestampBasedKeyGenerator`
+
+#### Developer productivity and monitoring improvements:
+  - Spark DAGs are named to aid debuggability.
+  - Console, JMX, Prometheus and DataDog metric reporters have been added.
+  - Support pluggable metrics reporting by introducing proper abstraction for user-defined metrics.
+
+#### CLI related features:
+  - Added support for deleting savepoints via CLI.
+  - Added a new command, `export instants`, to export metadata of instants.

Review comment:
   I see we have called this out in the migration section, so I assume the upgrade/downgrade command is intentionally left out here.
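For readers, the marker-based rollback mentioned in the quoted highlights avoids listing the whole table: each write leaves a marker per data file, and rollback only has to act on the marked files. A simplified sketch follows; the `<data-file>.marker.<IO_TYPE>` naming is an assumption for illustration, not Hudi's exact marker layout.

```python
# Illustrative: derive the data files to delete during rollback from marker
# files alone, instead of listing every partition of the table.
def files_to_rollback(marker_names):
    # Strip the ".marker.<IO_TYPE>" suffix to recover the data file path.
    return sorted(name.rsplit(".marker.", 1)[0] for name in marker_names)

targets = files_to_rollback([
    "2020/08/24/a.parquet.marker.CREATE",
    "2020/08/24/b.parquet.marker.MERGE",
])
```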







[GitHub] [hudi] bhasudha closed pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
bhasudha closed pull request #2016:
URL: https://github.com/apache/hudi/pull/2016


   





[GitHub] [hudi] bhasudha commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
bhasudha commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475646635



##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the
+ `hoodie.table.version` is updated in the properties file after the upgrade completes.
+ - Similarly, a command line tool for downgrading is added in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 version.
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.
+  - Added support for "bulk_insert" without converting to RDD, which performs better than the existing "bulk_insert".
+    This implementation writes to storage via the Datasource and supports key generators that operate on Row
+    (rather than on HoodieRecords, as in the previous "bulk_insert").
+  - # TODO Add more about bulk insert modes. 
+  - # TODO Add more on bootstrap.             
+  - In previous versions, auto clean runs synchronously after ingestion. Starting with 0.6.0, Hudi does cleaning and ingestion in parallel.
+  - Support async compaction for Spark streaming writes to Hudi tables. Previous versions supported only inline compaction.
+  - Implemented rollbacks using marker files instead of relying on commit metadata. Please check the migration guide for more details on this.
+  - A new InlineFileSystem has been added to support embedding any file format as an inline format within a regular file.
+
+#### Query side improvements:
+  - Starting with 0.6.0, snapshot queries are possible via the Spark datasource.
+  - In prior versions we only supported HoodieCombineHiveInputFormat for CopyOnWrite tables, to ensure that there is a limit on the number of mappers spawned for
+    any query. Hudi now supports Merge on Read tables as well, using HoodieCombineHiveInputFormat.
+  - Speed up Spark read queries by caching the metaclient in HoodieROPathFilter. This helps reduce listing-related overheads in S3 when filtering files for read-optimized queries.
+
+#### DeltaStreamer improvements:
+  - HoodieMultiDeltaStreamer adds support for ingesting multiple Kafka streams in a single DeltaStreamer deployment.
+  - Added a new tool, InitialCheckPointProvider, to set checkpoints when migrating to DeltaStreamer after an initial load of the table is complete.
+  - Added CSV source support.
+  - Added a chained transformer that can chain multiple transformers.
+
+#### Indexing improvements:
+  - Added a new index `HoodieSimpleIndex` which joins incoming records with base files to index records.
+  - Added ability to configure user defined indexes.
+
+#### Key generation improvements:
+  - Introduced `CustomTimestampBasedKeyGenerator` to support complex keys as record key and custom partition paths.
+  - Support more time units and date/time formats in `TimestampBasedKeyGenerator`
+
+#### Developer productivity and monitoring improvements:
+  - Spark DAGs are named to aid debuggability.
+  - Console, JMX, Prometheus and DataDog metric reporters have been added.
+  - Support pluggable metrics reporting by introducing proper abstraction for user-defined metrics.
+
+#### CLI related features:
+  - Added support for deleting savepoints via CLI.
+  - Added a new command, `export instants`, to export metadata of instants.

Review comment:
   Let me add one line here and point to the migration section for more details, like I did in [line 34](https://github.com/apache/hudi/pull/2016/files#diff-21c3ed259536d942a5f57ecff7d2a17aR34).
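As a side note for readers, the HoodieSimpleIndex mentioned in the quoted indexing section joins incoming record keys against keys found in existing base files, tagging each record as an update or an insert. The sketch below is a language-neutral illustration of that idea; real Hudi performs this as a join over Spark datasets, and the plain sets here are a stand-in.

```python
# Illustrative: the "simple index" idea behind HoodieSimpleIndex — tag each
# incoming record as an update (key exists in base files) or an insert.
def tag_records(incoming_keys, base_file_keys):
    existing = set(base_file_keys)
    return {k: ("update" if k in existing else "insert") for k in incoming_keys}

tags = tag_records(["k1", "k2", "k3"], ["k2"])
```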







[GitHub] [hudi] bhasudha commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
bhasudha commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475645263



##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi moves from list-based to marker-based rollbacks. To smooth this transition, a
+ new property called `hoodie.table.version` is added to the hoodie.properties file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adopt marker-based rollback. This automatic upgrade step happens just once per dataset, as the
+ `hoodie.table.version` is updated in the properties file after the upgrade completes.
+ - Similarly, a command line tool for downgrading is added in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 version.
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.
+  - Added support for "bulk_insert" without converting to RDD, which performs better than the existing "bulk_insert".
+    This implementation uses the datasource for writing to storage, with support for key generators that operate on Rows 
+    (rather than on HoodieRecords, as in the previous "bulk_insert").
+  - # TODO Add more about bulk insert modes. 
+  - # TODO Add more on bootstrap.             
+  - In previous versions, auto clean runs synchronously after ingestion. Starting with 0.6.0, Hudi performs cleaning and ingestion in parallel.
+  - Added support for async compaction for Spark streaming writes to Hudi tables. Previous versions supported only inline compaction.
+  - Implemented rollbacks using marker files instead of relying on commit metadata. Please check the migration guide for more details on this.
+  - A new InlineFileSystem has been added to support embedding any file format as an inline format within a regular file.
+
+#### Query side improvements:
+  - Starting with 0.6.0, snapshot queries are supported via the Spark datasource. 
+  - Prior versions supported HoodieCombineHiveInputFormat only for CopyOnWrite tables, to ensure that there is a limit on the number of mappers spawned for
+    any query. Hudi now supports Merge on Read tables as well, using HoodieCombineInputFormat.
+  - Sped up Spark read queries by caching the metaclient in HoodieROPathFilter. This helps reduce listing related overheads in S3 when filtering files for read-optimized queries. 
+
+#### DeltaStreamer improvements:
+  - HoodieMultiDeltaStreamer: adds support for ingesting multiple Kafka streams in a single DeltaStreamer deployment.
+  - Added a new tool, InitialCheckPointProvider, to set checkpoints when migrating to DeltaStreamer after an initial load of the table is complete.
+  - Added CSV source support.
+  - Added a chained transformer that can chain multiple transformers.
+
+#### Indexing improvements:
+  - Added a new index `HoodieSimpleIndex` which joins incoming records with base files to index records.
+  - Added ability to configure user defined indexes.
+
+#### Key generation improvements:
+  - Introduced `CustomTimestampBasedKeyGenerator` to support complex keys as record key and custom partition paths.
+  - Support for more time units and date/time formats in `TimestampBasedKeyGenerator`.
+
+#### Developer productivity and monitoring improvements:
+  - Spark DAGs are named to aid debuggability.
+  - Console, JMX, Prometheus and DataDog metric reporters have been added.
+  - Support pluggable metrics reporting by introducing proper abstraction for user defined metrics.
+
+#### CLI related features:
+  - Added support for deleting savepoints via CLI
+  - Added a new command - `export instants`, to export metadata of instants
+
+#### Other features:
+  - A data snapshot exporter has been added for usability. The latest version of records as of a certain point in time can be exported as plain parquet files with this tool.
+  - Introduced write commit callback hooks for incremental pipelines to be notified of, and act on, new commits in the timeline.
+

Review comment:
       Yeah. I was thinking of highlighting new features and improvements, along with any callouts to migration. Including bugs would make for a long list, so let's leave that out. It's captured in the release notes anyway. 
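The marker-based rollback mentioned in the ingestion highlights above can be illustrated with a small sketch: every data file write first drops a marker file, and a rollback consults the markers rather than listing the whole table. This is a hypothetical toy layout, not Hudi's implementation:

```python
import os
import tempfile

class MarkerBasedWriter:
    """Sketch: drop a marker per data file written, so rollback can find the
    files to delete by reading markers instead of listing the whole table."""

    def __init__(self, base):
        self.base = base
        self.marker_dir = os.path.join(base, ".temp", "markers")
        os.makedirs(self.marker_dir, exist_ok=True)

    def write_file(self, name, data):
        # the marker goes down first, so a crash mid-write still leaves a trace
        open(os.path.join(self.marker_dir, name + ".marker"), "w").close()
        with open(os.path.join(self.base, name), "w") as f:
            f.write(data)

    def rollback(self):
        removed = []
        for marker in os.listdir(self.marker_dir):
            data_file = os.path.join(self.base, marker[: -len(".marker")])
            if os.path.exists(data_file):
                os.remove(data_file)
                removed.append(os.path.basename(data_file))
            os.remove(os.path.join(self.marker_dir, marker))
        return sorted(removed)

base = tempfile.mkdtemp()
writer = MarkerBasedWriter(base)
writer.write_file("file_a.parquet", "data-a")
writer.write_file("file_b.parquet", "data-b")
removed = writer.rollback()
print(removed)
```

The design point is that rollback cost scales with the number of files the failed write touched, not with the size of the table listing.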







[GitHub] [hudi] bhasudha commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
bhasudha commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475641124



##########
File path: docs/_pages/releases.md
##########

+  - # TODO Add more about bulk insert modes. 
+  - # TODO Add more on bootstrap.             

Review comment:
       Oh I know. The main bootstrap feature for RFC 12 - https://issues.apache.org/jira/browse/HUDI-242 - has some pending subtasks that are going into 0.6.1, so I marked the main ticket as 0.6.1. All the others were tagged as sub-tasks of this main feature. That's why. 
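The row-based "bulk_insert" discussed in the hunk above hinges on key generators that work directly on Rows instead of intermediate record objects. A toy sketch of the idea, with rows modeled as plain dicts and illustrative field names:

```python
# Rows modeled as plain dicts; "uuid" and "region" are illustrative field names.
def row_key(row, record_key_field="uuid", partition_field="region"):
    """Extract (partition path, record key) straight from a row,
    with no conversion to an intermediate record type."""
    return row[partition_field], row[record_key_field]

rows = [
    {"uuid": "id-2", "region": "us-west", "value": 20},
    {"uuid": "id-1", "region": "eu-east", "value": 10},
]

# bulk_insert-style prep: sort by (partition, key) so each partition
# is written contiguously, avoiding small-file churn
sorted_keys = [row_key(r) for r in sorted(rows, key=row_key)]
print(sorted_keys)
```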







[GitHub] [hudi] bhasudha commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
bhasudha commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475643657



##########
File path: docs/_pages/releases.md
##########

+#### Key generation improvements:
+  - Introduced `CustomTimestampBasedKeyGenerator` to support complex keys as record key and custom partition paths.

Review comment:
       fixed.
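To illustrate what a timestamp-based key generator does, here is a rough sketch of deriving a partition path from an epoch-millis field. The pattern names and their mapping below are illustrative assumptions, not the generator's actual configuration surface:

```python
from datetime import datetime, timezone

# A few Joda-style patterns mapped to strftime equivalents (illustrative subset)
FORMATS = {
    "yyyy/MM/dd": "%Y/%m/%d",
    "yyyyMMdd": "%Y%m%d",
    "yyyy-MM-dd-HH": "%Y-%m-%d-%H",
}

def timestamp_partition_path(ts_millis, output_format="yyyy/MM/dd"):
    """Derive a partition path from an epoch-millis timestamp field."""
    dt = datetime.fromtimestamp(ts_millis / 1000.0, tz=timezone.utc)
    return dt.strftime(FORMATS[output_format])

print(timestamp_partition_path(1598238906000))                # date-style path
print(timestamp_partition_path(1598238906000, "yyyyMMdd"))    # compact form
```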







[GitHub] [hudi] bhasudha commented on pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
bhasudha commented on pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#issuecomment-679383433


   Closing this in favor of https://github.com/apache/hudi/pull/2028. Will capture the comments there. 





[GitHub] [hudi] bhasudha commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
bhasudha commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475661834



##########
File path: docs/_pages/releases.md
##########

+ newer table version i.e 1 (or moving from pre 0.6.0 to 0.6.0), an upgrade step will be executed automatically 
+ to adhere to marker based rollback. This automatic upgrade step will happen just once per dataset as the 

Review comment:
       I changed dataset -> table and all references to hoodie as hudi in non code context.







[GitHub] [hudi] nsivabalan commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475627122



##########
File path: docs/_pages/releases.md
##########

+#### Other features:
+  - Data snapshot exporter is added for usability. Latest version of records as of a certain point in time can be exported as plain parquet files with this tool.
+  - Introduce write committed callback hooks for incremental pipelines to be notified and act on new commits in the timeline.

Review comment:
       if you plan to list out any bugs, probably consider 
   [HUDI-1068] - HoodieGlobalBloomIndex does not correctly send deletes to older partition when partition path is updated
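For context on HUDI-1068: a global index resolves records by record key across all partitions, so when an incoming record's partition path changes, the copy in the old partition must be deleted. A conceptual sketch of that behavior (not Hudi's code):

```python
def global_index_upsert(index, record):
    """Apply one upsert against a global index keyed by record key alone;
    if the record moved partitions, emit a delete to the old partition."""
    ops = []
    key, new_partition = record["key"], record["partition"]
    old_partition = index.get(key)
    if old_partition is not None and old_partition != new_partition:
        ops.append(("delete", key, old_partition))  # remove stale copy
    ops.append(("upsert", key, new_partition))
    index[key] = new_partition
    return ops

# record r1 moves from the 2020/08/01 partition to 2020/08/24
index = {"r1": "2020/08/01"}
ops = global_index_upsert(index, {"key": "r1", "partition": "2020/08/24"})
print(ops)
```

The bug report is precisely the case where the delete to the old partition is not sent, leaving a duplicate record behind.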







[GitHub] [hudi] nsivabalan commented on a change in pull request #2016: [WIP] Add release page doc for 0.6.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475863568



##########
File path: docs/_pages/releases.md
##########

+#### CLI related features:
+  - Added support for deleting savepoints via CLI
+  - Added a new command - `export instants`, to export metadata of instants
Review comment:
       Sure, you can add something like this. Feel free to edit as per convenience.
   ```
   A command line tool is added to hudi-cli, to assist in upgrading or downgrading the hoodie dataset. "UPGRADE" or "DOWNGRADE" is the command to use. DOWNGRADE has to be done using hudi-cli if someone prefers to downgrade their hoodie dataset from 0.6.0 to any pre 0.6.0 versions. 
   ```
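The upgrade/downgrade flow described here can be sketched as a run-once gate on `hoodie.table.version`: compare the stored table version with the writer's version and upgrade only if it is older. The property parsing and file layout below are simplified assumptions, not Hudi's actual upgrade code:

```python
import os
import tempfile

CURRENT_VERSION = 1  # 0.6.0 writes table version 1; pre-0.6.0 tables are version 0

def maybe_upgrade(props_path):
    """Run-once upgrade gate: upgrade and persist the new version only if the
    stored hoodie.table.version is older than the writer's version."""
    props = {}
    if os.path.exists(props_path):
        with open(props_path) as f:
            props = dict(line.strip().split("=", 1) for line in f if "=" in line)
    if int(props.get("hoodie.table.version", "0")) >= CURRENT_VERSION:
        return False  # already upgraded; nothing to do
    # ... actual upgrade steps would run here (e.g. rewriting rollback metadata) ...
    props["hoodie.table.version"] = str(CURRENT_VERSION)
    with open(props_path, "w") as f:
        f.writelines(f"{k}={v}\n" for k, v in props.items())
    return True

props_path = os.path.join(tempfile.mkdtemp(), "hoodie.properties")
first = maybe_upgrade(props_path)   # first launch on an old table: upgrades
second = maybe_upgrade(props_path)  # later launches: version already current
print(first, second)
```

Persisting the version after the upgrade completes is what makes the step idempotent across restarts.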



