Posted to users@hudi.apache.org by leesf <le...@gmail.com> on 2022/02/27 15:38:00 UTC

[ANNOUNCE] Hudi Community Update(2022-02-13 ~ 2022-02-27)

Dear community,

I'm pleased to share the Hudi community bi-weekly update for 2022-02-13 ~
2022-02-27, covering features, bug fixes, and tests.


=======================================
Features


[Spark] Introduce HoodieCatalog to manage tables for Spark Datasource V2 [1]
[Core] Added new cleaning policy based on number of hours [2]
[Spark] Upgrade Spark to 3.2.1 [3]
[Spark] Add Call Procedure Command for Spark SQL [4]
[Spark] Support Metadata Table in Spark Datasource [5]



[1] https://issues.apache.org/jira/browse/HUDI-3254
[2] https://issues.apache.org/jira/browse/HUDI-349
[3] https://issues.apache.org/jira/browse/HUDI-3432
[4] https://issues.apache.org/jira/browse/HUDI-3161
[5] https://issues.apache.org/jira/browse/HUDI-1296



=======================================
Bugs

[Core] Fix SQL source's checkpoint issue [1]
[Core] The files recorded in the commit may not match the actual ones for
MOR Compaction [2]
[Core] TypedProperties does not need to create a new set when checking
whether a key exists [3]
[Core] If mode==ignore && tableExists, skip the write logic and Hive
sync [4]
[Core] Fix the build on aarch64, Fedora 33 [5]
[Core] Fix TableSchemaResolver for all file formats and metadata table [6]
[Core] Deprecate hoodie.file.index.enable and unify handling under
BaseFileOnlyViewRelation [7]
[Core] Make archiving an async service [8]
[Core] Fix Spark queries on tables using TimestampKeyGenerator returning
no results when filtering by partition column [9]
[Core] Add config to disable table services [10]
[Core] Remove hardcoded logic of disabling metadata table in tests [11]
[Core] Cleaning up Hive-related hierarchies after refactoring [12]
[Core] Sync datasource clustering config [13]
[Core] Introduce a checksum mechanism for validating hoodie.properties [14]
[Deltastreamer] Fix Deltastreamer to properly shut down the services upon
failure [15]
[Core] Avoid getSmallFiles if hoodie.parquet.small.file.limit is 0 [16]
[Spark] Fix ColumnarArrayData ClassCastException issue [17]
[Flink] Support batch reader in BootstrapOperator#loadRecords [18]
[Core] Fix BulkInsertPartitioner generic type [19]
[Core] Retry FileSystem actions instead of failing directly [20]
[Core] Fixing restore with metadata enabled [21]
[Core] Fixing checkpoint management in hoodie incr source [22]
[Core] Abstract the Spark update strategy to clean up the code and remove
duplication [23]
[Core] Fix duplicate cleaning of same files when unfinished clean
operations are present using a config [24]
[Deltastreamer] Adding delete partitions support to DeltaStreamer [25]
[Flink] The archived timeline for the Flink streaming reader should not be
reused [26]
[Core] Fix wrong field order for constructing HoodieMetadataColumnStats [27]
[Core] The Flink small file list should exclude file slices with pending
compaction [28]
[Core] Not able to get execution plan [29]
[Core] Fix NPE caused by incorrect beforeKeyGenClassName validation [30]
[Flink] Add more documentation to Pipelines on using this tool to build a
write pipeline [31]
[Core] Pending clustering may break
AbstractTableFileSystemView#getxxBaseFile() [32]
[Core] Refactor clustering executors [33]
[Core] Making rdd unpersist optional at the end of writes [34]



[1] https://issues.apache.org/jira/browse/HUDI-2413
[2] https://issues.apache.org/jira/browse/HUDI-3370
[3] https://issues.apache.org/jira/browse/HUDI-3412
[4] https://issues.apache.org/jira/browse/HUDI-3272
[5] https://issues.apache.org/jira/browse/HUDI-1657
[6] https://issues.apache.org/jira/browse/HUDI-3398
[7] https://issues.apache.org/jira/browse/HUDI-3200
[8] https://issues.apache.org/jira/browse/HUDI-1576
[9] https://issues.apache.org/jira/browse/HUDI-3204
[10] https://issues.apache.org/jira/browse/HUDI-2931
[11] https://issues.apache.org/jira/browse/HUDI-3394
[12] https://issues.apache.org/jira/browse/HUDI-3280
[13] https://issues.apache.org/jira/browse/HUDI-3426
[14] https://issues.apache.org/jira/browse/HUDI-2809
[15] https://issues.apache.org/jira/browse/HUDI-3430
[16] https://issues.apache.org/jira/browse/HUDI-3438
[17] https://issues.apache.org/jira/browse/HUDI-3389
[18] https://issues.apache.org/jira/browse/HUDI-3446
[19] https://issues.apache.org/jira/browse/HUDI-3458
[20] https://issues.apache.org/jira/browse/HUDI-2648
[21] https://issues.apache.org/jira/browse/HUDI-3432
[22] https://issues.apache.org/jira/browse/HUDI-3455
[23] https://issues.apache.org/jira/browse/HUDI-3042
[24] https://issues.apache.org/jira/browse/HUDI-2925
[25] https://issues.apache.org/jira/browse/HUDI-2189
[26] https://issues.apache.org/jira/browse/HUDI-3461
[27] https://issues.apache.org/jira/browse/HUDI-3486
[28] https://issues.apache.org/jira/browse/HUDI-3488
[29] https://issues.apache.org/jira/browse/HUDI-3494
[30] https://issues.apache.org/jira/browse/HUDI-3401
[31] https://issues.apache.org/jira/browse/HUDI-3474
[32] https://issues.apache.org/jira/browse/HUDI-3421
[33] https://issues.apache.org/jira/browse/HUDI-3042
[34] https://issues.apache.org/jira/browse/HUDI-3515


===================================
Tests

[Tests] Remove hardcoded logic of disabling metadata table in tests [1]
[Tests] Enhancements to integ test suite [2]
[Tests] Support clustering scheduleAndExecute for hudi-cli and add
clustering-cli Tests [3]



[1] https://issues.apache.org/jira/browse/HUDI-3366
[2] https://issues.apache.org/jira/browse/HUDI-3480
[3] https://issues.apache.org/jira/browse/HUDI-3429



Best,
Leesf