Posted to users@hudi.apache.org by leesf <le...@gmail.com> on 2022/03/13 15:31:00 UTC

[ANNOUNCE] Hudi Community Update (2022-02-28 ~ 2022-03-13)

Dear community,

Here are the Hudi community bi-weekly updates for 2022-02-28 ~ 2022-03-13,
covering features, bug fixes, and tests.


=======================================
Features


[Spark] Adding Datatable validator tool [1]
[Core] Add support for "marker delete" in hudi-cli [2]
[Flink] RFC-35 Part-1 Support bucket index in Flink writer [3]
[Spark] RFC-27: Data skipping index to improve query performance [4]
[Spark] Support Clustering Command Based on Call Procedure Command for
Spark SQL [5]
[Spark] [RFC-47] Add Call Procedure Command for Spark SQL [6]
[Core] Introduce DeleteSupportSchemaPostProcessor to support adding
_hoodie_is_deleted column to schema [7]
[Core] Introduce JsonkafkaSourceProcessor to support preprocessing data
before it is transformed to DataSet [8]
[Flink] Add DFS based message queue for Flink writer [part3] [9]
[Core] Support querying a table as of a savepoint [10]
[Core] Introduce ChainedSchemaPostProcessor to support setting multiple
processors at once [11]
[Core] Introduce DropColumnSchemaPostProcessor to support dropping columns
from schema [12]
[Core] [RFC-42] RFC for consistent hashing index [13]
[Core] Support savepoints command based on Call Procedure Command [14]


[1] https://issues.apache.org/jira/browse/HUDI-3497
[2] https://issues.apache.org/jira/browse/HUDI-3441
[3] https://issues.apache.org/jira/browse/HUDI-3315
[4] https://issues.apache.org/jira/browse/HUDI-2973
[5] https://issues.apache.org/jira/browse/HUDI-3445
[6] https://issues.apache.org/jira/browse/HUDI-3161
[7] https://issues.apache.org/jira/browse/HUDI-3520
[8] https://issues.apache.org/jira/browse/HUDI-3525
[9] https://issues.apache.org/jira/browse/HUDI-2677
[10] https://issues.apache.org/jira/browse/HUDI-3221
[11] https://issues.apache.org/jira/browse/HUDI-3568
[12] https://issues.apache.org/jira/browse/HUDI-3522
[13] https://issues.apache.org/jira/browse/HUDI-2999
[14] https://issues.apache.org/jira/browse/HUDI-3501

=======================================
Bugs

[Core] Fixing Kafka key and value serializer value type from class to
string [1]
[Core] Adding validation to dataframe schema to ensure reserved fields do
not have a different data type [2]
[Core] Roll back insert data appended to log file when using HBase Index [3]
[Core] Fix String conversion issue and override putAll method in
TypedProperties.java [4]
[Core] Fix log file reader for S3 with hadoop-aws 2.7.x [5]
[CLI] Avoid passing an empty-string Spark master to hudi-cli [6]
[Core] Save timeout option for RemoteFileSystemView [7]
[Core] Add validation of column stats and bloom filters in
HoodieMetadataTableValidator [8]
[Core] Implement record iterator for HoodieDataBlock [9]
[Flink] In CompactFunction, set up the write schema each time with the
latest schema [10]
[Core] Make schema registry URLs configurable with MTDS [11]
[Core] Fixing "populate meta fields" update to metadata table [12]
[Core] Fix async clustering not working when the user specifies the key
"hoodie.datasource.clustering.async.enable" directly [13]
[Flink] Add reader merge memory option for Flink [14]
[Core] Fixing timeline server for repeated refreshes [15]
[Core] Fixing Hive getSchema for RT tables addressing different partitions
having different schemas [16]
[Core] Improve HoodieMergedLogRecordScanner to avoid putting unnecessary
hoodie records [17]
[Core] Making commit preserve metadata to true for compaction [18]
[Core] Avoid including whole MultipleSparkJobExecutionStrategy object into
the closure for Spark to serialize [19]
[Core] Make sure Metadata Table records are updated appropriately on HDFS
[20]
[Core] Support setting --sparkMaster for MDT CLI [21]
[Core] Configuring timeline refreshes based on latest commit [22]
[Flink] Flink CleanFunction executes clean on initialization [23]
[Build] Improve maven module configs for different spark profiles [24]
[Core] HoodieData for metadata index records; BloomFilter construction from
index based on the type param [25]
[Core] Sync column comments while syncing a hive table [26]
[Core] Make sure BaseFileOnlyViewRelation only reads projected columns [27]
[Core] Fixing NULL schema provider for empty batch [28]
[Core] Refactor HoodieCommonUtils to make code more reasonable [29]
[Core] Make sure Column Stats does not fail in case it fails to load
previous Index Table state [30]
[Core] Fix NPE of DefaultHoodieRecordPayload if Property is empty [31]
[Core] Re-use rollback instant for rolling back of clustering and
compaction if rollback failed mid-way [32]
[Core] Restore TypedProperties and flush checksum in table config [33]
[Core] Fix MarkerBasedRollbackStrategy NoSuchElementException [34]


[1] https://issues.apache.org/jira/browse/HUDI-3521
[2] https://issues.apache.org/jira/browse/HUDI-3018
[3] https://issues.apache.org/jira/browse/HUDI-2917
[4] https://issues.apache.org/jira/browse/HUDI-3528
[5] https://issues.apache.org/jira/browse/HUDI-3341
[6] https://issues.apache.org/jira/browse/HUDI-3450
[7] https://issues.apache.org/jira/browse/HUDI-3418
[8] https://issues.apache.org/jira/browse/HUDI-3465
[9] https://issues.apache.org/jira/browse/HUDI-3516
[10] https://issues.apache.org/jira/browse/HUDI-2631
[11] https://issues.apache.org/jira/browse/HUDI-3264
[12] https://issues.apache.org/jira/browse/HUDI-3544
[13] https://issues.apache.org/jira/browse/HUDI-3548
[14] https://issues.apache.org/jira/browse/HUDI-3460
[15] https://issues.apache.org/jira/browse/HUDI-2761
[16] https://issues.apache.org/jira/browse/HUDI-3130
[17] https://issues.apache.org/jira/browse/HUDI-3069
[18] https://issues.apache.org/jira/browse/HUDI-3213
[19] https://issues.apache.org/jira/browse/HUDI-3561
[20] https://issues.apache.org/jira/browse/HUDI-3365
[21] https://issues.apache.org/jira/browse/HUDI-2747
[22] https://issues.apache.org/jira/browse/HUDI-3576
[23] https://issues.apache.org/jira/browse/HUDI-3573
[24] https://issues.apache.org/jira/browse/HUDI-3574
[25] https://issues.apache.org/jira/browse/HUDI-3356
[26] https://issues.apache.org/jira/browse/HUDI-3383
[27] https://issues.apache.org/jira/browse/HUDI-3396
[28] https://issues.apache.org/jira/browse/HUDI-3595
[29] https://issues.apache.org/jira/browse/HUDI-3567
[30] https://issues.apache.org/jira/browse/HUDI-3513
[31] https://issues.apache.org/jira/browse/HUDI-3592
[32] https://issues.apache.org/jira/browse/HUDI-3556
[33] https://issues.apache.org/jira/browse/HUDI-3593
[34] https://issues.apache.org/jira/browse/HUDI-3583


===================================
Tests

[Tests] Refactor HoodieTestDataGenerator to provide for reproducible builds
[1]
[Tests] Add UT to verify HoodieRealtimeFileSplit serde [2]
[Tests] Skip integ test modules by default [3]
[Tests] Add Trino Queries in integration tests [4]
[Tests] Use HoodieTestDataGenerator#TRIP_SCHEMA as example schema in
TestSchemaPostProcessor [5]


[1] https://issues.apache.org/jira/browse/HUDI-3469
[2] https://issues.apache.org/jira/browse/HUDI-3348
[3] https://issues.apache.org/jira/browse/HUDI-3584
[4] https://issues.apache.org/jira/browse/HUDI-3586
[5] https://issues.apache.org/jira/browse/HUDI-3575



Best,
Leesf