Posted to users@hudi.apache.org by leesf <le...@gmail.com> on 2022/02/13 15:26:00 UTC

[ANNOUNCE] Hudi Community Update (2022-01-16 ~ 2022-02-13)

Dear community,

I am pleased to share the Hudi community bi-weekly update for
2022-01-16 ~ 2022-02-13, covering features, bug fixes, and tests.


=======================================
Features


[Spark] Structured Streaming source support for Spark 3 [1]
[Spark] Add support for allowDuplicateInserts in HoodieJavaClient [2]
[Core] Adding support for Parquet in MOR tables Log blocks [3]
[Core] New ScheduleAndExecute mode for HoodieCompactor and hudi-cli [4]
[Flink] Bump flink version to 1.14.3 [5]
[Spark] Adding inline scheduling support for spark datasource path for
compaction and clustering [6]
[Core] Adding restore.requested instant and restore plan for restore action
[7]



[1] https://issues.apache.org/jira/browse/HUDI-1558
[2] https://issues.apache.org/jira/browse/HUDI-2417
[3] https://issues.apache.org/jira/browse/HUDI-431
[4] https://issues.apache.org/jira/browse/HUDI-3369
[5] https://issues.apache.org/jira/browse/HUDI-3389
[6] https://issues.apache.org/jira/browse/HUDI-1847
[7] https://issues.apache.org/jira/browse/HUDI-2432


=======================================
Bugs

[Core] Extracted common AbstractHoodieTableFileIndex to be shared across
engines [1]
[Core] Excluding clustering instants from pending rollback info [2]
[Core] fix MOR snapshot query during compaction [3]
[Core] Avoid creating empty requestedReplaceCommit in the startCommit
method [4]
[Core] Reading rt table via Hive CLI throws NoSuchMethodError [5]
[Core] Do not nullify members in HoodieTableFileSystemView#resetViewState
to avoid NPE [6]
[Core] get table schema from the last commit with data written [7]
[Core] Convert uppercase letters to lowercase in storage configs [8]
[Core] Rebasing Hive's FileInputFormat onto `AbstractHoodieTableFileIndex`
[9]
[Core] Filter non-parquet files in bootstrap procedure [10]
[Core] Use fields' comments persisted in catalog to fill in schema [11]
[Core] Bootstrap support overwrite existing table [12]
[Core] Drop unused method
SparkBootstrapCommitActionExecutor#handleMetadataBootstrap [13]
[Core] Tuning performance of getAllPartitionPaths API in
FileSystemBackedTableMetadata [14]
[Core] Fix NPE while reading table with Spark datasource [15]
[Core] Add support for using database name in incremental query [16]
[Spark] Fixing read of an empty table with a failed write [17]
[Spark] Fix delete exception for Spark SQL when syncing to Hive [18]
[Core] Fixing conflict resolution in transaction management for the
auto commit code path [19]
[Core] Refactoring layout optimization (clustering) flow to support linear
ordering [20]
[Core] gracefully fail to change column data type [21]
[Core] Rewriting rfc-27 for data skipping index [22]
[Core] Metadata table records - support for key deduplication based on
hardcoded key field [23]
[Core] Make class names consistent in hudi-client [24]
[Core] [RFC-40] A new Hudi connector for Trino [25]
[Core] Complete pending clustering before deltastreamer sync [26]
[Core] Fix Hudi CLI tempview query issue [27]
[Core] Prefer to use the table's own location [28]
[Core] [RFC-46] Optimize Record Payload handling [29]
[Core] Enabling lazy read by default for log blocks during compaction [30]
[Core] Fallback to fulltable scan for IncrementalRelation if underlying
files have been cleared or moved by cleaner [31]
[Core] Fixing non-existent marker dir handling in TwoToOnedowngrade [32]
[Core] Fixing default value for clustering small file config to 300MB [33]
[Core] RFC-37: Metadata table based bloom index [34]
[Core] Fixing Metadata Table Records Duplication Issues [35]
[Core] Fixing Parquet Column Range metadata extraction [36]
[Core] Metadata Index - Bloom filter and Column stats index to speed up
index lookups [37]
[Core] Removing duplicating file-listing process w/in Hive's MOR
`FileInputFormat`s [38]
[Core] Generalize HoodieIndex for flexible record data type [39]
[Core] Expose HMS mode metastore uri config option for spark writer [40]
[Deltastreamer] Adding retries to deltastreamer for source errors [41]
[Core] Show _hoodie_operation in spark sql results [42]
[Core] Unify Hive's MOR implementations to avoid duplication [43]
[Core] Simplify Precommit file system view [44]
[Core] Add zero value metrics for empty data source and
PROMETHEUS_PUSHGATEWAY reporter [45]
[Core] Hoodie metadata table validator [46]
[Core] Making SIMPLE index as the default index type [47]
[Core] Fixing missing begin checkpoint in HoodieIncremental pull [48]
[Core] Rebased Parquet-based FileInputFormat impls to inherit from
`MapredParquetInputFormat` [49]
[Core] Converting BaseHoodieTableFileIndex to Java [50]
[Core] fix that getNestedFieldVal breaks with Spark 3.2 [51]
[CLI] Allow pass rollbackUsingMarkers to Hudi CLI rollback command [52]
[Spark] Pass the Spark version when syncing a table created by Spark [53]
[Core] Update all deprecated calls to new apis in HoodieRecordPayload [54]
[Core] Set TIMESTAMP_MICROS as the default value for
hoodie.parquet.outputtimestamptype [55]
[Core] Custom relation instead of HadoopFsRelation [56]
[Core] Fix restore to roll back pending clustering operations, followed by
rolling back other commits [57]
[Deltastreamer] Fix Jackson parse error on empty messages from
JsonKafkaSource when using HoodieDeltaStreamer [58]

[1] https://issues.apache.org/jira/browse/HUDI-3179
[2] https://issues.apache.org/jira/browse/HUDI-3257
[3] https://issues.apache.org/jira/browse/HUDI-3194
[4] https://issues.apache.org/jira/browse/HUDI-3252
[5] https://issues.apache.org/jira/browse/HUDI-3261
[6] https://issues.apache.org/jira/browse/HUDI-3263
[7] https://issues.apache.org/jira/browse/HUDI-2903
[8] https://issues.apache.org/jira/browse/HUDI-3245
[9] https://issues.apache.org/jira/browse/HUDI-3191
[10] https://issues.apache.org/jira/browse/HUDI-3277
[11] https://issues.apache.org/jira/browse/HUDI-3236
[12] https://issues.apache.org/jira/browse/HUDI-3283
[13] https://issues.apache.org/jira/browse/HUDI-3285
[14] https://issues.apache.org/jira/browse/HUDI-3281
[15] https://issues.apache.org/jira/browse/HUDI-3268
[16] https://issues.apache.org/jira/browse/HUDI-2837
[17] https://issues.apache.org/jira/browse/HUDI-1850
[18] https://issues.apache.org/jira/browse/HUDI-3282
[19] https://issues.apache.org/jira/browse/HUDI-3072
[20] https://issues.apache.org/jira/browse/HUDI-2872
[21] https://issues.apache.org/jira/browse/HUDI-3237
[22] https://issues.apache.org/jira/browse/HUDI-1822
[23] https://issues.apache.org/jira/browse/HUDI-2763
[24] https://issues.apache.org/jira/browse/HUDI-2596
[25] https://issues.apache.org/jira/browse/HUDI-2688
[26] https://issues.apache.org/jira/browse/HUDI-2943
[27] https://issues.apache.org/jira/browse/HUDI-1977
[28] https://issues.apache.org/jira/browse/HUDI-3253
[29] https://issues.apache.org/jira/browse/HUDI-3318
[30] https://issues.apache.org/jira/browse/HUDI-3292
[31] https://issues.apache.org/jira/browse/HUDI-2711
[32] https://issues.apache.org/jira/browse/HUDI-3346
[33] https://issues.apache.org/jira/browse/HUDI-3293
[34] https://issues.apache.org/jira/browse/HUDI-2589
[35] https://issues.apache.org/jira/browse/HUDI-3322
[36] https://issues.apache.org/jira/browse/HUDI-3337
[37] https://issues.apache.org/jira/browse/HUDI-1295
[38] https://issues.apache.org/jira/browse/HUDI-3191
[39] https://issues.apache.org/jira/browse/HUDI-2656
[40] https://issues.apache.org/jira/browse/HUDI-2491
[41] https://issues.apache.org/jira/browse/HUDI-3360
[42] https://issues.apache.org/jira/browse/HUDI-2941
[43] https://issues.apache.org/jira/browse/HUDI-3206
[44] https://issues.apache.org/jira/browse/HUDI-3058
[45] https://issues.apache.org/jira/browse/HUDI-3373
[46] https://issues.apache.org/jira/browse/HUDI-3320
[47] https://issues.apache.org/jira/browse/HUDI-3091
[48] https://issues.apache.org/jira/browse/HUDI-3361
[49] https://issues.apache.org/jira/browse/HUDI-3276
[50] https://issues.apache.org/jira/browse/HUDI-3239
[51] https://issues.apache.org/jira/browse/HUDI-3333
[52] https://issues.apache.org/jira/browse/HUDI-3395
[53] https://issues.apache.org/jira/browse/HUDI-2610
[54] https://issues.apache.org/jira/browse/HUDI-2987
[55] https://issues.apache.org/jira/browse/HUDI-3402
[56] https://issues.apache.org/jira/browse/HUDI-3338
[57] https://issues.apache.org/jira/browse/HUDI-3362
[58] https://issues.apache.org/jira/browse/HUDI-3413

=======================================
Tests

[Tests] Add UT for update/delete on non-pk condition [1]
[Tests] Fixing utilities and integ test suite bundle to include hudi spark
datasource [2]
[Tests] Fix UTs for Spark 3.2 [3]
[Tests] Remove fixture test tables for multi writer tests [4]
[Tests] Fixing spark yaml and adding hive validation to integ test suite [5]


[1] https://issues.apache.org/jira/browse/HUDI-2968
[2] https://issues.apache.org/jira/browse/HUDI-3262
[3] https://issues.apache.org/jira/browse/HUDI-3215
[4] https://issues.apache.org/jira/browse/HUDI-3330
[5] https://issues.apache.org/jira/browse/HUDI-3312


Best,
Leesf