Posted to dev@hudi.apache.org by leesf <le...@gmail.com> on 2021/08/15 15:42:00 UTC

[ANNOUNCE] Hudi Community Update (2021-08-01 ~ 2021-08-15)

Dear community,

I am pleased to share the Hudi community bi-weekly update for
2021-08-01 ~ 2021-08-15, covering features, bug fixes, and tests.


=======================================
Features

[Examples] Add a compaction job in hudi-examples [1]
[Core] Add pre-commit validator framework [2]
[Spark Integration] Support metadata based listing for Spark DataSource and
Spark SQL [3]
[Flink Integration] Metadata table for Flink [4]
[Spark Integration] Use HMS To Sync Hive Meta For Spark Sql [5]
[Flink Integration] Allows INSERT duplicates for Flink MOR table [6]
[Flink Integration] Use INT64 timestamp with precision 3 for flink parquet
writer [7]
[Spark Integration] Support Compaction Command For Spark Sql [8]
[Core] Support custom clustering strategies and preserve commit metadata as
part of clustering [9]
[Spark Integration] Spark Sql Support For pre-existing Hoodie Table [10]
[Spark Integration] Support Time Travel Query For Hoodie Table [11]
[Spark Integration] Support Bulk Insert For Spark Sql [12]
[Spark Integration] Skip the latest N partitions when choosing partitions
to create ClusteringPlan [13]
[Flink Integration] Propagate CDC format for hoodie [14]
[Core] Support storage on ks3 for hudi [15]
[DeltaStreamer] Adding support for delete_partitions to spark data source
[16]
[Core] Add timeline-server-based marker file strategy for improving
marker-related latency [17]
[Core] Add API to set a metric in the registry [18]
[Core] Adding virtual keys support to deltastreamer [19]
[Spark Integration] Support column name matching for insert * and update
set * in merge into [20]
[Core] Provide option to drop partition columns [21]
[DeltaStreamer] DeltaStreamer source for AWS S3 [22]
[Core] Add upgrade and downgrade to and from 0.9.0 [23]


[1] https://issues.apache.org/jira/browse/HUDI-2225
[2] https://issues.apache.org/jira/browse/HUDI-2072
[3] https://issues.apache.org/jira/browse/HUDI-1893
[4] https://issues.apache.org/jira/browse/HUDI-2258
[5] https://issues.apache.org/jira/browse/HUDI-2233
[6] https://issues.apache.org/jira/browse/HUDI-2274
[7] https://issues.apache.org/jira/browse/HUDI-2278
[8] https://issues.apache.org/jira/browse/HUDI-2182
[9] https://issues.apache.org/jira/browse/HUDI-1468
[10] https://issues.apache.org/jira/browse/HUDI-1842
[11] https://issues.apache.org/jira/browse/HUDI-2243
[12] https://issues.apache.org/jira/browse/HUDI-2208
[13] https://issues.apache.org/jira/browse/HUDI-2194
[14] https://issues.apache.org/jira/browse/HUDI-1771
[15] https://issues.apache.org/jira/browse/HUDI-2288
[16] https://issues.apache.org/jira/browse/HUDI-1774
[17] https://issues.apache.org/jira/browse/HUDI-1138
[18] https://issues.apache.org/jira/browse/HUDI-2017
[19] https://issues.apache.org/jira/browse/HUDI-2294
[20] https://issues.apache.org/jira/browse/HUDI-2279
[21] https://issues.apache.org/jira/browse/HUDI-1363
[22] https://issues.apache.org/jira/browse/HUDI-1897
[23] https://issues.apache.org/jira/browse/HUDI-2268

=======================================
Bugs

[Flink Integration] Release the disk map resource for flink streaming
reader [1]
[Hive Integration] Pass base file format to sync clients [2]
[Spark Integration] Refactor Datasource options [3]
[Core] Ensure Disk Maps create a subfolder with appropriate prefixes and
clean them up on close [4]
[Spark Integration] MERGE INTO fails with table having nested struct [5]
[Flink Integration] Filter out files whose length is less than the parquet
MAGIC length [6]
[Core] Improving schema evolution support in hudi [7]
[Flink Integration] Compare the field object directly in
OverwriteWithLatestAvroPayload [8]
[Spark Integration] Always choose the latest record for HoodieRecordPayload
[9]
[Spark Integration] remove joda time in hivesync module [10]
[Core] MOR should not predicate pushdown when reading with payload_combine
type [11]
[Core] Handle the case of failed deltacommit on the metadata table. [12]
[Core] The HoodieMergedLogRecordScanner should set up the operation of the
chosen record [13]
[Core] Remove the logic that deletes replaced files when archiving [14]
[Core] Created a config to enable/disable syncing of metadata table [15]
[Core] Flipping defaults [16]
[Core] Ensure the rolled-back instance was previously synced to the
Metadata Table when syncing a Rollback Instant [17]
[Core] Using delete_partition with ds should not rely on the primary
key [18]
[Core] Add MARKERS.type and fix marker-based rollback [19]


[1] https://issues.apache.org/jira/browse/HUDI-2269
[2] https://issues.apache.org/jira/browse/HUDI-2272
[3] https://issues.apache.org/jira/browse/HUDI-2255
[4] https://issues.apache.org/jira/browse/HUDI-2090
[5] https://issues.apache.org/jira/browse/HUDI-2232
[6] https://issues.apache.org/jira/browse/HUDI-2247
[7] https://issues.apache.org/jira/browse/HUDI-1129
[8] https://issues.apache.org/jira/browse/HUDI-2042
[9] https://issues.apache.org/jira/browse/HUDI-1763
[10] https://issues.apache.org/jira/browse/HUDI-1939
[11] https://issues.apache.org/jira/browse/HUDI-2292
[12] https://issues.apache.org/jira/browse/HUDI-2286
[13] https://issues.apache.org/jira/browse/HUDI-2298
[14] https://issues.apache.org/jira/browse/HUDI-1518
[15] https://issues.apache.org/jira/browse/HUDI-1292
[16] https://issues.apache.org/jira/browse/HUDI-2151
[17] https://issues.apache.org/jira/browse/HUDI-2119
[18] https://issues.apache.org/jira/browse/HUDI-2307
[19] https://issues.apache.org/jira/browse/HUDI-2305


=======================================
Tests

[Tests] Migrating some long running tests to functional test profile [1]

[1] https://issues.apache.org/jira/browse/HUDI-2273


Best,
Leesf