Posted to users@hudi.apache.org by leesf <le...@gmail.com> on 2021/12/05 15:22:00 UTC

[ANNOUNCE] Hudi Community Update(2021-11-21 ~ 2021-12-05)

Dear community,

Here are the Hudi community bi-weekly updates for 2021-11-21 ~ 2021-12-05,
covering features, bug fixes and tests.


=======================================
Features

[Spark SQL] Extract HoodieCatalogTable to coordinate the Spark catalog
table and the hoodie table [1]
[Deltastreamer] Add Debezium Source for deltastreamer [2]
[Core] Add Amazon CloudWatch metrics reporter [3]
[Core] Support Hilbert curve for Hudi [4]
[Flink Integration] Support Flink catalog to help users work with Flink
tables conveniently [5]
[Core] Introduce a Pulsar implementation of hoodie write commit [6]
[Core] Support HiveSchemaProvider [7]



[1] https://issues.apache.org/jira/browse/HUDI-2759
[2] https://issues.apache.org/jira/browse/HUDI-1290
[3] https://issues.apache.org/jira/browse/HUDI-2801
[4] https://issues.apache.org/jira/browse/HUDI-2102
[5] https://issues.apache.org/jira/browse/HUDI-2877
[6] https://issues.apache.org/jira/browse/HUDI-2937
[7] https://issues.apache.org/jira/browse/HUDI-2418


=======================================
Bugs

[Flink] Set up keygen class explicitly in write config for Flink table
upgrade [1]
[Core] New option for hoodieClusteringJob to check, rollback and re-execute
the last failed clustering job [2]
[Core] Converting commit timestamp format to millisecs [3]
[Core] Expand File-Group candidates list for appending for MOR tables [4]
[Flink] Use earliest instant by default for async compaction and clustering
jobs [5]
[Core] Rollback unfinished replace commit to allow updates [6]
[Core] Assume path exists and defer fs.exists() in
AbstractTableFileSystemView [7]
[Core] Optimize statistics-collection code, add some docs for z-order,
and fix some bugs [8]
[Core] Using HBase shaded jars in Hudi presto bundle [9]
[Core] Add clustering and compaction in Kafka Connect Sink [10]
[Core] Add Hive sync support to Kafka Connect [11]
[Core] Securing usages of SimpleDateFormat to be thread-safe [12]
[Core] Fix 2to3 upgrade when `hoodie.table.keygenerator.class` is set [13]
[Spark Integration] Refresh table after drop partition [14]
[Flink Integration] Flink metadata table supports virtual keys [15]
[Core] Fix Kafka offset handling in Kafka Connect protocol [16]
[Core] Hudi KVComparator for all HFile writer usages [17]
[Core] Fixing issues w/ Z-order Layout Optimization [18]
[Core] Cluster update strategy should not be fenced by write config [19]
[Deltastreamer] Fixing deltastreamer checkpoint fetch/copy over [20]
[Core] Add JMX deps in hudi utilities and kafka connect bundles [21]
[CLI] Fixing archived Timeline crashing if timeline contains REPLACE_COMMIT
[22]
[Core] Configure metadata payload consistency check [23]
[Core] FileSlice after pending compaction-requested instant-time is ignored
by MOR snapshot reader [24]
[Deltastreamer] Fixing MySQL Debezium source [25]
[Core] Remove rdd.isEmpty() validation to prevent CreateHandle being called
twice [26]
[Core] Guarding table service commits within a single lock to commit to
both data table and metadata table [27]
[Deltastreamer] Fixing handling of cluster update reject exception in
deltastreamer [28]
[Core] Fixing lazy rollback for MOR with list based strategy [29]
[Deltastreamer] Fixed DeltaStreamer to properly respect configuration
passed through properties file [30]
[Core] Removing direct fs call in HoodieLogFileReader [31]
[Core] Table metadata returns empty for non-existent partition [32]
[CLI] Fixing Clustering CLI - schedule and run command fixes to avoid
NumberFormatException [33]
[Core] Addressing issues w/ Z-order Layout Optimization [34]
[Core] Re-use same rollback instant time for failed rollbacks [35]
[Core] Enabling timeline-server-based marker as default [36]
[CLI] Metadata CLI - files/partition file listing fix and new validate
option [37]
[Core] Metadata table creation and avoid bootstrapping race for write
client & add locking for upgrade [38]
[Spark Integration] Add support for ignoring case in update SQL operation
[39]
[Core] Fix write configs for Java engine in Kafka Connect Sink [40]
[Core] Fixing loading of props from default dir [41]
[Core] Compact the file group with larger log files to reduce write
amplification [42]
[Core] Fix metadata table archival overstepping between regular writers and
table services [43]
[Flink Integration] Fix remote timeline server config for Flink [44]
[Core] Refresh the fs view on successful checkpoints for write profile [45]
[Core] Fixing populate meta fields with HFile writers and disabling virtual
keys by default for metadata table [46]
[Core] Removing default value for PARTITIONPATH_FIELD_NAME resulting in
incorrect `KeyGenerator` configuration [47]
[Core] Metadata table - avoiding key lookup failures on base files over S3
[48]
[Core] Kafka Connect: Fix failed writes and avoid table service concurrent
operations [49]
[Core] Fixing metadata table reader when metadata compaction is inflight
[50]
[Deltastreamer] Remove special casing of clustering in deltastreamer
checkpoint retrieval [51]


[1] https://issues.apache.org/jira/browse/HUDI-2702
[2] https://issues.apache.org/jira/browse/HUDI-2533
[3] https://issues.apache.org/jira/browse/HUDI-2559
[4] https://issues.apache.org/jira/browse/HUDI-2550
[5] https://issues.apache.org/jira/browse/HUDI-2737
[6] https://issues.apache.org/jira/browse/HUDI-1937
[7] https://issues.apache.org/jira/browse/HUDI-2743
[8] https://issues.apache.org/jira/browse/HUDI-2778
[9] https://issues.apache.org/jira/browse/HUDI-2409
[10] https://issues.apache.org/jira/browse/HUDI-2332
[11] https://issues.apache.org/jira/browse/HUDI-2325
[12] https://issues.apache.org/jira/browse/HUDI-2831
[13] https://issues.apache.org/jira/browse/HUDI-2818
[14] https://issues.apache.org/jira/browse/HUDI-2838
[15] https://issues.apache.org/jira/browse/HUDI-2847
[16] https://issues.apache.org/jira/browse/HUDI-2671
[17] https://issues.apache.org/jira/browse/HUDI-2443
[18] https://issues.apache.org/jira/browse/HUDI-2778
[19] https://issues.apache.org/jira/browse/HUDI-2766
[20] https://issues.apache.org/jira/browse/HUDI-2793
[21] https://issues.apache.org/jira/browse/HUDI-2853
[22] https://issues.apache.org/jira/browse/HUDI-2844
[23] https://issues.apache.org/jira/browse/HUDI-2792
[24] https://issues.apache.org/jira/browse/HUDI-2480
[25] https://issues.apache.org/jira/browse/HUDI-1290
[26] https://issues.apache.org/jira/browse/HUDI-2800
[27] https://issues.apache.org/jira/browse/HUDI-2794
[28] https://issues.apache.org/jira/browse/HUDI-2858
[29] https://issues.apache.org/jira/browse/HUDI-2841
[30] https://issues.apache.org/jira/browse/HUDI-2840
[31] https://issues.apache.org/jira/browse/HUDI-2005
[32] https://issues.apache.org/jira/browse/HUDI-2852
[33] https://issues.apache.org/jira/browse/HUDI-2850
[34] https://issues.apache.org/jira/browse/HUDI-2814
[35] https://issues.apache.org/jira/browse/HUDI-2861
[36] https://issues.apache.org/jira/browse/HUDI-2767
[37] https://issues.apache.org/jira/browse/HUDI-2845
[38] https://issues.apache.org/jira/browse/HUDI-2475
[39] https://issues.apache.org/jira/browse/HUDI-2642
[40] https://issues.apache.org/jira/browse/HUDI-2891
[41] https://issues.apache.org/jira/browse/HUDI-2880
[42] https://issues.apache.org/jira/browse/HUDI-2881
[43] https://issues.apache.org/jira/browse/HUDI-2904
[44] https://issues.apache.org/jira/browse/HUDI-2914
[45] https://issues.apache.org/jira/browse/HUDI-2924
[46] https://issues.apache.org/jira/browse/HUDI-2902
[47] https://issues.apache.org/jira/browse/HUDI-2911
[48] https://issues.apache.org/jira/browse/HUDI-2894
[49] https://issues.apache.org/jira/browse/HUDI-2890
[50] https://issues.apache.org/jira/browse/HUDI-2923
[51] https://issues.apache.org/jira/browse/HUDI-2935


======================================
Tests

[Tests] Add more Spark CI build tasks [1]
[Tests] Fix skipped HoodieSparkSqlWriterSuite [2]



[1] https://issues.apache.org/jira/browse/HUDI-1870
[2] https://issues.apache.org/jira/browse/HUDI-2868




Best,
Leesf