Posted to users@hudi.apache.org by leesf <le...@gmail.com> on 2021/07/04 15:31:00 UTC

[ANNOUNCE] Hudi Community Update(2021-06-20 ~ 2021-07-04)

Dear community,

We are pleased to share the Hudi community bi-weekly update for
2021-06-20 ~ 2021-07-04, covering features, bug fixes, and tests.


=======================================
Features

[Spark Integration] Support AlterCommand for Hoodie [1]
[Spark Integration] Support TRUNCATE TABLE for Hoodie [2]
[Utilities] Add ORC support in HoodieSnapshotExporter [3]
[Hive Integration] Add ability to provide multi-region (global) data
consistency across HMS in different regions
[DeltaStreamer] Commit offsets to Kafka after a successful Hudi commit [5]
[Flink Integration] Support Hive-style partitioning for the Flink writer [6]
[Flink Integration] Support specifying compaction parallelism and compaction
target IO for Flink batch compaction [7]
[DeltaStreamer] Support reading from the committed offset [8]
[Flink Integration] Support loading log files in BootstrapFunction [9]
[Core] Add configOption and refactor all configs based on it [10]
[Spark Integration] Enable Hive sync for Spark SQL when Spark has Hive
metastore support enabled [11]
[Flink Integration] Support reading pure log file groups in the Flink batch
reader after compaction [12]
[Flink Integration] Add operator UIDs for Flink stateful operators [13]
[Utilities] Add a Grafana dashboard for Hudi [14]
[Core] Support configuring the bootstrap KeyGenerator by type [15]


[1] https://issues.apache.org/jira/browse/HUDI-1914
[2] https://issues.apache.org/jira/browse/HUDI-1883
[3] https://issues.apache.org/jira/browse/HUDI-1826
[5] https://issues.apache.org/jira/browse/HUDI-2094
[6] https://issues.apache.org/jira/browse/HUDI-1790
[7] https://issues.apache.org/jira/browse/HUDI-2085
[8] https://issues.apache.org/jira/browse/HUDI-1944
[9] https://issues.apache.org/jira/browse/HUDI-2052
[10] https://issues.apache.org/jira/browse/HUDI-89
[11] https://issues.apache.org/jira/browse/HUDI-2051
[12] https://issues.apache.org/jira/browse/HUDI-2112
[13] https://issues.apache.org/jira/browse/HUDI-2121
[14] https://issues.apache.org/jira/browse/HUDI-2124
[15] https://issues.apache.org/jira/browse/HUDI-1930
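Features item [5] (HUDI-2094) is about ordering: Kafka offsets are committed
only after the Hudi commit succeeds. Below is a minimal Python sketch of that
ordering, not DeltaStreamer's actual code. FakeKafkaConsumer, FakeHudiTable
and ingest_batch are hypothetical in-memory stand-ins, not Kafka or Hudi APIs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeKafkaConsumer:
    """Hypothetical stand-in for a Kafka consumer over a single in-memory partition."""
    log: List[str]
    committed_offset: int = 0  # offset the consumer group has committed so far

    def poll(self, max_records: int) -> Tuple[List[str], int]:
        # Always resume reading from the last *committed* offset.
        start = self.committed_offset
        end = min(start + max_records, len(self.log))
        return self.log[start:end], end

    def commit(self, offset: int) -> None:
        self.committed_offset = offset


@dataclass
class FakeHudiTable:
    """Hypothetical stand-in for a Hudi table that can be told to fail one commit."""
    rows: List[str] = field(default_factory=list)
    fail_next_commit: bool = False

    def commit(self, records: List[str]) -> bool:
        if self.fail_next_commit:
            self.fail_next_commit = False
            return False
        self.rows.extend(records)
        return True


def ingest_batch(consumer: FakeKafkaConsumer, table: FakeHudiTable, max_records: int) -> bool:
    """Read one batch, write it to the table, and only then advance the Kafka offset."""
    records, next_offset = consumer.poll(max_records)
    if not records:
        return True  # nothing new to ingest
    if table.commit(records):
        # The Hudi commit succeeded, so it is now safe to move the offset forward.
        consumer.commit(next_offset)
        return True
    # The Hudi commit failed: leave the offset untouched so the batch is re-read.
    return False
```

In this sketch a failed commit leaves committed_offset where it was, so the
next call re-reads the same records instead of skipping them. A crash between
the two steps would likewise cause a re-read, the usual at-least-once
trade-off; this describes the sketch, not a documented DeltaStreamer guarantee.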

=======================================
Bugs

[Flink Integration] StreamWriteFunction should wait for the next inflight
instant time before flushing [1]
[Flink Integration] Support rolling back inflight compaction instants for the
batch Flink compactor [2]
[Core] HoodieDefaultTimeline#filterPendingCompactionTimeline() has a wrong
filter condition [3]
[Core] JVM occasionally crashes during compaction when Spark speculative
execution is enabled [4]
[Flink Integration] Ignore FileNotFoundException in
WriteProfiles#getWritePathsOfInstant [5]
[Core] Remove the option to fall back to file listing when the metadata table
is enabled [6]
[Core] Metadata reader should merge all un-synced but completed instants
from the dataset timeline [7]
[Core] finalizeWrite() is executed twice in
AbstractHoodieWriteClient#commitStats [8]
[Flink Integration] Remove the duplicate name for the Flink write pipeline [9]
[Flink Integration] Support rolling back inflight compaction instants in
CompactionPlanOperator [10]
[Spark Integration] Incorrect schema inference for schema-evolved tables [11]
[Spark Integration] Inserting into a static partition with DateType returns
an incorrect partition value [12]
[DeltaStreamer] Fix KafkaAvroSchemaDeserializer to not rely on reflection [13]
[Flink Integration] Safely catch FileNotFoundException in
WriteProfiles#getCommitMetadata [14]
[Core] Fix HoodieClusteringJob never quitting [15]
[Flink Integration] Use a while loop instead of a recursive call in
MergeOnReadInputFormat#MergeIterator to avoid StackOverflowError [16]
[Flink Integration] Sync FlinkOptions config to FlinkStreamerConfig [17]
[Flink Integration] Resend the uncommitted write metadata on startup [18]
[Spark Integration] Fix Flink being unable to read commit metadata [19]
[Flink Integration] Fix NPE caused by a null
FlinkStreamerConfig#writePartitionUrlEncode value [20]
[Flink Integration] Add rebalance before index bootstrap [21]
[Flink Integration] Partition fields and preCombineField missing from Hoodie
properties for tables written by Flink [22]
[Spark Integration] Compaction fails for MERGE INTO on MOR tables [23]
[Spark Integration] Spark queries on MOR tables written by Flink return
incorrect timestamp values [24]
[Spark Integration] Exception when merging with a null-value field [25]
[Spark Integration] CTAS generates an external table when creating a managed
table [26]
[Hive Integration] Support batch synchronization of partition data to the
Hive metastore to avoid OOM problems [27]


[1] https://issues.apache.org/jira/browse/HUDI-2049
[2] https://issues.apache.org/jira/browse/HUDI-2050
[3] https://issues.apache.org/jira/browse/HUDI-1909
[4] https://issues.apache.org/jira/browse/HUDI-2031
[5] https://issues.apache.org/jira/browse/HUDI-2047
[6] https://issues.apache.org/jira/browse/HUDI-2013
[7] https://issues.apache.org/jira/browse/HUDI-1717
[8] https://issues.apache.org/jira/browse/HUDI-1988
[9] https://issues.apache.org/jira/browse/HUDI-2054
[10] https://issues.apache.org/jira/browse/HUDI-2038
[11] https://issues.apache.org/jira/browse/HUDI-2061
[12] https://issues.apache.org/jira/browse/HUDI-2053
[13] https://issues.apache.org/jira/browse/HUDI-2069
[14] https://issues.apache.org/jira/browse/HUDI-2062
[15] https://issues.apache.org/jira/browse/HUDI-2073
[16] https://issues.apache.org/jira/browse/HUDI-2074
[17] https://issues.apache.org/jira/browse/HUDI-2067
[18] https://issues.apache.org/jira/browse/HUDI-2084
[19] https://issues.apache.org/jira/browse/HUDI-2097
[20] https://issues.apache.org/jira/browse/HUDI-2092
[21] https://issues.apache.org/jira/browse/HUDI-2103
[22] https://issues.apache.org/jira/browse/HUDI-2088
[23] https://issues.apache.org/jira/browse/HUDI-2105
[24] https://issues.apache.org/jira/browse/HUDI-2114
[25] https://issues.apache.org/jira/browse/HUDI-2123
[26] https://issues.apache.org/jira/browse/HUDI-2057
[27] https://issues.apache.org/jira/browse/HUDI-2116
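Bugs item [16] (HUDI-2074) swaps a recursive call for a while loop in
MergeOnReadInputFormat#MergeIterator. The general pattern is sketched below in
Python rather than Hudi's Java: an iterator step that skips deleted records.
If each skipped record costs one stack frame, a long run of deletes can
exhaust the stack; the loop keeps the depth constant. The function names and
the convention that None marks a deleted record are hypothetical.

```python
from typing import Callable, Iterator, List, Optional

_END = object()  # sentinel marking the end of the underlying iterator


def next_live_recursive(it: Iterator[Optional[str]]) -> Optional[str]:
    """Return the next non-deleted record, or None at the end (one frame per skipped delete)."""
    record = next(it, _END)
    if record is _END:
        return None
    if record is None:
        # Deleted record: skip it by recursing, which grows the call stack.
        return next_live_recursive(it)
    return record


def next_live_loop(it: Iterator[Optional[str]]) -> Optional[str]:
    """Same contract as above, but deleted records are skipped in a loop with constant stack depth."""
    while True:
        record = next(it, _END)
        if record is _END:
            return None
        if record is not None:
            return record
        # record is None: a deleted record, so keep looping.


def live_records(
    records: List[Optional[str]],
    step: Callable[[Iterator[Optional[str]]], Optional[str]],
) -> List[str]:
    """Drain `records` with the given step function, collecting only live records."""
    it = iter(records)
    out: List[str] = []
    while (record := step(it)) is not None:
        out.append(record)
    return out
```

With 100,000 consecutive deletes, live_records(records, next_live_loop)
finishes normally, while the recursive variant exceeds CPython's default
recursion limit and raises RecursionError, the Python analogue of the JVM's
StackOverflowError.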

=======================================
Tests

[Tests] Increase timeout for deltaStreamerTestRunner in
TestHoodieDeltaStreamer [1]
[Tests] Fix TestHoodieBackedMetadata#testOnlyValidPartitionsAdded [2]
[Tests] Added tests for KafkaOffsetGen [3]
[Tests] Move schema util tests out from TestHiveSyncTool [4]
[Tests] Add more YAML templates to the test suite [5]

[1] https://issues.apache.org/jira/browse/HUDI-1248
[2] https://issues.apache.org/jira/browse/HUDI-2064
[3] https://issues.apache.org/jira/browse/HUDI-2060
[4] https://issues.apache.org/jira/browse/HUDI-2081
[5] https://issues.apache.org/jira/browse/HUDI-2006

Best,
Leesf