Posted to users@hudi.apache.org by leesf <le...@gmail.com> on 2021/07/18 15:46:00 UTC

[ANNOUNCE] Hudi Community Update (2021-07-04 ~ 2021-07-18)

Dear community,

I am pleased to share the Hudi community bi-weekly update for 2021-07-04 ~
2021-07-18, covering features, bug fixes, and tests.


=======================================
Features

[Hive Integration] Support batch synchronization of partition data to Hive
metastore to avoid OOM problem [1]
[Spark Integration] Support incremental query for
insert_overwrite_table/insert_overwrite operation on COW table [2]
[Hive Integration] Support hive1 metadata sync for flink writer [3]
[Core] Implement RockDbBasedMap as an alternate to DiskBasedMap in
ExternalSpillableMap [4]
[Flink Integration] Add compaction schedule option for flink [5]
[Deltastreamer] Added deltastreamer metric for time of lastSync [6]
[Core] Adding functionality to allow the providing of basic auth creds for
confluent cloud schema registry [7]
[Spark Integration] Adding support for UserDefinedPartitioners and
SortModes to BulkInsert with Rows [8]
[Spark Integration] Adding dedup support for Bulk Insert w/ Rows [9]
[Flink Integration] Support Append only in Flink stream [10]
[Spark Integration] Support async clustering for deltastreamer and Spark
streaming [11]
[Spark Integration] Support Read Hoodie As DataSource Table For Flink And
DeltaStreamer [12]
[Flink Integration] Support Read Log Only MOR Table For Spark [13]
[Flink Integration] Add parallelism conf for bootstrap operator [14]
[Core] Support reading logs for MOR Hive rt table [15]
[Flink Integration] Support Transformer for HoodieFlinkStreamer [16]
[Core] Implement compression for DiskBasedMap in Spillable Map [17]
[Core] Make callback return HoodieWriteStat [18]


[1] https://issues.apache.org/jira/browse/HUDI-2116
[2] https://issues.apache.org/jira/browse/HUDI-2058
[3] https://issues.apache.org/jira/browse/HUDI-2133
[4] https://issues.apache.org/jira/browse/HUDI-2028
[5] https://issues.apache.org/jira/browse/HUDI-2094
[6] https://issues.apache.org/jira/browse/HUDI-2055
[7] https://issues.apache.org/jira/browse/HUDI-1996
[8] https://issues.apache.org/jira/browse/HUDI-1104
[9] https://issues.apache.org/jira/browse/HUDI-1105
[10] https://issues.apache.org/jira/browse/HUDI-2087
[11] https://issues.apache.org/jira/browse/HUDI-1483
[12] https://issues.apache.org/jira/browse/HUDI-2045
[13] https://issues.apache.org/jira/browse/HUDI-2107
[14] https://issues.apache.org/jira/browse/HUDI-2171
[15] https://issues.apache.org/jira/browse/HUDI-1969
[16] https://issues.apache.org/jira/browse/HUDI-2165
[17] https://issues.apache.org/jira/browse/HUDI-2029
[18] https://issues.apache.org/jira/browse/HUDI-1633

=======================================
Bugs

[Flink Integration] The coordinator sends events to write function when
there is no data for the checkpoint [1]
[Core] Initialize the maxMemorySizeInBytes in log scanner [2]
[Flink Integration] StreamerUtil.medianInstantTime should return a valid
datetime string [3]
[Spark Integration] Exception Thrown When MergeInto With Decimal Type
Field [4]
[Core] Improvement in packaging insert into small files [5]
[Flink Integration] Make coordinator events as POJO for efficient
serialization [6]
[Flink Integration] Fix Flink batch compaction bug when the user doesn't set
compaction tasks [7]
[Metadata Table] Fix the bug that metadata table cannot support
non-partitioned table [8]
[Core] Loaded too many classes like
sun/reflect/GeneratedSerializationConstructorAccessor in JVM metaspace [9]
[Flink Integration] Fix empty avro schema path caused by duplicate
parameters [10]
[Spark Integration] Incorrect Schema Inference For Schema Evolved Table [11]
[Metadata Table] Fixed bootstrap of Metadata Table when some actions are
in progress [12]
[Core] FileSlices in the filegroup are not in descending order by timestamp [13]
[Flink Integration] Refactored String constants [14]
[Core] Add generics to avoid forced conversion in
BaseSparkCommitActionExecutor#partition [15]
[Spark Integration] Fixing extra commit metadata in row writer path [16]
[Core] Hive lock whose state is WAITING should be released, otherwise this
Hive lock will be locked forever [17]
[Flink Integration] Fix conflict when flink-sql-connector-hive and
hudi-flink-bundle are both in flink lib [18]
[Spark Integration] Tweak the default compaction target IO to 500GB when
flink async compaction is off [19]
[Flink Integration] Support setting bucket assign parallelism for flink
write task [20]
[Flink Integration] Bug fix: offline clustering (HoodieClusteringJob) will
cause insert action to lose data [21]
[Flink Integration] Fix for AccessControlException for anonymous user [22]
[Spark Integration] Fix Compile Error For Spark3 [23]
[Spark Integration] Ensure and Audit docs for every configuration class in
the codebase [24]
[Spark Integration] Fix BucketAssignFunction Context NullPointerException
[25]
[Spark Integration] Remove the default parallelism of index bootstrap and
bucket assigner [26]


[1] https://issues.apache.org/jira/browse/HUDI-2126
[2] https://issues.apache.org/jira/browse/HUDI-2127
[3] https://issues.apache.org/jira/browse/HUDI-2129
[4] https://issues.apache.org/jira/browse/HUDI-2131
[5] https://issues.apache.org/jira/browse/HUDI-2122
[6] https://issues.apache.org/jira/browse/HUDI-2132
[7] https://issues.apache.org/jira/browse/HUDI-2106
[8] https://issues.apache.org/jira/browse/HUDI-2098
[9] https://issues.apache.org/jira/browse/HUDI-2046
[10] https://issues.apache.org/jira/browse/HUDI-2093
[11] https://issues.apache.org/jira/browse/HUDI-2061
[12] https://issues.apache.org/jira/browse/HUDI-2016
[13] https://issues.apache.org/jira/browse/HUDI-2115
[14] https://issues.apache.org/jira/browse/HUDI-2069
[15] https://issues.apache.org/jira/browse/HUDI-2134
[16] https://issues.apache.org/jira/browse/HUDI-2009
[17] https://issues.apache.org/jira/browse/HUDI-2099
[18] https://issues.apache.org/jira/browse/HUDI-2136
[19] https://issues.apache.org/jira/browse/HUDI-2143
[20] https://issues.apache.org/jira/browse/HUDI-2142
[21] https://issues.apache.org/jira/browse/HUDI-2144
[22] https://issues.apache.org/jira/browse/HUDI-2168
[23] https://issues.apache.org/jira/browse/HUDI-2180
[24] https://issues.apache.org/jira/browse/HUDI-2149
[25] https://issues.apache.org/jira/browse/HUDI-2153
[26] https://issues.apache.org/jira/browse/HUDI-2185

=======================================
Tests

[Tests] Fix integration testing failure caused by sql results out of order
[1]
[Tests] Fixed the unit test
TestHoodieBackedMetadata.testOnlyValidPartitionsAdded [2]
[Tests] Update unit tests to support ORC as the base file format [3]

[1] https://issues.apache.org/jira/browse/HUDI-2113
[2] https://issues.apache.org/jira/browse/HUDI-2140
[3] https://issues.apache.org/jira/browse/HUDI-1828


Best,
Leesf