Posted to dev@hudi.apache.org by leesf <le...@gmail.com> on 2020/01/13 01:05:07 UTC

[ANNOUNCE] Hudi Weekly Community Update (2020-01-05 ~ 2020-01-12)

Dear community,

Happy to share the Hudi community weekly update for 2020-01-05 ~ 2020-01-12,
covering development, features, and bug fixes.


Development

[Terminologies simplification] A complete write-up introducing the design
and architecture of Hudi has been published[1]; contributions are welcome.
[JDBC Incremental Puller] A discussion about introducing a JDBC Delta
Streamer, so that Hudi can incrementally pull data from JDBC sources, has
been started[2], and an RFC[3] has been drafted for comments.
[New Website] The PR by lamberken introducing the new Hudi website has been
merged. Please check it out[4]; feedback is welcome[5].
[Weekly update] A discussion thread has been started about publishing a
weekly update of the Hudi community to increase the project's visibility.
[Configuration refactor] A discussion thread about refactoring Hudi's
configuration framework is about to start[6].
[Release] The discussion about the code freeze date (Jan 15) for the next
release (0.5.1) reached consensus[7].

[1] https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture
[2] https://lists.apache.org/thread.html/r31b03a964c234e0903847ba60d9d7b340d0b59daa5232ae922a5b38d%40%3Cdev.hudi.apache.org%3E
[3] https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller
[4] https://hudi.apache.org/newsite-content/
[5] https://github.com/apache/incubator-hudi/issues/1196
[6] https://lists.apache.org/thread.html/1fd96c9ff258aa35c030d07b929fdc15c2ebe93b155e1067ff45259c%40%3Cdev.hudi.apache.org%3E
[7] https://lists.apache.org/thread.html/r14291a41be93ff178f22faa292d5e2a09fc7c294b7d89216c132083a%40%3Cdev.hudi.apache.org%3E


Features

[DeltaStreamer] Add Delete() support to DeltaStreamer[8]
[Client] Refactor HoodieWriteClient so that commit logic can be shared
by both bootstrap and normal write operations[9]
[Docs] Add a new Maven profile to generate unified Javadoc for all Java and
Scala classes[10]
[Hive Integration] Optimize HoodieInputformat.listStatus() for faster Hive
incremental queries on Hoodie[11]
[Writer] Add option to overwrite the payload implementation in the
hoodie.properties file[12]
[DeltaStreamer] Introduce a default partition path in
TimestampBasedKeyGenerator[13]
[Spark Integration] Replace Databricks spark-avro with native
spark-avro[14]
[Writer] Upgrade Hudi to Spark 2.4[15]
[Utilities] Provide a custom time zone definition for
TimestampBasedKeyGenerator[16]

[8] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-377
[9] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-417
[10] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-319
[11] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-25
[12] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-114
[13] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-406
[14] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-91
[15] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-12
[16] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-502


Bugs

[Incremental Pull] Fix NPE when reading IncrementalPull.sqltemplate in
HiveIncrementalPuller[17]
[CLI] HoodieCommitMetadata only shows the first commit's insert rows[18]
[CLI] CLI doesn't allow rolling back a Delta commit[19]
[DeltaStreamer] DeltaStreamer should pick checkpoints only from
deltacommits for MOR tables[20]

[17] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-484
[18] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-469
[19] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-248
[20] https://issues.apache.org/jira/projects/HUDI/issues/HUDI-322

Re: [ANNOUNCE] Hudi Weekly Community Update (2020-01-05 ~ 2020-01-12)

Posted by Bhavani Sudha <bh...@gmail.com>.
Great. Thanks @leesf <le...@gmail.com> for compiling this!

On Sun, Jan 12, 2020 at 8:41 PM Vinoth Chandar <vi...@apache.org> wrote:

> Great to see this!

Re: [ANNOUNCE] Hudi Weekly Community Update (2020-01-05 ~ 2020-01-12)

Posted by Vinoth Chandar <vi...@apache.org>.
Great to see this!

On Sun, Jan 12, 2020 at 6:30 PM lamberken <la...@163.com> wrote:

> Good Job !!

Re: [ANNOUNCE] Hudi Weekly Community Update (2020-01-05 ~ 2020-01-12)

Posted by lamberken <la...@163.com>.

Good Job !!