Posted to dev@hudi.apache.org by 蒋晓峰 <pr...@163.com> on 2019/11/27 06:04:07 UTC

Hudi integration module through plug-inization

Hi guys,


Having felt the pain of adding Flink engine support to Hudi, I think we need to discuss a design for the compute engine module that has high cohesion, low coupling, and a plug-in structure.


Hudi's current design, as far as its core components are concerned, is a patchwork of Spark RDD API calls mixed with business logic, scattered across multiple modules and many kinds of methods. As a result, even developers with a compute-engine background have difficulty following the main flow of the Spark job. Making the compute engine pluggable is also hard: the general interfaces carry RDD and Spark context, so it cannot be done without a large-scale restructuring.


In my opinion, we need to refactor the Hudi integration module into a plug-in architecture to facilitate the subsequent integration of both Spark and Flink.
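
To make the idea a bit more concrete, here is a rough sketch of what I mean by keeping engine types out of the general interfaces. All names below (EngineDataset, LocalDataset, WriteLogic) are hypothetical and do not exist in Hudi today; the in-memory LocalDataset only stands in for a real Spark or Flink binding, which would wrap an RDD or a DataStream behind the same interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical engine-neutral abstraction: no Spark or Flink types appear here.
interface EngineDataset<T> {
    <R> EngineDataset<R> map(Function<T, R> fn);
    List<T> collect();
}

// In-memory stand-in for an engine plug-in. A Spark or Flink plug-in
// would implement the same interface on top of its own dataset type.
final class LocalDataset<T> implements EngineDataset<T> {
    private final List<T> items;

    LocalDataset(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public <R> EngineDataset<R> map(Function<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            out.add(fn.apply(item));
        }
        return new LocalDataset<>(out);
    }

    @Override
    public List<T> collect() {
        return new ArrayList<>(items);
    }
}

// Core business logic depends only on the abstraction, so it does not
// need to change when a new engine plug-in is added.
final class WriteLogic {
    static EngineDataset<String> tagRecords(EngineDataset<String> keys) {
        return keys.map(k -> k + ":tagged");
    }
}
```

With this shape, the core module would compile without any engine dependency, and each engine would live in its own module that provides an EngineDataset implementation. This is only one possible direction, not a finished proposal.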


Best,
Nicholas