Posted to commits@seatunnel.apache.org by ki...@apache.org on 2022/06/08 12:10:49 UTC

[incubator-seatunnel-website] branch main updated: add Blog (#125)

This is an automated email from the ASF dual-hosted git repository.

kirs pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel-website.git


The following commit(s) were added to refs/heads/main by this push:
     new bddde4690 add Blog (#125)
bddde4690 is described below

commit bddde46908b8a8c673d558424581ec91ce8fa884
Author: lifeng <53...@users.noreply.github.com>
AuthorDate: Wed Jun 8 20:10:43 2022 +0800

    add Blog (#125)
    
    * add Blog
    
    * Fix file name
    
    * Fix docs config error and renome some files
    
    Co-authored-by: CalvinKirs <ki...@apache.org>
---
 ...l_support_fo.md => 2022-03-18-2-1-0-release.md} |   0
 blog/2022-05-01_Kidswant.md                        | 225 +++++++++++++++++++++
 blog/2022-05-10-ClickHouse.md                      | 170 ++++++++++++++++
 blog/2022-05-31-engine.md                          | 168 +++++++++++++++
 ...ctoring_full_support_fo.md => 2-1-0-release.md} | 120 +++++------
 .../2022-05-01_Kidswant.md                         | 224 ++++++++++++++++++++
 .../2022-05-10-ClickHouse.md                       | 163 +++++++++++++++
 .../2022-05-31-engine.md                           | 203 +++++++++++++++++++
 static/image/20220501/ch/0-1.png                   | Bin 0 -> 4666384 bytes
 static/image/20220501/ch/0.png                     | Bin 0 -> 280397 bytes
 static/image/20220501/ch/1.png                     | Bin 0 -> 81145 bytes
 static/image/20220501/ch/10.png                    | Bin 0 -> 16146 bytes
 static/image/20220501/ch/11.png                    | Bin 0 -> 8526 bytes
 static/image/20220501/ch/12.jpg                    | Bin 0 -> 138345 bytes
 static/image/20220501/ch/2.png                     | Bin 0 -> 534893 bytes
 static/image/20220501/ch/3.png                     | Bin 0 -> 962511 bytes
 static/image/20220501/ch/4.png                     | Bin 0 -> 1071868 bytes
 static/image/20220501/ch/5-1.png                   | Bin 0 -> 63912 bytes
 static/image/20220501/ch/5.png                     | Bin 0 -> 27072 bytes
 static/image/20220501/ch/7.png                     | Bin 0 -> 50198 bytes
 static/image/20220501/ch/8.jpg                     | Bin 0 -> 291953 bytes
 static/image/20220501/ch/9.png                     | Bin 0 -> 42647 bytes
 static/image/20220501/en/0-1.png                   | Bin 0 -> 4666384 bytes
 static/image/20220501/en/0.png                     | Bin 0 -> 280397 bytes
 static/image/20220501/en/1.png                     | Bin 0 -> 3364314 bytes
 static/image/20220501/en/10.png                    | Bin 0 -> 2250697 bytes
 static/image/20220501/en/11.png                    | Bin 0 -> 1969856 bytes
 static/image/20220501/en/12.png                    | Bin 0 -> 276968 bytes
 static/image/20220501/en/2.png                     | Bin 0 -> 2651805 bytes
 static/image/20220501/en/3.png                     | Bin 0 -> 3387383 bytes
 static/image/20220501/en/4.png                     | Bin 0 -> 2957579 bytes
 static/image/20220501/en/5.png                     | Bin 0 -> 3450849 bytes
 static/image/20220501/en/6.png                     | Bin 0 -> 3023920 bytes
 static/image/20220501/en/7.png                     | Bin 0 -> 61864 bytes
 static/image/20220501/en/8.png                     | Bin 0 -> 234765 bytes
 static/image/20220501/en/9.png                     | Bin 0 -> 2170094 bytes
 static/image/20220510/ch/0-1.png                   | Bin 0 -> 1035226 bytes
 static/image/20220510/ch/0.jpg                     | Bin 0 -> 348483 bytes
 static/image/20220510/ch/1.png                     | Bin 0 -> 132557 bytes
 static/image/20220510/ch/2.png                     | Bin 0 -> 69024 bytes
 static/image/20220510/ch/3.png                     | Bin 0 -> 145084 bytes
 static/image/20220510/ch/4.png                     | Bin 0 -> 97608 bytes
 static/image/20220510/ch/5.png                     | Bin 0 -> 123128 bytes
 static/image/20220510/ch/6.png                     | Bin 0 -> 81386 bytes
 static/image/20220510/ch/7.png                     | Bin 0 -> 59101 bytes
 static/image/20220510/ch/8.png                     | Bin 0 -> 46437 bytes
 static/image/20220510/en/0-1.png                   | Bin 0 -> 1035226 bytes
 static/image/20220510/en/0.jpg                     | Bin 0 -> 348483 bytes
 static/image/20220510/en/1.png                     | Bin 0 -> 132557 bytes
 static/image/20220510/en/2.png                     | Bin 0 -> 69024 bytes
 static/image/20220510/en/3.png                     | Bin 0 -> 145084 bytes
 static/image/20220510/en/4.png                     | Bin 0 -> 97608 bytes
 static/image/20220510/en/5.png                     | Bin 0 -> 123128 bytes
 static/image/20220510/en/6.png                     | Bin 0 -> 81386 bytes
 static/image/20220510/en/7.png                     | Bin 0 -> 59101 bytes
 static/image/20220510/en/8.png                     | Bin 0 -> 46437 bytes
 static/image/20220531/ch/0.jpg                     | Bin 0 -> 454262 bytes
 static/image/20220531/ch/1.jpg                     | Bin 0 -> 100342 bytes
 static/image/20220531/ch/2.jpg                     | Bin 0 -> 61881 bytes
 static/image/20220531/ch/3.jpg                     | Bin 0 -> 56889 bytes
 static/image/20220531/ch/4.jpg                     | Bin 0 -> 63420 bytes
 static/image/20220531/ch/5.jpg                     | Bin 0 -> 53005 bytes
 static/image/20220531/ch/6.jpg                     | Bin 0 -> 63970 bytes
 static/image/20220531/ch/7.jpg                     | Bin 0 -> 46727 bytes
 static/image/20220531/ch/8.jpg                     | Bin 0 -> 43573 bytes
 static/image/20220531/en/0-1.png                   | Bin 0 -> 298452 bytes
 static/image/20220531/en/0.jpg                     | Bin 0 -> 454262 bytes
 static/image/20220531/en/1.jpg                     | Bin 0 -> 100342 bytes
 static/image/20220531/en/2.jpg                     | Bin 0 -> 61881 bytes
 static/image/20220531/en/3.jpg                     | Bin 0 -> 56889 bytes
 static/image/20220531/en/4.jpg                     | Bin 0 -> 63420 bytes
 static/image/20220531/en/5.jpg                     | Bin 0 -> 53005 bytes
 static/image/20220531/en/6.jpg                     | Bin 0 -> 63970 bytes
 static/image/20220531/en/7.jpg                     | Bin 0 -> 46727 bytes
 static/image/20220531/en/8.jpg                     | Bin 0 -> 43573 bytes
 75 files changed, 1213 insertions(+), 60 deletions(-)

diff --git a/blog/2022-03-18-SeaTunnel_s_first_Apache_release_2_1_0_release_kernel_refactoring_full_support_fo.md b/blog/2022-03-18-2-1-0-release.md
similarity index 100%
rename from blog/2022-03-18-SeaTunnel_s_first_Apache_release_2_1_0_release_kernel_refactoring_full_support_fo.md
rename to blog/2022-03-18-2-1-0-release.md
diff --git a/blog/2022-05-01_Kidswant.md b/blog/2022-05-01_Kidswant.md
new file mode 100644
index 000000000..7fb32335e
--- /dev/null
+++ b/blog/2022-05-01_Kidswant.md
@@ -0,0 +1,225 @@
+---
+slug: SeaTunnel Application and Refactoring at Kidswant
+title: SeaTunnel Application and Refactoring at Kidswant
+tags: [Meetup]
+---
+
+# SeaTunnel Application and Refactoring at Kidswant
+
+![](/image/20220501/en/0.png)
+
+At the Apache SeaTunnel (Incubating) Meetup in April, Yuan Hongjun, a big data expert and OLAP platform architect at Kidswant, shared the topic "SeaTunnel Application and Refactoring at Kidswant".
+
+The presentation contains five parts.
+
+- Background of the introduction of Apache SeaTunnel (Incubating) by Kidswant
+- A comparison of mainstream tools for big data processing
+- The implementation of Apache SeaTunnel (Incubating)
+- Common problems in Apache SeaTunnel (Incubating) refactoring
+- Predictions on the future development of Kidswant
+
+
+
+![](/image/20220501/en/0-1.png)
+
+
+
+Yuan Hongjun, big data expert and OLAP platform architect at Kidswant, has many years of experience in big data platform development and management, and rich research experience in data assets, data lineage mapping, data governance, OLAP, and other fields.
+
+
+## 01 Background
+
+
+
+![](/image/20220501/en/1.png)
+
+
+
+At present, Kidswant’s OLAP platform consists of seven parts: metadata layer, task layer, storage layer, SQL layer, scheduling layer, service layer, and monitoring layer. This sharing focuses on offline tasks in the task layer.
+
+
+In fact, Kidswant already had a complete internal collection and push system, but due to some historical legacy issues the existing platform could not quickly support bringing the OLAP platform online, so the company had to set its own platform aside and develop a new system instead.
+There were three options in front of the OLAP team at the time.
+
+
+1. Re-develop based on the existing collection and push system.
+
+
+2. Develop a completely new system in-house.
+
+
+3. Adopt an open-source project.
+
+
+## 02 Big data processing mainstream tools comparison
+
+
+These three options have their own pros and cons. Re-developing on top of the collection and push system would let us draw on previous experience and avoid repeating known pitfalls. The disadvantage is that the codebase is large, reading it takes a long time, and with little abstraction and many customized functions bound to the business, re-development would be difficult.
+
+
+With a completely self-developed system, the development process is autonomous and controllable, and engines such as Spark can be adapted to fit our own architecture, but the disadvantage is that we may run into unknown problems.
+
+
+For the last choice, using an open-source framework, the advantage is more abstract code and performance and stability that have already been verified by other major companies. Kidswant therefore mainly studied three open-source data synchronization tools, DataX, Sqoop, and SeaTunnel, in the early stages of the OLAP data synchronization refactoring.
+
+
+![](/image/20220501/en/2.png)
+
+
+From the diagram we can see that Sqoop's main function is data synchronization for RDBs, implemented on top of MapReduce. Sqoop has rich parameters and command-line options for various operations. Its advantages are that it fits the Hadoop ecosystem, already supports most conversions from RDBs to Hive and other sources, and is a distributed data synchronization tool with a complete command set and API.
+
+
+The disadvantages are that Sqoop only supports RDB data synchronization and has some limitations on data files, and there is no concept of data cleansing yet.
+
+
+
+![](/image/20220501/en/3.png)
+
+
+
+DataX mainly targets data synchronization from arbitrary sources, implemented with configuration files plus multi-threading, and runs through three main components: Reader, Framework, and Writer, where the Framework mainly handles communication and buffering between the Reader and the Writer.
+
+
+The advantage of DataX is that it uses plug-in-based development, has its own flow control and data management, and has an active community, with DataX's official site offering connectors for many different sources. The disadvantage, however, is that it is memory-based, so there may be limits on the volume of data it can handle.
+
+
+
+![](/image/20220501/en/4.png)
+
+
+
+Apache SeaTunnel (Incubating) also synchronizes data from arbitrary sources, implementing the process in three steps, source, transform, and sink, based on configuration files and Spark or Flink.
+
+The advantage is that the current 2.1.0 version ships a very large number of plug-ins and source connectors; the plug-in-based design also makes it very easy to extend, and it embraces Spark and Flink while offering a distributed architecture. The only downside of Apache SeaTunnel (Incubating) is probably the current lack of IP calls and the need to manage the UI interface yourself.
+
+
+In summary, although Sqoop is distributed, it only supports data synchronization between RDBs and Hive or HBase, and its poor scalability makes re-development inconvenient. DataX is scalable and stable overall, but because it is a standalone tool it cannot be deployed as a distributed cluster, and its data extraction capability is strongly dependent on machine performance. SeaTunnel is similar to DataX but makes up for DataX's lack of distributed deployment and also supports real-time streaming well; although it is a new product, its community is very active. Weighing factors such as distributed support and whether separate machines are needed for deployment, we finally chose SeaTunnel.
+
+
+## 03 Implementation
+
+
+On the Apache SeaTunnel (Incubating) website, we can see that the basic process of Apache SeaTunnel (Incubating) consists of three parts: source, transform, and sink. According to the guidelines on the website, Apache SeaTunnel (Incubating) requires a configuration script to start, but after some research we found that the final execution of Apache SeaTunnel (Incubating) is an application submitted via spark-submit that relies on the config file.
+
+
+This way of initializing, although simple, has the problem of relying on a config file that is generated before each run and cleaned up afterwards. Although the file can be generated dynamically in the scheduling script, this raises two questions: 1) do such frequent disk operations make sense; and 2) is there a more efficient way to run Apache SeaTunnel (Incubating)?
+
+
+
+![](/image/20220501/en/5.png)
+
+
+
+With these considerations in mind, we added a Unified Configuration Template Platform module to the final design. Scheduling only issues a submit command; Apache SeaTunnel (Incubating) itself pulls the configuration information from the unified configuration template platform, then loads it and initializes the parameters.
+
+
+
+![](/image/20220501/en/6.png)
+
+
+
+The diagram above shows the business process for Kidswant's OLAP platform, which is divided into three sections: the overall flow of data from Hive, stored as Parquet tables, to KYLIN and ClickHouse (CK).
+
+
+
+![](/image/20220501/en/7.png)
+
+
+
+This is the page where we construct the model. The final model is generated mainly through drag and drop, with some transactional operations between the tables, and the fine-grained settings for Apache SeaTunnel (Incubating) on the right.
+
+
+
+![](/image/20220501/en/8.png)
+
+
+
+So we end up submitting the command as above. The first part marked in red, [-conf customconfig/jars], means that the user can either handle this on the unified configuration template platform or specify it separately when modeling. The last part marked in red, [421 $start_time $end_time $taskType], is the Unicode, a unique task code.
+
+
+Below, on the left, are the 38 commands submitted by our final dispatch script. Below, on the right, is a modification made to Apache SeaTunnel (Incubating); you can see a dedicated tool class called WaterdropContext. It first checks whether the Unicode exists and then uses the unique code to fetch the configuration information for the corresponding template, avoiding any manipulation of the config file.
+
+
+In the end, the reportMeta is used to report some information after the task is completed, which is also done in Apache SeaTunnel (Incubating).
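+
+The actual modification lives in Kidswant's internal code, so the snippet below is only a rough sketch of the idea: a helper that prefers the unified configuration template platform over a local config file, keyed by the unique code from the submit command. The class name echoes the WaterdropContext mentioned above, but the endpoint, method names, and parsing are all hypothetical.
+
+```java
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.Map;
+
+// Hypothetical sketch of a WaterdropContext-style helper: if a unique code is present,
+// the job config is pulled from the unified configuration template platform instead of
+// being read from (and later cleaned off) the local disk.
+public final class TemplateConfigContext {
+
+    // Assumed endpoint of the unified configuration template platform.
+    private static final String TEMPLATE_API = "http://config-platform.internal/api/template/";
+
+    private TemplateConfigContext() {
+    }
+
+    /** Returns the job configuration text, preferring the template platform over a local file. */
+    public static String loadConfig(String uniqueCode, String fallbackConfigPath) throws IOException {
+        if (uniqueCode == null || uniqueCode.isEmpty()) {
+            // No unique code supplied: fall back to the classic config-file behaviour.
+            return new String(Files.readAllBytes(Paths.get(fallbackConfigPath)), StandardCharsets.UTF_8);
+        }
+        // Fetch the template rendered for this unique code (start/end time etc. already substituted).
+        try (InputStream in = new URL(TEMPLATE_API + uniqueCode).openStream()) {
+            ByteArrayOutputStream buf = new ByteArrayOutputStream();
+            byte[] chunk = new byte[8192];
+            for (int n; (n = in.read(chunk)) > 0; ) {
+                buf.write(chunk, 0, n);
+            }
+            return buf.toString(StandardCharsets.UTF_8.name());
+        }
+    }
+
+    /** Reports job metadata (row counts, duration, status, ...) back to the platform after the run. */
+    public static void reportMeta(String uniqueCode, Map<String, Object> meta) {
+        // Illustrative only: in practice this would POST the metadata to the platform.
+        System.out.println("reportMeta(" + uniqueCode + "): " + meta);
+    }
+}
+```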
+
+
+
+![](/image/20220501/en/9.png)
+
+
+
+
+
+![](/image/20220501/en/10.png)
+
+
+
+
+![](/image/20220501/en/11.png)
+
+
+
+In the finalized config file above, it is worth noting that Kidswant has made some changes on the transform side. The first is desensitization of mobile phone numbers, ID numbers, and the like: if the user specifies fields, masking is applied to those fields; if not, all fields are scanned and then desensitized and encrypted according to pattern matching.
+
+
+Second, the transform also supports custom processing, as mentioned above when talking about OLAP modeling. With the addition of HideStr, the first ten characters of a string can be retained and all the characters after them encrypted, providing some guarantee of data security.
+
+
+Then, on the sink side, we added pre_sql to support task idempotency. It mainly performs work such as deleting data or dropping partitions, because in production a task is never run just once, and this design has to guarantee data correctness when operations such as reruns or backfills occur.
+
+
+On the right side of the diagram, on the ClickHouse sink side, we have added an is_senseless_mode option, which forms a read/write "senseless" mode: users are unaware of the switch-over while querying or backfilling, because it relies on ClickHouse partition conversion, i.e. the MOVE PARTITION TO TABLE command.
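+
+MOVE PARTITION TO TABLE is standard ClickHouse DDL, so the "senseless" switch boils down to statements like the ones below. This is only a minimal JDBC sketch of the idea; the JDBC URL, table names, and partition value are illustrative, not our actual code.
+
+```java
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.Statement;
+
+// Minimal sketch: data is backfilled into a staging table, then the finished partition is
+// swapped into the serving table with ClickHouse's partition-move DDL, so queries never
+// see a half-written partition. Names and the partition value are illustrative.
+public class SenselessPartitionSwap {
+    public static void main(String[] args) throws Exception {
+        try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://ck-node:8123/dw");
+             Statement stmt = conn.createStatement()) {
+            // Drop the old copy first so reruns of the task stay idempotent.
+            stmt.execute("ALTER TABLE dws_order DROP PARTITION '2022-05-01'");
+            // Move the freshly written partition from the staging table into the serving table.
+            stmt.execute("ALTER TABLE dws_order_staging MOVE PARTITION '2022-05-01' TO TABLE dws_order");
+        }
+    }
+}
+```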
+
+
+A special note on the KYLIN sink: KYLIN is a very particular target with its own data-ingestion logic and its own monitoring page, so our adaptation for KYLIN is simply a call to its API plus constant polling of the build state. For this reason, the resources allotted to KYLIN in the unified configuration template platform are kept small.
+
+
+
+
+![](/image/20220501/en/12.png)
+
+
+## 04 Common problems about the Apache SeaTunnel (Incubating) transformation
+
+
+#### 01 OOM & Too Many Parts
+
+
+This problem usually arises in the Hive-to-Hive process. Even with automatic resource allocation, there are cases where the data volume suddenly grows, for example after several promotional campaigns. Such problems can currently only be avoided by manually and dynamically tuning the parameters and adjusting the batch time of the data synchronization. In the future, we may try to track data volumes to achieve finer control.
+
+
+#### 02 Field and type inconsistency issues
+
+
+After a model goes live, users may change the upstream tables or fields that the task depends on, and these changes can lead to task failures if they go unnoticed. The current solution is to rely on data lineage plus snapshots to become aware of such changes in advance and avoid errors.
+
+
+#### 03 Custom data sources & custom separators
+
+
+For example, if the finance department requires a custom separator or extra jars, users can now specify the additional jars to load, as well as the separator, themselves in the unified configuration template platform.
+
+
+#### 04 Data skewing issues
+
+
+This can happen when users set their own parallelism but cannot tune it perfectly. We haven't finished dealing with this issue yet; a later idea is to add post-processing in the Source module to shuffle the data and eliminate the skew.
+
+
+#### 05 KYLIN global dictionary lock problem
+
+
+As the business grows, a single cube cannot meet user needs, so multiple cubes have to be created. If the same fields are used across multiple cubes, the KYLIN global dictionary lock problem appears. The current solution is to stagger the scheduling times of the tasks; if that is not possible, we can add distributed lock control, where the KYLIN sink must acquire the lock before it can run.
+
+
+## 05 An outlook on the future of Kidswant
+
+1. Multi-source data synchronization, possibly with special handling for RDB sources
+2. A real-time implementation based on Flink
+3. Taking over the existing collection and scheduling platform (mainly to solve the problem of sharded databases and tables)
+4. Data quality verification, such as null checks, the overall null rate of the data, primary time-field checks, etc.
+
+
+That is all I have to share. I hope we can communicate more with the community in the future and make progress together. Thanks!
+
+
diff --git a/blog/2022-05-10-ClickHouse.md b/blog/2022-05-10-ClickHouse.md
new file mode 100644
index 000000000..7380b84db
--- /dev/null
+++ b/blog/2022-05-10-ClickHouse.md
@@ -0,0 +1,170 @@
+---
+slug: How to synchronize tens of billions of data based on SeaTunnel's ClickHouse
+title: How to synchronize tens of billions of data based on SeaTunnel's ClickHouse
+tags: [Meetup]
+---
+
+# How to synchronize tens of billions of data based on SeaTunnel's ClickHouse 
+
+![](/image/20220510/en/0.jpg)
+
+
+Author | Fan Jia, Apache SeaTunnel(Incubating) Contributor
+Editor | Test Engineer Feng Xiulan
+
+For batch imports at the scale of tens of billions of rows, the traditional JDBC approach does not perform as well as it should in some massive data synchronization scenarios. To write data faster, the just-released Apache SeaTunnel (Incubating) 2.1.1 provides a ClickhouseFile connector that implements bulk-load data writing.
+
+Bulk load means synchronizing large amounts of data to the target DB. SeaTunnel currently supports data synchronization to ClickHouse.
+
+At the Apache SeaTunnel (Incubating) April Meetup, Apache SeaTunnel (Incubating) contributor Fan Jia shared the topic of "ClickHouse bulk load implementation based on SeaTunnel", explaining in detail the implementation principle and process of ClickHouseFile for efficient processing of large amounts of data.
+
+Thanks to test engineer Feng Xiulan for compiling this article!
+
+This presentation contains seven parts.
+
+- State of ClickHouse Sink
+- Scenarios that ClickHouse Sink isn't good at 
+- Introduction to the ClickHouseFile plugin
+- ClickHouseFile core technologies
+- Analysis of ClickHouseFile plugin implementation
+- Comparison of plug-in capabilities
+- Post-optimization directions
+
+
+
+![](/image/20220510/en/0-1.png)
+
+
+Fan Jia, Apache SeaTunnel (Incubating) contributor, Senior Engineer at WhaleOps.
+
+## 01 Status of ClickHouse Sink 
+
+At present, the process of synchronizing data from SeaTunnel to ClickHouse is as follows: as long as the data source is supported by SeaTunnel, the data can be extracted, converted (or not), and written directly to the ClickHouse sink connector, and then written to the ClickHouse server via JDBC. 
+
+
+![](/image/20220510/en/1.png)
+
+
+However, there are some problems with writing to the ClickHouse server via traditional JDBC.
+
+Firstly, the tool used now is the driver provided by ClickHouse, which works over HTTP, and HTTP is not very efficient in certain scenarios. Secondly, with huge volumes of data, or duplicate data, or a large amount of data written at once, the traditional method has to generate the corresponding insert statements and send them via HTTP to the ClickHouse server side, where they are parsed and executed item by item or in batches; this approach does not allow data compression.
+
+Finally, there is the problem we often encounter, i.e. too much data may lead to an OOM on the SeaTunnel side or a server-side hang due to too much data being written to the server-side too often.
+
+So we thought, is there a faster way to send than HTTP? If data pre-processing or data compression could be done on the SeaTunnel side, then the network bandwidth pressure would be reduced and the transmission rate would be increased.
+
+## 02 Scenarios that ClickHouse Sink isn't good at
+
+1. If the HTTP transfer protocol is used and the volume of data is too large, with batches sent as micro-batch requests, HTTP may not be able to keep up.
+ 
+2. Too many INSERT requests may put too much pressure on the server. Even if the bandwidth can handle a large number of requests, the server side may not be able to carry them. An online server not only handles data inserts; more importantly, it serves queries for other business teams. If the server cluster goes down because too much data is being inserted, the loss far outweighs the gain.
+
+## 03 ClickHouse File core technologies
+
+In response to these scenarios that the ClickHouse sink is not good at, we wondered: is there a way to do data compression right on the Spark side, write data without increasing the resource load on the server, and still write large amounts of data quickly? So we developed the ClickHouseFile plugin to meet these needs.
+
+The key technology behind the ClickHouseFile plugin is ClickHouse-local. ClickHouse-local mode allows users to perform fast processing of local files without having to deploy and configure a ClickHouse server. ClickHouse-local uses the same core as ClickHouse Server, so it supports most of the same features, formats, and table engines.
+
+These two properties mean that users can work directly with local files without doing the processing on the ClickHouse server side. Because the format is identical, the data generated by operations on the remote or SeaTunnel side is seamlessly compatible with the server side and can be written using ClickHouse-local. ClickHouse-local is thus the core technology behind ClickHouseFile and is what makes the ClickHouseFile connector possible.
+
+The core usage of ClickHouse-local is shown below.
+
+
+
+![](/image/20220510/en/2.png)
+
+
+
+Line 1: pass the data to ClickHouse-local's test_table table via a Linux pipe.
+
+Lines 2-5: create a result_table for receiving the data.
+
+Line 6: insert the data from test_table into result_table.
+
+Line 7: define the disk path used for data processing.
+
+By calling the ClickHouse-local component, Apache SeaTunnel (Incubating) generates the data files and compresses the data. It then communicates with the server and sends the generated data directly to the different ClickHouse nodes, where the data files are made available for queries.
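+
+The image above shows the actual invocation; as a rough illustration of its shape, and of how a JVM-side sink could shell out to it, here is a hedged Java sketch. The flags used (--table, --structure, --input-format, --query, --path) are standard clickhouse-local options, but the schema, table names, and paths are made up, and exact option behaviour can vary between ClickHouse versions (older builds may need an extra --multiquery flag for multi-statement queries).
+
+```java
+import java.io.File;
+import java.io.IOException;
+
+// Rough sketch of invoking clickhouse-local from the JVM: rows buffered by the sink are fed
+// in on stdin, clickhouse-local builds the MergeTree part files under --path, and those files
+// are later shipped to the server. Schema, table names, and paths are illustrative.
+public class ClickHouseLocalSketch {
+    public static void main(String[] args) throws IOException, InterruptedException {
+        ProcessBuilder pb = new ProcessBuilder(
+                "clickhouse-local",
+                "--table", "test_table",                     // the piped-in data is exposed as test_table
+                "--structure", "id UInt64, name String",     // schema of the incoming rows
+                "--input-format", "TSV",
+                "--path", "/tmp/clickhouse-local-work",      // disk path used for data processing
+                "--query",
+                "CREATE TABLE result_table (id UInt64, name String) "
+                        + "ENGINE = MergeTree() ORDER BY id; "
+                        + "INSERT INTO result_table SELECT * FROM test_table;");
+        pb.redirectInput(new File("/tmp/seatunnel-sink-buffer.tsv")); // rows buffered by the sink side
+        pb.redirectErrorStream(true);
+        Process process = pb.start();
+        process.getInputStream().transferTo(System.out);             // surface clickhouse-local output
+        System.out.println("clickhouse-local exited with " + process.waitFor());
+    }
+}
+```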
+
+Comparison of the original and current implementations.
+
+
+
+![](/image/20220510/en/3.png)
+
+
+
+Originally, the data, including the insert statements, was sent by Spark to the server, and the server parsed the SQL, generated and compressed the table data files, and created the corresponding indexes. With the ClickHouse-local approach, the data file generation, file compression, and index creation are done by SeaTunnel, and the final output is a file or folder that is synchronized to the server; the server then only needs to query the data, with no additional work.
+
+
+## 04 Core technical points
+
+
+
+![](/image/20220510/en/4.png)
+
+
+
+The above process makes data synchronization more efficient, thanks to three optimizations we have made to it.
+
+Firstly, the data is in fact transferred to ClickHouse-local through a pipe, which imposes limitations in terms of length and memory. For this reason, we write the data received by the ClickHouse connector, i.e. the sink side, to a temporary file using MMAP technology, and ClickHouse-local then reads the data from the temporary file to generate our target local files. This achieves incremental data reading and solves the OOM problem.
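+
+As a minimal illustration of the buffering idea (not the connector's actual code), Java NIO's memory-mapped files can be used like this; the file path and mapped capacity are made up:
+
+```java
+import java.io.RandomAccessFile;
+import java.nio.MappedByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.charset.StandardCharsets;
+
+// Rows received by the sink are appended to a temporary file through a memory-mapped buffer,
+// and clickhouse-local later reads that file, so the full batch never has to sit in the JVM heap.
+public class MmapBufferSketch {
+    public static void main(String[] args) throws Exception {
+        try (RandomAccessFile raf = new RandomAccessFile("/tmp/seatunnel-sink-buffer.tsv", "rw");
+             FileChannel channel = raf.getChannel()) {
+            // Map a 64 MiB window of the temp file (capacity is illustrative).
+            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 64L * 1024 * 1024);
+            for (int i = 0; i < 3; i++) {
+                String row = i + "\tuser_" + i + "\n";               // one TSV row
+                buffer.put(row.getBytes(StandardCharsets.UTF_8));
+            }
+            buffer.force(); // flush the written pages to disk for clickhouse-local to read
+        }
+    }
+}
+```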
+
+
+
+![](/image/20220510/en/5.png)
+
+
+
+Secondly, it supports sharding. If only one file or folder is generated for a cluster, the data lands on only one node, which greatly reduces query performance. Therefore, we added sharding support: users can set a sharding key in the configuration file, and the algorithm will divide the data into multiple files and write them to different cluster nodes, significantly improving read performance.
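+
+As a tiny illustration of the idea (the connector's real sharding logic is more involved; the key values and node list below are made up), a hash of the configured sharding key decides which node a row's file belongs to:
+
+```java
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+// Toy illustration of key-based sharding: rows are routed to per-node files by hashing the
+// configured sharding key, so the generated parts end up spread across the cluster.
+public class ShardingSketch {
+    public static void main(String[] args) {
+        List<String> nodes = Arrays.asList("ck-node-1", "ck-node-2", "ck-node-3"); // illustrative cluster
+        String[] shardKeys = {"user_1001", "user_1002", "user_1003"};              // values of the sharding key
+        for (String key : shardKeys) {
+            int shard = Math.floorMod(Objects.hashCode(key), nodes.size());        // stable non-negative bucket
+            System.out.println(key + " -> " + nodes.get(shard));                   // this row's file goes to that node
+        }
+    }
+}
+```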
+
+
+
+![](/image/20220510/en/6.png)
+
+
+
+The third key optimization is file transfer. Currently, SeaTunnel supports two file transfer methods: one is SCP, which is secure, versatile, and needs no additional configuration; the other is RSYNC, which is fast and efficient and supports resuming interrupted transfers, but requires additional configuration. Users can choose whichever suits their needs.
+
+## 05 Plugin implementation analysis
+
+In summary, the general implementation process of ClickHouseFile is as follows.
+
+
+
+![](/image/20220510/en/7.png)
+
+
+
+1. Cache the data on the ClickHouse sink side.
+2. Call ClickHouse-local to generate the files.
+3. Send the data to the ClickHouse server.
+4. Execute the ATTACH command.
+
+With the above four steps, the generated data reaches a queryable state.
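+
+Step 4 relies on ClickHouse's ATTACH DDL: once the generated part files have been copied into the target table's detached directory on each node (step 3), attaching the partition makes them visible to queries. A minimal JDBC sketch, with an illustrative URL, table, and partition value:
+
+```java
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.Statement;
+
+// Minimal sketch of step 4: attach the shipped part files so they become queryable.
+public class AttachPartitionSketch {
+    public static void main(String[] args) throws Exception {
+        try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://ck-node:8123/dw");
+             Statement stmt = conn.createStatement()) {
+            // The part files for this partition were previously copied into the table's detached/ directory.
+            stmt.execute("ALTER TABLE dws_order ATTACH PARTITION '2022-05-01'");
+        }
+    }
+}
+```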
+
+## 06 Comparison of plug-in capabilities
+
+
+
+![](/image/20220510/en/8.png)
+
+
+In terms of data transfer, ClickHouseFile is better suited to massive amounts of data. The ClickHouse (JDBC) sink has the advantage that no additional configuration is required and it is highly versatile, while ClickHouseFile is more complex to configure and currently supports fewer engines.
+
+In terms of environmental complexity, the ClickHouse sink is better suited to complex environments, since it can run directly without additional configuration.
+
+In terms of versatility, the ClickHouse sink, being based on the JDBC driver officially supported by SeaTunnel, basically supports data writing for all engines, while ClickHouseFile supports relatively few engines.
+
+In terms of server pressure, ClickHouseFile's advantage shows in massive data transfers, since it does not put much pressure on the server.
+
+However, the two are not in competition and the choice needs to be based on the usage scenario.
+
+## 07 Follow-up plans
+
+Although SeaTunnel currently supports the ClickHouseFile plugin, there are still many areas that need optimization, mainly including:
+
+- Rsync support.
+- Exactly-Once support.
+- Zero Copy support for transferring data files.
+- More Engine support.
+
+Anyone interested in the above issues is welcome to contribute to the follow-up plans, or tell me your ideas!
\ No newline at end of file
diff --git a/blog/2022-05-31-engine.md b/blog/2022-05-31-engine.md
new file mode 100644
index 000000000..7729db411
--- /dev/null
+++ b/blog/2022-05-31-engine.md
@@ -0,0 +1,168 @@
+---
+slug: How dose Apache SeaTunnel refactor the API to decouple with the computing engine
+title: How does Apache SeaTunnel refactor the API to decouple from the computing engine
+tags: [Meetup]
+---
+
+# How does Apache SeaTunnel (Incubating) refactor the API to decouple from the computing engine
+
+![](/image/20220531/en/0.jpg)
+
+Translator | Critina
+
+In the May joint Meetup between Apache SeaTunnel and Apache InLong, Li Zongwen, a senior engineer at WhaleOps, shared his experience in identifying and refactoring the four major problems with Apache SeaTunnel (Incubating): SeaTunnel connectors have to be implemented multiple times, their parameters are inconsistent, multiple engine versions are not supported, and it is difficult to upgrade the engine. To solve these problems, Li Zongwen aimed to decouple Apache SeaTunnel (Incubating) from the computing engines.
+
+This speech mainly consists of five parts. The first part covers the background and motivation for the refactoring. The second part introduces the refactoring goals for Apache SeaTunnel (Incubating). The third part discusses the overall design for the refactoring. The last two parts cover the Source API design and the Sink API design.
+
+## 01 Background and motivation for refactoring
+
+Those of you who have used Apache SeaTunnel (Incubating), and its developers, will know that Apache SeaTunnel (Incubating) is currently fully coupled with the engine: it is entirely based on Spark or Flink, and so are the configuration file parameters. From the perspective of both contributors and users, this causes some problems.
+
+From the contributors' point of view, implementing the same connector over and over is meaningless, and inconsistent engine versions make it impossible for potential contributors to contribute to the community.
+
+At present, many companies use a Lambda architecture, with Spark for offline jobs and Flink for real-time jobs. From the users' point of view, Spark may have a SeaTunnel connector that Flink lacks, and the parameters of the two engines' connectors for the same storage engine are not unified, resulting in high usage costs and deviating from SeaTunnel's original intention of being easy to use. Some users also asked why Flink 1.14 was not yet supported.
+
+As a result, it was difficult for us to either upgrade the engine or support more versions.
+
+In addition, Spark and Flink both adopt a checkpoint fault-tolerance mechanism based on the Chandy-Lamport algorithm, and both have internally unified DataSet and DataStream. On this premise, we believed decoupling was feasible.
+
+## 02 Apache SeaTunnel (Incubating) decoupling from the computing engines
+
+
+Therefore, in order to solve the problems raised above, we set the following goals.
+
+1. Connectors are implemented only once. To solve the problems of inconsistent parameters and connectors being implemented many times over, we want a unified Source and Sink API;
+
+2. Multiple versions of the Spark and Flink engines are supported, by adding a translation layer above the Source and Sink API;
+
+3. The parallel sharding logic of the Source and the commit logic of the Sink should be clarified. We must provide a good API to support connector development;
+
+4. Whole-database synchronization in real-time scenarios should be supported. This is a derived requirement that many users have raised for CDC support. I participated in the Flink CDC community before, and many users pointed out that in a CDC scenario, using Flink CDC directly means one link per table, so synchronizing a whole database with thousands of tables means thousands of links, which puts unacceptable pressure on the database;
+
+5. Automatic discovery and storage of metadata. Storage engines such as Kafka do not record the data structure, so when we need to read structured data, the user has to define the schema of each topic before reading it, topic by topic, which is a poor experience. We hope that once the configuration is done, no redundant work is needed again.
+
+Some people may wonder why we don't use Apache Beam directly. That is because Beam sources are divided into BOUNDED and UNBOUNDED sources, which means a connector still needs to be implemented twice. Moreover, some Source and Sink features are not supported, as will be mentioned later.
+
+## 03 Apache SeaTunnel(Incubating) overall design for refactoring
+
+![](/image/20220531/en/1.jpg)
+
+The Apache SeaTunnel(Incubating) API architecture is described in the picture above.
+
+The Source & Sink API is one of the core APIs of data integration; it defines the parallel sharding logic of the Source and the commit logic of the Sink, against which connectors are implemented.
+
+The Engine API includes the translation and execution layers. The translation layer translates SeaTunnel's Source and Sink APIs into connectors that can run inside the engine.
+
+The execution layer defines the execution logic of Source, Transform, Sink, and other operations in the engine.
+
+The Table SPI is mainly used to expose the Source and Sink interfaces in SPI mode, and to specify the mandatory and optional parameters of a connector, etc.
+
+The DataType contains SeaTunnel's own data structures, used to isolate the engines and declare the Table schema.
+
+The Catalog is used to obtain table schemas, options, and so on. The Catalog Storage is used to store table schemas defined for unstructured engines such as Kafka.
+
+![](/image/20220531/en/2.jpg)
+
+
+The execution flow we currently envision can be seen in the picture above.
+
+1. Obtain task parameters from configuration files or UI.
+
+2. Obtain the Table Schema, Option and other information by analyzing the parameters from Catalog.
+
+3. Pull up the Connector of SeaTunnel in SPI mode and inject Table information.
+
+4. Translate the Connector from SeaTunnel into the Connector within the engine.
+
+5. Execute the operation logic of the engine. The multi-table distribution in the picture only exists in whole-database synchronization with CDC; other connectors are single-table and do not need the distribution logic.
+
+It can be seen that the hardest part of the plan is to translate Source and Sink into an internal Source and Sink in the engine.
+
+Many users today use Apache SeaTunnel (Incubating) not only as a data integration tool but also as a data storage tool, and use a lot of Spark and Flink SQLs. We want to preserve that SQL capability for users to upgrade seamlessly.
+
+
+![](/image/20220531/en/3.jpg)
+
+
+According to our research, the figure above shows the ideal execution logic of Source and Sink. Since SeaTunnel was incubated from Waterdrop, the terms in the figure lean towards Spark.
+
+Ideally, the Source and Sink coordinators run on the Driver, and the Source Readers and Sink Writers run on the Workers. As for the Source Coordinator, we expect it to support several features.
+
+The first is that data splits can be dynamically assigned to the Reader.
+
+The second is coordination of the Readers. A Source Reader reads data, sends it to the engine, and finally to the Sink Writer for writing. Meanwhile, the Writer can support two-phase transaction commit, and the Sink coordinator supports the aggregated-commit requirements of connectors such as Iceberg.
+
+## 04 Source API
+
+
+After research, we found the following features that are required by Source.
+
+1. A unified offline and real-time API, so that a Source is implemented only once and supports both offline and real-time jobs;
+
+2. Support for parallel reading. For example, Kafka generates a reader for each partition and executes them in parallel;
+
+3. Support for dynamically adding splits. For example, when Kafka topics are matched by a regular expression and a new topic has to be added as business volume grows, the Source API allows the new split to be added to the running job;
+
+4. Support for coordinating readers, which is currently only needed by the CDC connector. CDC is currently based on Netflix's DBLog parallel algorithm, which requires reader coordination between full synchronization and incremental synchronization;
+
+5. Support for a single reader processing multiple tables, i.e. allowing whole-database synchronization in real-time scenarios, as mentioned above.
+
+![](/image/20220531/en/4.jpg)
+
+
+Based on the above requirements, we have created the basic API as shown in the figure above. The code has been submitted to the API-Draft branch in the Apache SeaTunnel (Incubating) community; if you're interested, you can look at the code in detail.
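+
+As a simplified, hypothetical sketch of the shape such an API takes (the authoritative definitions are in the API-Draft branch; the names, generics, and method signatures below are illustrative, not the actual interfaces):
+
+```java
+import java.io.Serializable;
+import java.util.List;
+
+// Illustrative shape of a split-based Source API: an enumerator discovers and assigns splits,
+// readers consume them, and a boundedness flag lets one implementation serve both batch and
+// streaming. See the API-Draft branch for the real interfaces.
+interface SourceSplit extends Serializable {
+    String splitId();
+}
+
+interface SplitEnumerator<SplitT extends SourceSplit> extends AutoCloseable {
+    void open();
+    List<SplitT> discoverSplits();                            // enumerate splits: topics, partitions, chunks, ...
+    void addSplitsBack(List<SplitT> splits, int readerId);    // reassign splits, e.g. after a reader failure
+    void handleReaderEvent(int readerId, Object event);       // coordination hook, e.g. CDC snapshot/stream switch
+}
+
+interface SourceReader<T, SplitT extends SourceSplit> extends AutoCloseable {
+    void open();
+    void addSplits(List<SplitT> splits);                      // splits can arrive dynamically while the job runs
+    void pollNext(Collector<T> output) throws Exception;      // read data and hand it to the engine
+}
+
+interface Collector<T> {
+    void collect(T record);
+}
+
+enum Boundedness { BOUNDED, UNBOUNDED }                       // offline vs. real-time for the same connector
+```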
+
+### How to adapt to Spark and Flink engines
+
+
+Flink and Spark both unify the DataSet and DataStream APIs and can support the first two features. For the remaining three features, how do we
+
+- Support dynamic slice-adding?
+- Support the work of coordinating reader?
+- Support a single reader to process multiple tables?
+
+Let's review the design with questions.
+
+![](/image/20220531/en/5.jpg)
+
+
+We found that other connectors do not need coordinators, except for CDC. For those connectors that do not need coordinators, we have a Source that supports parallel execution and engine translation.
+
+As shown in the figure above, there is a split enumerator on the left, which enumerates which splits the source needs and what they are. The enumerated splits are distributed in real time to the SourceReader, the module that actually reads the data. The Boundedness marker is used to differentiate offline from real-time operation, and a connector can mark whether a split has a stop offset; for example, this is how Kafka can support both real-time and offline operation. The degree of parallelism ca [...]
+
+
+As shown in the figure above, in a scenario where a coordinator is required, events are passed between the Reader and the Enumerator, and the Enumerator coordinates the readers through the events they send. The coordinated Source needs to be kept at single parallelism at the engine level to ensure data consistency. Of course, this does not make good use of the engine's memory management mechanism, but the trade-off is necessary.
+
+
+![](/image/20220531/en/6.jpg)
+
+
+For the last question, how can we support a single reader processing multiple tables? This involves the Table API layer. Once all the required tables have been read from the Catalog, some of the tables may belong to one job and can be read through a single link, while others may need to be separated, depending on how the Source is implemented. Since this is a special requirement, we want to make it easier for developers: in the Table API layer, we will provide a SupportMultipleTable interface to declare that a Source supports reading multiple tables.
+
+## 05 Sink API
+
+At present, the Sink does not require many features; according to our research, three major requirements stand out.
+
+The first is about idempotent writing, which requires no code and depends on whether the storage engine can support it.
+
+The second is about distributed transactions. The mainstream approach is two-phase commit, as used by Kafka and others.
+
+The third is about aggregated commits. For storage engines like Iceberg and Hoodie, we want to avoid problems caused by small files, so we expect to aggregate these files into a single file and commit it as a whole.
+
+Based on these three requirements, we built three APIs: SinkWriter, SinkCommitter, and SinkAggregatedCommitter. The SinkWriter performs the writing, which may or may not be idempotent. The SinkCommitter supports two-phase commit. The SinkAggregatedCommitter supports aggregated commits.
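+
+Put as a simplified sketch (again, illustrative shapes only; the actual definitions are in the API-Draft branch):
+
+```java
+import java.util.List;
+import java.util.Optional;
+
+// Illustrative shape of the three sink roles: a writer that stages data, a committer for
+// two-phase commit, and an aggregated committer that merges per-writer commit info and
+// commits it once, driver-side. Names and generics are illustrative.
+interface SinkWriter<T, CommitInfoT> extends AutoCloseable {
+    void write(T record) throws Exception;                   // idempotent or not, depending on the storage engine
+    Optional<CommitInfoT> prepareCommit();                   // phase one: stage data, hand back commit info
+}
+
+interface SinkCommitter<CommitInfoT> {
+    List<CommitInfoT> commit(List<CommitInfoT> commitInfos); // phase two; returns the infos that need retrying
+    void abort(List<CommitInfoT> commitInfos);
+}
+
+interface SinkAggregatedCommitter<CommitInfoT, AggregatedCommitInfoT> {
+    AggregatedCommitInfoT combine(List<CommitInfoT> commitInfos); // e.g. merge many small-file manifests into one
+    void commit(List<AggregatedCommitInfoT> aggregated);          // single (or low-parallelism) commit on the driver
+    void abort(List<AggregatedCommitInfoT> aggregated);
+}
+```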
+
+![](/image/20220531/en/7.jpg)
+
+
+Ideally, the AggregatedCommitter runs on the Driver, singly or in parallel, while the Writers and Committers run on the Workers with multiple parallelism, each parallel instance doing its own pre-commit work and then sending aggregation messages to the AggregatedCommitter.
+
+Recent versions of Spark and Flink all support running the AggregatedCommitter on the Driver (Job Manager) and running the Writers and Committers on the Workers (Task Managers).
+
+![](/image/20220531/en/8.jpg)
+
+
+However, lower versions of Flink cannot run the AggregatedCommitter in the JobManager, so we are also adapting this in the translation layer. The Writer and Committer act as pre-operators, wrapped by Flink's ProcessFunction, supporting concurrent pre-commit and writing, and implementing two-phase commit based on Flink's checkpoint mechanism. This is also how many Flink connectors currently implement 2PC. The ProcessFunction sends pre-commit messages to the downstream AggregatedCommitter.
+
+Thank you for watching. If you’re interested in the specific implementations mentioned in my speech, you can refer to the Apache SeaTunnel (Incubating) community and check out the API-Draft branch code. Thank you again.
+
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/2022-03-18-SeaTunnel_s_first_Apache_release_2_1_0_release_kernel_refactoring_full_support_fo.md b/i18n/zh-CN/docusaurus-plugin-content-blog/2-1-0-release.md
similarity index 99%
rename from i18n/zh-CN/docusaurus-plugin-content-blog/2022-03-18-SeaTunnel_s_first_Apache_release_2_1_0_release_kernel_refactoring_full_support_fo.md
rename to i18n/zh-CN/docusaurus-plugin-content-blog/2-1-0-release.md
index 9fc5cc785..2f7f60459 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/2022-03-18-SeaTunnel_s_first_Apache_release_2_1_0_release_kernel_refactoring_full_support_fo.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/2-1-0-release.md
@@ -1,61 +1,61 @@
----
-slug: Apache SeaTunnel(Incubating) 首个Apache 版本 2.1.0 发布,内核重构,全面支持Flink
-title: Apache SeaTunnel(Incubating) 首个Apache 版本 2.1.0 发布,内核重构,全面支持Flink
-tags: [2.1.0, Release]
----
-# Apache SeaTunnel(Incubating) 首个Apache 版本 2.1.0 发布,内核重构,全面支持Flink
- 
-2021 年 12 月 9 日,Apache SeaTunnel(Incubating) 进入 Apache 孵化器,在经过社区各位贡献者近四个月的努力下,我们于2022年3月18日发布了首个Apache版本,并且保证了首个版本一次性通过检查。这意味着 2.1.0 版本,是经过 Apache SeaTunnel(Incubating) 社区和 Apache 孵化器投票检查发布的官方版本,企业和个人用户可以放心安全使用。
-
-**Note:** **软件许可协议**是一种具有法律性质的合同或指导,目的在于规范受著作权保护的软件的使用或散布行为。通常的许可方式会允许用户来使用单一或多份该软件的复制,因为若无许可而径予使用该软件,将违反著作权法给予该软件开发者的专属保护。效用上来说,软件许可是软件开发者与其用户之间的一份合约,用来保证在符合许可范围的情况下,用户将不会受到控告。进入孵化器前后,我们花费了大量的时间来梳理整个项目的外部依赖以确保整个项目的合规性。需要说明的是,开源软件选择怎样的License并不意外着项目本身就一定合规。而ASF严苛的版本检查最大程度地保证了软件License的合规性,以及软件合理合法的流通分发。
-## 本次发布版本说明
-### 本次发布我们主要带来了以下特性:
-* 对微内核插件化的架构内核部分进行了大量优化,内核以 Java 为主,并对命令行参数解析,插件加载等做了大量改进,同时插件扩展可根据用户(或贡献者)所擅长的语言去做开发,极大程度地降低了插件开发门槛。
-* 全面支持 Flink ,但同时用户也可自由选择底层引擎,本次更新也为大家带来了大量的Flink插件,也欢迎大家后续贡献相关插件。
-* 提供本地开发极速启动环境支持(example),贡献者或用户可以在不更改任何代码的情况下快速丝滑启动,方便本地快速开发调试体验。对于需要自定义插件的贡献者或者用户来讲,这无疑是个令人激动的好消息。事实上,我们在发布前的测试中,也有大量贡献者采用这种方式快速对插件进行测试。
-* 提供了Docker容器安装,用户可以极快地通过Docker部署安装使用Apache SeaTunnel,未来我们也会围绕Docker&K8s做出大量迭代,欢迎大家讨论交流。
- 
- 
-具体发布说明:
-[Feature]
-* 使用 JCommander来做命令行参数解析,使得开发者更关注逻辑本身。
-* Flink从1.9升级至1.13.5,保持兼容旧版本,同时为后续CDC做好铺垫。
-* 支持 Doris 、Hudi、Phoenix、Druid等Connector 插件,完整的插件支持你可以在这里找到  [plugins-supported-by-seatunnel]([https://github.com/apache/incubator-seatunnel#plugins-supported-by-seatunnel](https://github.com/apache/incubator-seatunnel#plugins-supported-by-seatunnel)).
-* 本地开发极速启动环境支持,你可以在使用example模块,不修改任何代码的前提下快速启动,方便开发者本地调试体验。
-* 支持通过 Docker 容器安装和试用 Apache SeaTunnel。
-* Sql 组件支持 SET语句,支持配置变量。
-* Config模块重构,减少贡献者理解成本,同时保证项目的代码合规(License)。
-* 项目结构重新调整,以适应新的Roadmap。
-* CI&CD的支持,代码质量自动化管控,(后续会有更多的计划来支持CI&CD开发)。
- 
-## 【致谢】
-感谢以下参与贡献的同学(为GitHub ID,排名不分先后):
- 
-Al-assad, BenJFan, CalvinKirs, JNSimba, JiangTChen, Rianico, TyrantLucifer, Yves-yuan, ZhangchengHu0923, agendazhang, an-shi-chi-fan, asdf2014, bigdataf, chaozwn, choucmei, dailidong, dongzl, felix-thinkingdata, fengyuceNv, garyelephant, kalencaya, kezhenxu94, legendtkl, leo65535, liujinhui1994, mans2singh, marklightning, mosence, nielifeng, ououtt, ruanwenjun, simon824, totalo, wntp, wolfboys, wuchunfu, xbkaishui, xtr1993, yx91490, zhangbutao, zhaomin1423, zhongjiajie, zhuangchong, zixi0825.
- 
-同时也诚挚的感谢我们的Mentor:Zhenxu Ke,Willem Jiang, William Guo,LiDong Dai ,Ted Liu, Kevin,JB 在这个过程中给予的帮助
-## 未来几个版本的规划:
-* CDC的支持;
-* 监控体系的支持;
-* UI系统的支持;
-* 更多的 Connector 支持,以及更高效的Sink支持,如ClickHouse,很快会在下个版本跟大家见面。
- 
-后续Feature是由社区共同决定的,我们也在这里呼吁大家一同参与社区后续建设。
-欢迎大家关注以及贡献:)
- 
-## 社区发展
-### 【近期概况】
-自进入Apache孵化器以来,贡献者从13 人增长至 55 人,且持续保持上升趋势,平均周commits维持在20+,来自不同公司的三位贡献者(Lei Xie, HuaJie Wang,Chunfu Wu,)通过他们对社区的贡献被邀请成为Committer。我们举办了两场MeetUp,来自B站,OPPO、唯品会等企业讲师分享了SeaTunnel在他们在企业中的大规模生产落地实践(后续我们也会保持每月一次的meetup,欢迎各位使用SeaTunnel的用户或者贡献者分享SeaTunnel和你们的故事)。
-### 【Apache SeaTunnel(Incubating)的用户】
-Note:仅包含已登记用户
-Apache SeaTunnel(Incubating) 目前登记用户如下,如果您也在使用Apache SeaTunnel,欢迎在[Who is using SeaTunne](https://github.com/apache/incubator-seatunnel/issues/686)! 中登记!
- <div align="center">
-
-<img src="/image/20220321/1.png"/>
-
-</div>
-
- 
-## 【PPMC感言】
-Apache SeaTunnel(Incubating) PPMC LiFeng Nie在谈及首个Apache版本发布的时候说,从进入Apache Incubator的第一天,我们就一直在努力学习Apache Way以及各种Apache政策,第一个版本发布的过程花费了大量的时间(主要是合规性),但我们认为这种时间是值得花费的,这也是我们选择进入Apache的一个很重要的原因,我们需要让用户用得放心,而Apache无疑是最佳选择,其 License 近乎苛刻的检查会让用户尽可能地避免相关的合规性问题,保证软件合理合法的流通。另外,其践行Apache Way,例如公益使命、实用主义、社区胜于代码、公开透明与共识决策、任人唯贤等,可以帮助 SeaTunnel 社区更加开放、透明,向多元化方向发展。
+---
+slug: Apache SeaTunnel(Incubating) 首个Apache 版本 2.1.0 发布,内核重构,全面支持Flink
+title: Apache SeaTunnel(Incubating) 首个Apache 版本 2.1.0 发布,内核重构,全面支持Flink
+tags: [2.1.0, Release]
+---
+# Apache SeaTunnel(Incubating) 首个Apache 版本 2.1.0 发布,内核重构,全面支持Flink
+ 
+2021 年 12 月 9 日,Apache SeaTunnel(Incubating) 进入 Apache 孵化器,在经过社区各位贡献者近四个月的努力下,我们于2022年3月18日发布了首个Apache版本,并且保证了首个版本一次性通过检查。这意味着 2.1.0 版本,是经过 Apache SeaTunnel(Incubating) 社区和 Apache 孵化器投票检查发布的官方版本,企业和个人用户可以放心安全使用。
+
+**Note:** **软件许可协议**是一种具有法律性质的合同或指导,目的在于规范受著作权保护的软件的使用或散布行为。通常的许可方式会允许用户来使用单一或多份该软件的复制,因为若无许可而径予使用该软件,将违反著作权法给予该软件开发者的专属保护。效用上来说,软件许可是软件开发者与其用户之间的一份合约,用来保证在符合许可范围的情况下,用户将不会受到控告。进入孵化器前后,我们花费了大量的时间来梳理整个项目的外部依赖以确保整个项目的合规性。需要说明的是,开源软件选择怎样的License并不意味着项目本身就一定合规。而ASF严苛的版本检查最大程度地保证了软件License的合规性,以及软件合理合法的流通分发。
+## 本次发布版本说明
+### 本次发布我们主要带来了以下特性:
+* 对微内核插件化的架构内核部分进行了大量优化,内核以 Java 为主,并对命令行参数解析,插件加载等做了大量改进,同时插件扩展可根据用户(或贡献者)所擅长的语言去做开发,极大程度地降低了插件开发门槛。
+* 全面支持 Flink ,但同时用户也可自由选择底层引擎,本次更新也为大家带来了大量的Flink插件,也欢迎大家后续贡献相关插件。
+* 提供本地开发极速启动环境支持(example),贡献者或用户可以在不更改任何代码的情况下快速丝滑启动,方便本地快速开发调试体验。对于需要自定义插件的贡献者或者用户来讲,这无疑是个令人激动的好消息。事实上,我们在发布前的测试中,也有大量贡献者采用这种方式快速对插件进行测试。
+* 提供了Docker容器安装,用户可以极快地通过Docker部署安装使用Apache SeaTunnel,未来我们也会围绕Docker&K8s做出大量迭代,欢迎大家讨论交流。
+ 
+ 
+具体发布说明:
+[Feature]
+* 使用 JCommander来做命令行参数解析,使得开发者更关注逻辑本身。
+* Flink从1.9升级至1.13.5,保持兼容旧版本,同时为后续CDC做好铺垫。
+* 支持 Doris 、Hudi、Phoenix、Druid等Connector 插件,完整的插件支持你可以在这里找到  [plugins-supported-by-seatunnel]([https://github.com/apache/incubator-seatunnel#plugins-supported-by-seatunnel](https://github.com/apache/incubator-seatunnel#plugins-supported-by-seatunnel)).
+* 本地开发极速启动环境支持,你可以在使用example模块,不修改任何代码的前提下快速启动,方便开发者本地调试体验。
+* 支持通过 Docker 容器安装和试用 Apache SeaTunnel。
+* Sql 组件支持 SET语句,支持配置变量。
+* Config模块重构,减少贡献者理解成本,同时保证项目的代码合规(License)。
+* 项目结构重新调整,以适应新的Roadmap。
+* CI&CD的支持,代码质量自动化管控,(后续会有更多的计划来支持CI&CD开发)。
+ 
+## 【致谢】
+感谢以下参与贡献的同学(为GitHub ID,排名不分先后):
+ 
+Al-assad, BenJFan, CalvinKirs, JNSimba, JiangTChen, Rianico, TyrantLucifer, Yves-yuan, ZhangchengHu0923, agendazhang, an-shi-chi-fan, asdf2014, bigdataf, chaozwn, choucmei, dailidong, dongzl, felix-thinkingdata, fengyuceNv, garyelephant, kalencaya, kezhenxu94, legendtkl, leo65535, liujinhui1994, mans2singh, marklightning, mosence, nielifeng, ououtt, ruanwenjun, simon824, totalo, wntp, wolfboys, wuchunfu, xbkaishui, xtr1993, yx91490, zhangbutao, zhaomin1423, zhongjiajie, zhuangchong, zixi0825.
+ 
+同时也诚挚的感谢我们的Mentor:Zhenxu Ke,Willem Jiang, William Guo,LiDong Dai ,Ted Liu, Kevin,JB 在这个过程中给予的帮助
+## 未来几个版本的规划:
+* CDC的支持;
+* 监控体系的支持;
+* UI系统的支持;
+* 更多的 Connector 支持,以及更高效的Sink支持,如ClickHouse,很快会在下个版本跟大家见面。
+ 
+后续Feature是由社区共同决定的,我们也在这里呼吁大家一同参与社区后续建设。
+欢迎大家关注以及贡献:)
+ 
+## 社区发展
+### 【近期概况】
+自进入Apache孵化器以来,贡献者从13 人增长至 55 人,且持续保持上升趋势,平均周commits维持在20+,来自不同公司的三位贡献者(Lei Xie, HuaJie Wang,Chunfu Wu,)通过他们对社区的贡献被邀请成为Committer。我们举办了两场MeetUp,来自B站,OPPO、唯品会等企业讲师分享了SeaTunnel在他们在企业中的大规模生产落地实践(后续我们也会保持每月一次的meetup,欢迎各位使用SeaTunnel的用户或者贡献者分享SeaTunnel和你们的故事)。
+### 【Apache SeaTunnel(Incubating)的用户】
+Note:仅包含已登记用户
+Apache SeaTunnel(Incubating) 目前登记用户如下,如果您也在使用Apache SeaTunnel,欢迎在[Who is using SeaTunne](https://github.com/apache/incubator-seatunnel/issues/686)! 中登记!
+ <div align="center">
+
+<img src="/image/20220321/1.png"/>
+
+</div>
+
+ 
+## 【PPMC感言】
+Apache SeaTunnel(Incubating) PPMC LiFeng Nie在谈及首个Apache版本发布的时候说,从进入Apache Incubator的第一天,我们就一直在努力学习Apache Way以及各种Apache政策,第一个版本发布的过程花费了大量的时间(主要是合规性),但我们认为这种时间是值得花费的,这也是我们选择进入Apache的一个很重要的原因,我们需要让用户用得放心,而Apache无疑是最佳选择,其 License 近乎苛刻的检查会让用户尽可能地避免相关的合规性问题,保证软件合理合法的流通。另外,其践行Apache Way,例如公益使命、实用主义、社区胜于代码、公开透明与共识决策、任人唯贤等,可以帮助 SeaTunnel 社区更加开放、透明,向多元化方向发展。
  
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-01_Kidswant.md b/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-01_Kidswant.md
new file mode 100644
index 000000000..47d2f73d7
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-01_Kidswant.md
@@ -0,0 +1,224 @@
+# SeaTunnel 在孩子王的选型过程及应用改造实践
+
+
+![](/image/20220501/ch/0.png)
+
+
+
+在Apache SeaTunnel(Incubating) 4 月Meetup上,孩子王大数据专家、OLAP平台架构师 袁洪军 为我们带来了《Apache SeaTunnel (Incubating)在孩子王的应用实践》。
+
+
+本次演讲主要包含五个部分:
+
+- 孩子王引入Apache SeaTunnel (Incubating)的背景介绍
+
+- 大数据处理主流工具对比分析
+
+- Apache SeaTunnel (Incubating)的落地实践
+
+- Apache SeaTunnel (Incubating)改造中的常见问题
+
+- 对孩子王未来发展方向的预测展望
+
+
+![](/image/20220501/ch/0-1.png)
+
+
+袁洪军
+
+孩子王 大数据专家、OLAP 平台架构师。多年大数据平台研发管理经验,在数据资产、血缘图谱、数据治理、OLAP 等领域有着丰富的研究经验。
+
+
+## 01 背景介绍
+
+
+![](/image/20220501/ch/1.png)
+
+
+目前孩子王的OLAP平台主要包含元数据层、任务层、存储层、SQL层、调度层、服务层以及监控层七部分,本次分享主要关注任务层中的离线任务。
+
+
+其实孩子王内部有一套完整的采集推送系统,但由于一些历史遗留问题,公司现有的平台无法快速支持OLAP平台上线,因此当时公司只能选择放弃自身的平台,转而着手研发新的系统。
+
+
+当时摆在OLAP面前的有三个选择:
+
+
+1、基于采集推送系统做二次研发;
+
+2、完全自研;
+
+3、参与开源项目。
+
+
+## 02 大数据处理主流工具对比分析
+
+
+而这三项选择却各有优劣。若基于采集推送系统做二次研发,其优点是有前人的经验,能够避免重复踩坑。但缺点是代码量大,研读时间、研读周期较长,而且抽象代码较少,与业务绑定的定制化功能较多,这也导致了其二开的难度较大。
+
+
+若完全自研,其优点第一是开发过程自主可控,第二是可以通过Spark等一些引擎做贴合我们自身的架构,但缺点是可能会遭遇一些未知的问题。
+
+
+最后如果使用开源框架,其优点一是抽象代码较多,二是经过其他大厂或公司的验证,框架在性能和稳定方面能够得到保障。因此孩子王在OLAP数据同步初期,我们主要研究了DATAX、Sqoop和SeaTunnel这三个开源数据同步工具。
+
+
+
+![](/image/20220501/ch/2.png)
+
+
+从脑图我们可以看到,Sqoop的主要功能是针对RDB的数据同步,其实现方式是基于MAP/REDUCE。Sqoop拥有丰富的参数和命令行可以去执行各种操作。Sqoop的优点在于它首先贴合Hadoop生态,并已经支持大部分RDB到HIVE任意源的转换,拥有完整的命令集和API的分布式数据同步工具。
+
+
+但其缺点是Sqoop只支持RDB的数据同步,并且对于数据文件有一定的限制,以及还没有数据清洗的概念。
+
+
+
+![](/image/20220501/ch/3.png)
+
+
+
+DataX的主要功能是任意源的数据同步,通过配置化文件+多线程的方式实现,主要分为三个流程:Reader、Framework和Writer,其中Framework主要起到通信和留空的作用。
+
+
+DataX的优点是它采用了插件式的开发,拥有自己的流控和数据管控,在社区活跃度上,DataX的官网上提供了许多不同源的数据推送。但DataX的缺点在于它基于内存,对数据量可能存在限制。
+
+
+
+![](/image/20220501/ch/4.png)
+
+
+
+Apache SeaTunnel (Incubating)做的也是任意源的数据同步,实现流程分为source、transform和sink三步,基于配置文件、Spark或Flink实现。其优点是目前官网2.1.0有非常多的插件和源的推送,基于插件式的思想也使其非常容易扩展,拥抱Spark和Flink的同时也做到了分布式的架构。要说Apache SeaTunnel (Incubating)唯一的缺点可能是目前缺少IP的调用,UI界面需要自己做管控。
+
+
+综上所述,Sqoop虽然是分布式,但是仅支持RDB和HIVE、Hbase之间的数据同步且扩展能力差,不利于二开。DataX扩展性好,整体性稳定,但由于是单机版,无法分布式集群部署,且数据抽取能力和机器性能有强依赖关系。而SeaTunnel和DataX类似并弥补了DataX非分布式的问题,对于实时流也做了很好的支持,虽然是新产品,但社区活跃度高。基于是否支持分布式、是否需要单独机器部署等诸多因素的考量,最后我们选择了SeaTunnel。
+
+
+## 03 Apache SeaTunnel (Incubating)的落地实践
+
+
+在Apache SeaTunnel (Incubating)的官网我们可以看到Apache SeaTunnel (Incubating)的基础流程包括source、transform和sink三部分。根据官网的指南,Apache SeaTunnel (Incubating)的启动需要配置脚本,但经过我们的研究发现,Apache SeaTunnel (Incubating)的最终执行是依赖config文件的spark-submit提交的一个Application应用。
+
+
+这种初始化方式虽然简单,但存在必须依赖Config文件的问题,每次运行任务后都会生成再进行清除,虽然可以在调度脚本中动态生成,但也产生了两个问题。1、频繁的磁盘操作是否有意义;2、是否存在更为高效的方式支持Apache SeaTunnel (Incubating)的运行。
+
+
+
+![](/image/20220501/ch/5.png)
+
+
+
+基于以上考量,在最终的设计方案中,我们增加了一个统一配置模板平台模块。调度时只需要发起一个提交命令,由Apache SeaTunnel (Incubating)自身去统一配置模板平台中拉取配置信息,再去装载和初始化参数。
+
+
+
+![](/image/20220501/ch/5-1.png)
+
+
+
+上图展示的便是孩子王OLAP的业务流程,主要分为三块。数据从Parquet,即Hive,通过Parquet表的方式到KYLIN和CK source的整体流程。
+
+
+
+![](/image/20220501/ch/7.png)
+
+
+
+这是我们建模的页面,主要通过拖拉拽的方式生成最终模型,每个表之间通过一些交易操作,右侧是针对Apache SeaTunnel (Incubating)的微处理。
+
+
+
+![](/image/20220501/ch/8.jpg)
+
+
+
+因此我们最终提交的命令如上,其中标红的首先是【-conf customconfig/jars】,指用户可以在统一配置模板平台进行处理,或者建模时单独指定。最后标红的【421 $start_time $end_time $taskType】Unicode,属于唯一编码。
+
+
+下方图左就是我们最终调度脚本提交的38个命令,下方图右是针对Apache SeaTunnel (Incubating)做的改造,可以看到一个较为特殊的名为WaterdropContext的工具类。可以首先判断Unicode是否存在,再通过Unicode_code来获取不同模板的配置信息,避免了config文件的操作。
+
+
+在最后的reportMeta则是用于在任务执行完成后上报一些信息,这也会在Apache SeaTunnel (Incubating)中完成。
+
+
+
+![](/image/20220501/ch/9.png)
+
+
+
+![](/image/20220501/ch/10.png)
+
+
+
+![](/image/20220501/ch/11.png)
+
+
+
+在最终完成的config文件如上,值得注意的是在transform方面,孩子王做了一些改造。首先是针对手机或者身份证号等做脱敏处理,如果用户指定字段,就按照字段做,如果不指定字段就扫描所有字段,然后根据模式匹配,进行脱敏加密。
+
+
+第二transform还支持自定义处理,如上文说道OLAP建模的时候说到。加入了HideStr,可以保留一串字符的前十个字段,加密后方的所有字符,在数据安全上有所保障。
+
+
+然后,在sink端,我们为了支持任务的幂等性,我们加入了pre_sql,这主要完成的任务是数据的删除,或分区的删除,因为任务在生产过程中不可能只运行一次,一旦出现重跑或补数等操作,就需要这一部分为数据的不同和正确性做考量。
+
+
+在图右方的一个Clickhouse的Sink端,这里我们加入了一个is_senseless_mode,它组成了一个读写分离的无感模式,用户在查询和补数的时候不感知整体区域,而是用到CK的分区转换,即名为MOVE PARTITION TO TABLE的命令进行操作的。
+
+
+
+此处特别说明KYLIN的Sink端,KYLIN是一个非常特殊的源,拥有自己一整套数据录入的逻辑,而且,他有自己的监控页面,因此我们给予KYLIN的改造只是简单地调用其API操作,在使用KYLIN时也只是简单的API调用和不断轮询的状态,所以KYLIN这块的资源在统一模板配置平台就被限制地很小。
+
+
+
+![](/image/20220501/ch/12.jpg)
+
+
+
+## 04 Apache SeaTunnel (Incubating)改造中的常见问题
+
+
+1、OOM&Too many Parts
+
+
+问题通常会出现在Hive到Hive的过程中,即使我们通过了自动资源的分配,但也存在数据突然间变大的情况,比如在举办了多次活动之后。这样的问题其实只能通过手动动态地调参,调整数据同步批量时间来避免。未来我们可能尽力去完成对于数据量的掌握,做到精细的控制。
+
+
+2、字段、类型不一致问题
+
+
+模型上线后,任务依赖的上游表或者字段,用户都会做一些修改,这些修改若无法感知,可能导致任务的失败。目前解决方法是依托血缘+快照的方式进行提前感知来避免错误。
+
+
+3、自定义数据源&自定义分隔符
+
+
+如财务部门需要单独使用的分割符,或是jar信息,现在用户可以自己在统一配置模板平台指定加载额外jar信息以及分割符信息。
+
+
+4、数据倾斜问题
+
+
+这可能因为用户自己设置了并行度,但无法做到尽善尽美。这一块我们暂时还没有完成处理,后续的思路可能在Source模块中添加post处理,对数据进行打散,完成倾斜。
+
+
+5、KYLIN全局字典锁问题
+
+
+随着业务发展,一个cube无法满足用户使用,就需要建立多个cube,如果多个cube之间用了相同的字段,就会遇到KYLIN全局字典锁的问题。目前解决的思路是把两个或多个任务之间的调度时间进行隔开,如果无法隔开,可以做一个分布式锁的控制。KYLIN的sink端必须要拿到锁才能运行。
+
+
+## 05 对孩子王未来发展方向的预测展望
+
+
+- 多源数据同步,未来可能针对RDB源进行处理
+
+- 基于实时Flink的实现
+
+- 接管已有采集调度平台(主要解决分库分表的问题)
+
+- 数据质量校验,像一些空值、整个数据的空置率、主时间的判断等
+
+
+我的分享就到这里,希望以后可以和社区多多交流,共同进步,谢谢!
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-10-ClickHouse.md b/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-10-ClickHouse.md
new file mode 100644
index 000000000..b4bdb7239
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-10-ClickHouse.md
@@ -0,0 +1,163 @@
+# 百亿级数据同步,如何基于 SeaTunnel 的 ClickHouse 实现?
+
+
+![](/image/20220510/ch/0.jpg)
+
+
+
+作者 | Apache SeaTunnel(Incubating) Contributor 范佳
+
+整理 | 测试工程师 冯秀兰
+
+对于百亿级批数据的导入,传统的 JDBC 方式在一些海量数据同步场景下的表现并不尽如人意。为了提供更快的写入速度,Apache SeaTunnel(Incubating) 在刚刚发布的 2.1.1 版本中提供了 ClickhouseFile-Connector 的支持,以实现 Bulk load 数据写入。
+
+Bulk load 指把海量数据同步到目标 DB 中,目前 SeaTunnel 已实现数据同步到 ClickHouse 中。
+
+在 Apache SeaTunnel(Incubating) 4 月 Meetup 上,Apache SeaTunnel(Incubating) Contributor 范佳分享了《基于 SeaTunnel 的 ClickHouse bulk load 实现》,详细讲解了 ClickHouseFile 高效处理海量数据的具体实现原理和流程。
+
+感谢本文整理志愿者 测试工程师 冯秀兰 对 Apache SeaTunnel(Incubating) 项目的支持!
+
+本次演讲主要包含七个部分:
+
+- Current status of the ClickHouse Sink
+
+- Weak scenarios of the ClickHouse Sink
+
+- Introduction to the ClickHouseFile plugin
+
+- Core technical points of ClickHouseFile
+
+- Implementation analysis of the ClickHouseFile plugin
+
+- Capability comparison of the plugins
+
+- Future optimization directions
+   
+
+
+![](/image/20220510/ch/0-1.png)
+
+
+
+Fan Jia, Senior Engineer at WhaleOps
+
+# 01 Current Status of the ClickHouse Sink
+
+At present, the process by which SeaTunnel synchronizes data into ClickHouse is: data from any source supported by SeaTunnel can be extracted, optionally transformed, and written straight into the ClickHouse sink connector, which then writes it to the ClickHouse server via JDBC.
+
+
+![](/image/20220510/ch/1.png)
+
+
+However, writing to the ClickHouse server through traditional JDBC has some problems.
+
+First, the tool used at this stage is the driver provided by ClickHouse, which works over HTTP, and HTTP is not very efficient in some scenarios. Second, consider massive data: with duplicate data or a huge one-off write, the traditional approach generates the corresponding INSERT statements and sends them over HTTP to the ClickHouse server, where they are parsed and executed row by row or in batches; this prevents any data compression.
+
+Finally, there is the problem we usually run into: too much data may cause an OOM on the SeaTunnel side, or the server may go down because the volume and frequency of writes are too high.
+
+So we wondered: is there a faster way to send data than HTTP? If data pre-processing or compression could be done on the SeaTunnel side, the pressure on network bandwidth would drop and the transfer rate would rise.
+
+# 02 Weak Scenarios of the ClickHouse Sink
+
+With the HTTP transport protocol, when the data volume is too large and batches are sent as micro-batches of requests, HTTP may not be able to keep up;
+
+Too many INSERT requests put heavy pressure on the server. Even if the bandwidth can cope with a large number of requests, the server side may not. An online server does not only ingest data; more importantly, it serves queries for other business teams. Crashing the server cluster because of excessive inserts is simply not worth it.
+
+# 03 Introduction to the ClickHouseFile Plugin
+
+Faced with these weak scenarios of ClickHouse, we asked: is there a way to do the data compression on the Spark side, avoid adding resource load on the server while writing, and still write massive data quickly? So we developed the ClickHouseFile plugin to meet these needs.
+
+The key technology behind the ClickHouseFile plugin is clickhouse-local. The ClickHouse-local mode lets users perform fast processing on local files without having to deploy and configure a ClickHouse server. ClickHouse-local uses the same core as ClickHouse Server, so it supports most of the features as well as the same formats and table engines.
+
+Because of these two characteristics, users can process local files directly without any processing on the ClickHouse server side. Since the format is identical, the data produced by our operations on the remote or SeaTunnel side is seamlessly compatible with the server side, and clickhouse-local can be used to write the data. ClickHouse-local is the core technical point of ClickHouseFile; it is thanks to it that the ClickHouseFile connector can be implemented at this stage.
+
+The core usage of clickhouse-local:
+
+
+![](/image/20220510/ch/2.png)
+
+
+Line 1: pipe data through a Linux pipe into the test_table table of the clickhouse-local program.
+
+Lines 2-5: create a result_table table to receive the data.
+
+Line 6: insert the data from test_table into result_table.
+
+Line 7: define the disk path used for data processing.
+
+By calling the clickhouse-local component, the generation of data files, data compression and other operations are all completed on the Apache SeaTunnel (Incubating) side. By then communicating with the server, the generated data is sent directly to the different ClickHouse nodes, and the data files are made available to the nodes for querying.
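+
+As a minimal sketch of this idea (assuming the clickhouse binary is on the PATH; table names, schema and paths are illustrative, and the flags should be checked against the clickhouse-local documentation for your ClickHouse version), the SeaTunnel side can spawn clickhouse-local, stream rows into its stdin, and let it build the local data files:
+
+```java
+import java.io.BufferedWriter;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+
+/** Sketch: feed rows into clickhouse-local through a pipe and let it build local table files. */
+public final class ClickHouseLocalSketch {
+
+    public static void main(String[] args) throws Exception {
+        ProcessBuilder pb = new ProcessBuilder(
+                "clickhouse", "local",
+                "--path", "/tmp/seatunnel-ck",                      // working directory for generated files
+                "-S", "id UInt64, name String",                     // structure of the piped input (test_table)
+                "-N", "test_table",                                 // name of the table backed by stdin
+                "--input-format", "CSV",
+                "-q",
+                "CREATE TABLE result_table (id UInt64, name String) "
+                        + "ENGINE = MergeTree() ORDER BY id; "
+                        + "INSERT INTO result_table SELECT * FROM test_table;");
+        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
+        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
+        Process process = pb.start();
+
+        // Stream the buffered sink rows into clickhouse-local instead of sending them over HTTP.
+        try (BufferedWriter stdin = new BufferedWriter(
+                new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8))) {
+            stdin.write("1,foo\n");
+            stdin.write("2,bar\n");
+        }
+
+        if (process.waitFor() != 0) {
+            throw new IllegalStateException("clickhouse-local exited abnormally");
+        }
+        // The generated part files under /tmp/seatunnel-ck can now be shipped to the ClickHouse nodes.
+    }
+}
+```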
+
+A comparison between the original and the current implementation:
+
+
+![](/image/20220510/ch/3.png)
+
+
+Originally, Spark sent the data, including the INSERT statements, to the server; the server parsed the SQL, generated and compressed the table's data files, and built the corresponding files and indexes. With the ClickHouse-local technique, the SeaTunnel side generates the data files, compresses them and creates the indexes; the final output is the files or directories used by the server, and once they are synchronized to the server, the server only has to query the data without doing any extra work.
+
+# 04 Core Technical Points
+
+
+![](/image/20220510/ch/4.png)
+
+
+The above flow makes data synchronization more efficient thanks to three optimizations.
+
+First, the data is actually piped into ClickHouseFile, which imposes limits on length and memory. For this reason, we write the data received by the ClickHouse connector (i.e. the sink side) into a temporary file through the MMAP technique, and then let clickhouse-local read the data from the temporary file to generate the target local files. This achieves incremental reads and solves the OOM problem.
+
+
+![](/image/20220510/ch/5.png)
+
+
+Second, sharding is supported. When writing into a cluster, if only one file or directory is generated, the data ends up on a single node, which greatly reduces query performance. So we added sharding support: users can set a shard key in the configuration, and the algorithm splits the data into multiple local files which are written to different cluster nodes, significantly improving read performance.
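+
+A minimal sketch of such shard routing (the hash function, shard count and record layout are assumptions for the example):
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.TreeMap;
+
+/** Sketch: route rows to per-shard buffers by hashing the shard key, one buffer per ClickHouse shard. */
+public final class ShardRouterSketch {
+
+    /** Deterministically map a shard-key value to one of `shardCount` shards. */
+    static int shardOf(String shardKeyValue, int shardCount) {
+        return Math.floorMod(shardKeyValue.hashCode(), shardCount);
+    }
+
+    public static void main(String[] args) {
+        int shardCount = 3;                                     // e.g. a 3-shard ClickHouse cluster
+        Map<Integer, List<String>> buffers = new TreeMap<>();
+
+        for (String row : List.of("user-1,10", "user-2,20", "user-3,30", "user-1,40")) {
+            String shardKey = row.split(",")[0];                // assume the first column is the shard key
+            buffers.computeIfAbsent(shardOf(shardKey, shardCount), s -> new ArrayList<>()).add(row);
+        }
+
+        // Each buffer would be turned into its own local file and shipped to the matching shard node.
+        buffers.forEach((shard, rows) -> System.out.println("shard " + shard + " -> " + rows));
+    }
+}
+```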
+
+
+![](/image/20220510/ch/6.png)
+
+
+The third important optimization is file transfer. SeaTunnel currently supports two transfer methods: SCP, whose strengths are security, generality and no extra configuration; and RSYNC, which is fast and efficient and supports resuming interrupted transfers, but needs extra configuration. Users can choose whichever suits their needs.
+
+# 05 Implementation Analysis of the Plugin
+
+In short, the overall implementation flow of ClickHouseFile is as follows:
+
+
+![](/image/20220510/ch/7.png)
+
+
+- Cache the data on the ClickHouse sink side;
+
+- Call the local clickhouse-local to generate the files;
+
+- Send the data to the ClickHouse server;
+
+- Execute the ATTACH command (see the sketch after this list)
+
+
+Through these four steps, the generated data becomes queryable.
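+
+A compact sketch of the last two steps follows, assuming scp is available, a ClickHouse JDBC driver is on the classpath, and the paths and partition value are placeholders; only the "copy the generated parts to the node, then ATTACH them" pattern is intended.
+
+```java
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.Statement;
+
+/** Sketch: ship locally generated parts to a ClickHouse node, then make them queryable with ATTACH. */
+public final class ShipAndAttachSketch {
+
+    public static void main(String[] args) throws Exception {
+        // 3. Send the generated part files into the target table's detached directory (path is illustrative).
+        Process scp = new ProcessBuilder("scp", "-r", "/tmp/seatunnel-ck/data/default/result_table/.",
+                "ck-node1:/var/lib/clickhouse/data/default/result_table/detached/").inheritIO().start();
+        if (scp.waitFor() != 0) {
+            throw new IllegalStateException("file transfer failed");
+        }
+
+        // 4. Attach the shipped partition so the server can query it without re-ingesting anything.
+        try (Connection conn = DriverManager.getConnection("jdbc:clickhouse://ck-node1:8123/default");
+             Statement stmt = conn.createStatement()) {
+            stmt.execute("ALTER TABLE default.result_table ATTACH PARTITION '2022-05-01'");
+        }
+    }
+}
+```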
+
+# 06 Capability Comparison of the Plugins
+
+
+![](/image/20220510/ch/8.png)
+
+
+In terms of data transfer, ClickHouseFile is better suited to massive data volumes; the advantage of the JDBC-based ClickHouse sink is that it needs no extra configuration and is highly general, whereas ClickHouseFile is more complex to configure and currently supports fewer table engines;
+
+In terms of environment complexity, the ClickHouse sink is more suitable when the environment is complex, since it can run directly without extra configuration;
+
+In terms of generality, the ClickHouse sink, being the JDBC driver officially supported by SeaTunnel, can write data for basically every table engine, while ClickHouseFile supports relatively few engines; in terms of server pressure, ClickHouseFile's advantage shows when transferring massive data, as it puts little pressure on the server.
+
+The two are not competitors, though; which one to use should be decided by the scenario.
+
+# 07 Follow-up Plans
+
+Although SeaTunnel already supports the ClickHouseFile plugin, there is still a lot to optimize, mainly including:
+
+- Rsync support;
+
+- Exactly-once support;
+
+- Zero-copy transfer of the data files;
+
+- Support for more table engines
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-31-engine.md b/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-31-engine.md
new file mode 100644
index 000000000..a2e401af6
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/2022-05-31-engine.md
@@ -0,0 +1,203 @@
+# How Apache SeaTunnel Decouples from the Compute Engines: What We Did to Refactor the API
+
+
+![](/image/20220531/ch/0.jpg)
+
+
+
+At the joint May Meetup of Apache SeaTunnel (Incubating) and Apache InLong (Incubating), the second speaker was Li Zongwen, a senior engineer at WhaleOps. While using Apache SeaTunnel (Incubating) he found four major problems: connectors have to be implemented multiple times, parameters are not unified, it is hard to support multiple engine versions, and engine upgrades are difficult. To solve these problems, Li Zongwen set out to decouple Apache SeaTunnel (Incubating) from the compute engines and to refactor the Source and Sink APIs, improving the development experience.
+
+This talk mainly consists of five parts:
+
+1.  Apache SeaTunnel (Incubating) **background and motivation for the refactoring**
+
+2.  Apache SeaTunnel (Incubating) **goals of the refactoring**
+
+3.  Apache SeaTunnel (Incubating) **overall design of the refactoring**
+
+4.  Apache SeaTunnel (Incubating) **Source API design**
+
+5.  Apache SeaTunnel (Incubating) **Sink API design**
+    
+
+
+![](/image/20220531/ch/1.jpg)
+
+
+
+**Li Zongwen**
+
+Senior Engineer at WhaleOps
+
+Apache SeaTunnel (Incubating) & Flink Contributor,
+
+Flink CDC & Debezium Contributor
+
+## **01** Background and Motivation for the Refactoring
+
+### 01 Apache SeaTunnel (Incubating) Is Coupled to the Engines
+
+Anyone who has used or developed Apache SeaTunnel (Incubating) knows that it is currently fully coupled to the engines and built entirely on Spark and Flink; even the parameters in the configuration files are based on the Flink and Spark engines. Looking at it from the perspective of contributors and users, we can see several problems.
+
+**From the contributors' perspective**: implementing a connector again and again brings no sense of accomplishment, and potential contributors cannot contribute to the community because their engine versions do not match.
+
+**From the users' perspective**: many companies use a Lambda architecture, with Spark for offline jobs and Flink for real-time jobs. In practice they find that a SeaTunnel connector may exist for Spark but not for Flink, and that the two engines do not even share the same parameters for connectors to the same storage engine, so the cost of use is high, which runs against SeaTunnel's original goal of being simple and easy to use. Some users also ask whether Flink 1.14 is supported; with the current SeaTunnel architecture, supporting Flink 1.14 means abandoning the earlier versions, which in turn causes big problems for users of those versions.
+
+So it is hard for us to either upgrade an engine or support more versions.
+
+Besides, Spark and Flink both use the checkpoint fault-tolerance mechanism based on the Chandy-Lamport algorithm and have both internally unified DataSet and DataStream; on that premise we considered decoupling to be feasible.
+
+## **02** Decoupling Apache SeaTunnel (Incubating) from the Engines
+
+So, to solve the problems above, we set the following goals:
+
+1.  **Implement a connector only once**: to address the problems of inconsistent parameters and repeated connector implementations, we want a unified Source and Sink API;
+
+2.  **Support multiple versions of the Spark and Flink engines**: a translation layer is added on top of the Source and Sink API to support multiple Spark and Flink versions; after decoupling, the cost of doing so is much lower.
+
+3.  **Clarify the split-based parallelism logic of the Source and the commit logic of the Sink**: we must provide good APIs to support connector development;
+
+4.  **Support whole-database synchronization in real-time scenarios**: this is a need derived from the **CDC support** that many users have asked for. I was involved in the Flink CDC community before; many users pointed out that in CDC scenarios, using Flink CDC directly means every table holds its own connection, and with a whole-database sync of a thousand tables you get a thousand connections, which is unacceptable for both the database and the DBAs. The simplest fix is to introduce Canal, Debezium or similar components to pull the incremental data into an intermediate store such as Kafka and then sync it with Flink SQL, but that defeats Flink CDC's original idea of shortening the pipeline. Since Flink CDC positions itself only as a connector and cannot cover the whole pipeline, this proposal was never raised in the Flink CDC community, and we brought it to the SeaTunnel community along with this refactoring.
+
+5.  **Support automatic discovery and storage of metadata**: users will have felt this pain with storage engines such as Kafka, which do not record the data schema, yet the data we read must be structured; as a result, before reading a topic the user has to define its structured data type every single time. We want users to configure this only once and avoid the repetition.
+    
+
+Some of you may wonder why we do not simply use Apache Beam. Beam sources are split into BOUNDED and UNBOUNDED ones, which means they have to be implemented twice, and some Source and Sink features are not supported either; the specific features we need are covered later.
+
+## **03** Overall Design of the Apache SeaTunnel (Incubating) Refactoring
+
+
+![](/image/20220531/ch/1.jpg)
+
+
+The overall structure of the Apache SeaTunnel (Incubating) API is designed as shown above.
+
+**Source & Sink API**: one of the core APIs of data integration; it clarifies the split-based parallelism logic of the Source and the commit logic of the Sink and is used to implement connectors;
+
+**Engine API**:
+
+**Translation**: the translation layer, which translates SeaTunnel's Source and Sink APIs into connectors that can run inside each engine;
+
+**Execution**: the execution logic, which defines how Source, Transform, Sink and other operations are executed inside the engine;
+
+**Table API**:
+
+**Table SPI**: mainly exposes the Source and Sink interfaces via SPI and specifies the mandatory and optional parameters of a connector;
+
+**DataType**: SeaTunnel's data structures, used to isolate the engines and declare the table schema;
+
+**Catalog**: used to obtain table schemas, options and so on;
+
+**Catalog Storage**: used to store the table schemas that users define for unstructured engines such as Kafka;
+
+
+![](/image/20220531/ch/2.jpg)
+
+
+**The figure above shows the execution flow we currently envision**:
+
+1.  Obtain the task parameters from the config file, the UI or elsewhere;
+
+2.  Resolve the parameters through the Catalog to get the Table Schema, Options and other information;
+
+3.  Start SeaTunnel's connectors via SPI and inject the table information;
+
+4.  Translate SeaTunnel's connectors into the connectors inside the engine;
+
+5.  Execute the engine's job logic. The multi-table dispatch in the figure currently only exists in the CDC whole-database synchronization scenario; all other connectors are single-table and need no dispatch logic;
+
+
+As you can see, the hardest part is **how to translate Apache SeaTunnel (Incubating)'s Source and Sink into the Source and Sink inside the engine.**
+
+Many users today treat Apache SeaTunnel (Incubating) not only as a data-integration tool but also as a data-warehouse tool and use a lot of Spark and Flink SQL; we currently want to preserve that SQL capability so that users can upgrade seamlessly.
+
+
+![](/image/20220531/ch/3.jpg)
+
+
+According to our research, the figure above shows the ideal execution logic for Source and Sink. Since SeaTunnel was incubated from Waterdrop, the terminology in the figure leans towards Spark.
+
+Ideally, the Source and Sink coordinators run on the Driver, while the Source readers and Sink writers run on the Workers. Regarding the Source coordinator, we would like it to support a few capabilities.
+
+**First, the data-splitting logic**: splits can be added dynamically to the readers.
+
+**Second, coordination of the readers**: the SourceReader reads the data, which then flows through the engine and finally reaches the Sink Writer for writing; the Writer can support two-phase commit, and the Sink coordinator supports the aggregated-commit requirement of connectors such as Iceberg.
+
+## **04** Source API
+
+Through our research we identified the following features the Source needs:
+
+1.  **A unified offline and real-time API**: a Source is implemented once and supports both offline and real-time use;
+
+2.  **Parallel reading**: for example, Kafka creates a reader for every partition and runs them in parallel;
+
+3.  **Dynamically adding splits**: for example, when Kafka subscribes to a topic regex and business growth requires a new topic, the Source API lets us add it to the job dynamically;
+
+4.  **Coordinating the readers' work**: so far only CDC-style connectors need this. CDC connectors are currently based on Netflix's DBLog parallel algorithm, which needs to coordinate the readers when switching between the full-synchronization and incremental-synchronization phases;
+
+5.  **A single reader handling multiple tables**: i.e. the whole-database real-time synchronization requirement mentioned earlier;
+    
+
+
+![](/image/20220531/ch/4.jpg)
+
+
+To meet these requirements we built the basic API shown above; the code has been submitted to the api-draft branch of the Apache SeaTunnel (Incubating) community, and anyone interested can read the code for the details.
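+
+For readers who do not want to jump into the branch right away, here is a highly simplified, illustrative sketch of what such a split-based Source abstraction looks like; the names and signatures are shorthand and do not match the api-draft branch exactly.
+
+```java
+import java.io.Serializable;
+import java.util.List;
+
+/** Simplified sketch of a split-based Source abstraction; not the actual api-draft interfaces. */
+public interface SketchSource<T, SplitT, StateT> extends Serializable {
+
+    /** BOUNDED for batch, UNBOUNDED for streaming; one connector (e.g. Kafka) can report either. */
+    enum Boundedness { BOUNDED, UNBOUNDED }
+
+    Boundedness getBoundedness();
+
+    /** Runs on the workers and reads records from the splits assigned to it. */
+    SourceReader<T, SplitT> createReader();
+
+    /** Runs on the driver, discovers splits and assigns them (possibly dynamically) to readers. */
+    SplitEnumerator<SplitT, StateT> createEnumerator();
+
+    interface SourceSplit extends Serializable {
+        String splitId();
+    }
+
+    interface SourceReader<T, SplitT> {
+        void addSplits(List<SplitT> splits);          // splits may arrive dynamically
+        void pollNext(java.util.function.Consumer<T> output) throws Exception;
+        void handleSourceEvent(Object event);         // coordination events, e.g. for CDC phase switches
+    }
+
+    interface SplitEnumerator<SplitT, StateT> {
+        void run() throws Exception;                  // enumerate splits, e.g. Kafka partitions by regex
+        void addReader(int readerId);
+        void handleReaderEvent(int readerId, Object event);
+        StateT snapshotState() throws Exception;      // checkpointed for failure recovery
+    }
+}
+```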
+
+### **How to adapt to the Spark and Flink engines**
+
+Flink and Spark have both since unified the DataSet and DataStream APIs, so they can support the first two features. That leaves the remaining three:
+
+-   How to support dynamically adding splits?
+
+-   How to support coordinating the readers?
+
+-   How to support a single reader handling multiple tables?
+    
+
+With these questions in mind, let's look at the current design.
+
+
+![](/image/20220531/ch/5.jpg)
+
+
+We found that no connector other than **CDC** needs a coordinator. For those that do not, we provide a parallel Source and translate it for the engine.
+
+On the left of the figure above is a **split enumerator**, which enumerates in real time which splits the source has, and then dispatches each split to the actual reading module, the SourceReader. **Offline and real-time jobs are distinguished by the Boundedness flag**: a connector can mark in a split whether it has a stop offset, so Kafka, for example, can support both real-time and offline. ParallelSource can be given any parallelism in the engine to allow parallel reading.
+
+
+![](/image/20220531/ch/6.jpg)
+
+
+In scenarios that need a coordinator, as shown above, events are passed between the Readers and the Enumerator, and the **Enumerator** coordinates the work based on the events sent by the Readers. The **Coordinated Source** has to be kept at a parallelism of one at the engine level to guarantee data consistency; admittedly this does not make good use of the engine's memory-management mechanism, but the trade-off is necessary.
+
+**As for the last question, how a single reader handles multiple tables, this touches the Table API layer.** After all required tables have been obtained through the Catalog, some of them may belong to one job and can be read over one connection, while others may have to be separated; it depends on how the Source is implemented. Since this is a special requirement, we want to keep things simple for ordinary developers: at the Table API layer we provide a SupportMultipleTable interface to declare that a Source supports reading multiple tables. When implementing it, the Source must provide deserializers for the multiple tables. As for how the resulting multi-table data is later separated, Flink will use its Side Output mechanism, and Spark is expected to use the Filter or Partition mechanism.
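+
+A tiny, illustrative sketch of what such a declaration could look like (shorthand only, not the real interface):
+
+```java
+import java.util.List;
+
+/** Sketch: a Source implements this to declare it can read several tables over one connection. */
+public interface SupportMultipleTableSketch {
+
+    /** The tables this Source instance will read, e.g. every table of one database in a CDC job. */
+    List<String> capturedTables();
+
+    /** One deserializer per captured table, so each table's records can be decoded and routed. */
+    Object deserializerFor(String tableName);
+}
+```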
+
+## **05** Sink API
+
+The Sink does not need many features; **our research found three requirements**:
+
+1.  Idempotent writes, which require no code and only depend on whether the storage engine supports them.
+
+2.  Distributed transactions, mainly two-phase commit, as supported by Kafka and others.
+
+3.  Aggregated commits: for storage engines such as Iceberg and Hudi we do not want small-file problems, so we would like to aggregate these files into one file before committing.
+    
+
+Based on these three requirements we have the corresponding **three APIs**: **SinkWriter, SinkCommitter and SinkAggregatedCommitter**. The SinkWriter performs the basic write, which may or may not be idempotent; the SinkCommitter supports two-phase commit; and the SinkAggregatedCommitter supports aggregated commits.
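+
+As with the Source above, here is a simplified, illustrative sketch of how these three roles fit together; the generics and method names are shorthand rather than the exact api-draft signatures.
+
+```java
+import java.io.Serializable;
+import java.util.List;
+import java.util.Optional;
+
+/** Simplified sketch of the three Sink roles; not the actual api-draft interfaces. */
+public interface SketchSink<RowT, CommitInfoT, AggregatedCommitInfoT> extends Serializable {
+
+    SinkWriter<RowT, CommitInfoT> createWriter();
+
+    Optional<SinkCommitter<CommitInfoT>> createCommitter();
+
+    Optional<SinkAggregatedCommitter<CommitInfoT, AggregatedCommitInfoT>> createAggregatedCommitter();
+
+    /** Runs with the job's parallelism and performs the (possibly idempotent) writes. */
+    interface SinkWriter<RowT, CommitInfoT> {
+        void write(RowT row) throws Exception;
+        /** First phase of 2PC: flush and return the information needed to commit later. */
+        Optional<CommitInfoT> prepareCommit() throws Exception;
+    }
+
+    /** Second phase of 2PC, committing what a single writer prepared. */
+    interface SinkCommitter<CommitInfoT> {
+        void commit(List<CommitInfoT> commitInfos) throws Exception;
+    }
+
+    /** Single-parallelism committer that merges all writers' results, e.g. to avoid small files. */
+    interface SinkAggregatedCommitter<CommitInfoT, AggregatedCommitInfoT> {
+        AggregatedCommitInfoT combine(List<CommitInfoT> commitInfos);
+        void commit(List<AggregatedCommitInfoT> aggregatedInfos) throws Exception;
+    }
+}
+```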
+
+
+![](/image/20220531/ch/7.jpg)
+
+
+Ideally, the **AggregatedCommitter** runs in the Driver with a parallelism of one, while the Writers and Committers run in the Workers, possibly with multiple parallel instances; each instance does its own pre-commit work and then sends its commit information to the AggregatedCommitter for aggregation.
+
+**Recent versions of both Spark and Flink support running the AggregatedCommitter on the Driver** (Job Manager), with the Workers (Task Managers) running the Writers and Committers.
+
+
+![](/image/20220531/ch/8.jpg)
+
+
+For **older Flink versions**, however, the AggregatedCommitter cannot run in the JM, so we also designed a translation adaptation. The Writer and Committer act as upstream operators wrapped in Flink's ProcessFunction, supporting concurrent pre-commits and writes, with two-phase commit implemented on top of Flink's checkpoint mechanism; this is also how many Flink connectors implement 2PC today. The ProcessFunction sends the pre-commit information downstream to the AggregatedCommitter, which can be wrapped in an operator such as a SinkFunction or ProcessFunction. Of course, **we must ensure that only one AggregatedCommitter is started, i.e. a parallelism of one**, otherwise the aggregated-commit logic breaks.
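+
+Below is a minimal sketch of that wiring for older Flink versions (Flink on the classpath is assumed, and both operator bodies are placeholders): a parallel ProcessFunction stands in for the wrapped Writer/Committer and emits pre-commit information downstream, driven by the checkpoint cycle, while the downstream aggregated committer is pinned to a parallelism of one.
+
+```java
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.functions.ProcessFunction;
+import org.apache.flink.streaming.api.functions.sink.SinkFunction;
+import org.apache.flink.util.Collector;
+
+/** Sketch: topology used to emulate an aggregated commit on Flink versions without a sink coordinator. */
+public final class LegacyFlinkTwoPhaseTopology {
+
+    public static void main(String[] args) throws Exception {
+        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+        env.enableCheckpointing(10_000);                       // 2PC is driven by the checkpoint cycle
+
+        DataStream<String> rows = env.fromElements("a", "b", "c");
+
+        rows
+            // The wrapped Writer/Committer: writes rows and emits its pre-commit info downstream.
+            .process(new ProcessFunction<String, String>() {
+                @Override
+                public void processElement(String row, Context ctx, Collector<String> out) {
+                    // placeholder: the real operator writes `row` and, around each checkpoint,
+                    // emits the prepared commit info rather than one message per record
+                    out.collect("commit-info-for-" + row);
+                }
+            })
+            // The wrapped AggregatedCommitter: MUST run with parallelism 1 so that all pre-commit
+            // info is aggregated and committed exactly once.
+            .addSink(new SinkFunction<String>() {
+                @Override
+                public void invoke(String commitInfo, Context context) {
+                    // placeholder: aggregate and perform the second phase of the commit here
+                }
+            })
+            .setParallelism(1);
+
+        env.execute("legacy-flink-2pc-sketch");
+    }
+}
+```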
+
+Thank you all for watching. If you are interested in the concrete implementation, you can look at the **api-draft** branch code in the Apache SeaTunnel (Incubating) community. Thank you.
diff --git a/static/image/20220501/ch/0-1.png b/static/image/20220501/ch/0-1.png
new file mode 100644
index 000000000..e5e44c8f5
Binary files /dev/null and b/static/image/20220501/ch/0-1.png differ
diff --git a/static/image/20220501/ch/0.png b/static/image/20220501/ch/0.png
new file mode 100644
index 000000000..78c6215de
Binary files /dev/null and b/static/image/20220501/ch/0.png differ
diff --git a/static/image/20220501/ch/1.png b/static/image/20220501/ch/1.png
new file mode 100644
index 000000000..df2b1d0da
Binary files /dev/null and b/static/image/20220501/ch/1.png differ
diff --git a/static/image/20220501/ch/10.png b/static/image/20220501/ch/10.png
new file mode 100644
index 000000000..b1f1d2bc7
Binary files /dev/null and b/static/image/20220501/ch/10.png differ
diff --git a/static/image/20220501/ch/11.png b/static/image/20220501/ch/11.png
new file mode 100644
index 000000000..3d3122e1e
Binary files /dev/null and b/static/image/20220501/ch/11.png differ
diff --git a/static/image/20220501/ch/12.jpg b/static/image/20220501/ch/12.jpg
new file mode 100644
index 000000000..efe2c1936
Binary files /dev/null and b/static/image/20220501/ch/12.jpg differ
diff --git a/static/image/20220501/ch/2.png b/static/image/20220501/ch/2.png
new file mode 100644
index 000000000..f4ac1cec1
Binary files /dev/null and b/static/image/20220501/ch/2.png differ
diff --git a/static/image/20220501/ch/3.png b/static/image/20220501/ch/3.png
new file mode 100644
index 000000000..9ee690f3f
Binary files /dev/null and b/static/image/20220501/ch/3.png differ
diff --git a/static/image/20220501/ch/4.png b/static/image/20220501/ch/4.png
new file mode 100644
index 000000000..761272da1
Binary files /dev/null and b/static/image/20220501/ch/4.png differ
diff --git a/static/image/20220501/ch/5-1.png b/static/image/20220501/ch/5-1.png
new file mode 100644
index 000000000..f331443cd
Binary files /dev/null and b/static/image/20220501/ch/5-1.png differ
diff --git a/static/image/20220501/ch/5.png b/static/image/20220501/ch/5.png
new file mode 100644
index 000000000..0cd63cdc1
Binary files /dev/null and b/static/image/20220501/ch/5.png differ
diff --git a/static/image/20220501/ch/7.png b/static/image/20220501/ch/7.png
new file mode 100644
index 000000000..24e2fb33c
Binary files /dev/null and b/static/image/20220501/ch/7.png differ
diff --git a/static/image/20220501/ch/8.jpg b/static/image/20220501/ch/8.jpg
new file mode 100644
index 000000000..68bb1a5fa
Binary files /dev/null and b/static/image/20220501/ch/8.jpg differ
diff --git a/static/image/20220501/ch/9.png b/static/image/20220501/ch/9.png
new file mode 100644
index 000000000..d13b1d7e9
Binary files /dev/null and b/static/image/20220501/ch/9.png differ
diff --git a/static/image/20220501/en/0-1.png b/static/image/20220501/en/0-1.png
new file mode 100644
index 000000000..e5e44c8f5
Binary files /dev/null and b/static/image/20220501/en/0-1.png differ
diff --git a/static/image/20220501/en/0.png b/static/image/20220501/en/0.png
new file mode 100644
index 000000000..78c6215de
Binary files /dev/null and b/static/image/20220501/en/0.png differ
diff --git a/static/image/20220501/en/1.png b/static/image/20220501/en/1.png
new file mode 100644
index 000000000..22ee5c35d
Binary files /dev/null and b/static/image/20220501/en/1.png differ
diff --git a/static/image/20220501/en/10.png b/static/image/20220501/en/10.png
new file mode 100644
index 000000000..fc2f22e3a
Binary files /dev/null and b/static/image/20220501/en/10.png differ
diff --git a/static/image/20220501/en/11.png b/static/image/20220501/en/11.png
new file mode 100644
index 000000000..7c0a9a557
Binary files /dev/null and b/static/image/20220501/en/11.png differ
diff --git a/static/image/20220501/en/12.png b/static/image/20220501/en/12.png
new file mode 100644
index 000000000..915fcd23d
Binary files /dev/null and b/static/image/20220501/en/12.png differ
diff --git a/static/image/20220501/en/2.png b/static/image/20220501/en/2.png
new file mode 100644
index 000000000..aaa69a7e6
Binary files /dev/null and b/static/image/20220501/en/2.png differ
diff --git a/static/image/20220501/en/3.png b/static/image/20220501/en/3.png
new file mode 100644
index 000000000..973201039
Binary files /dev/null and b/static/image/20220501/en/3.png differ
diff --git a/static/image/20220501/en/4.png b/static/image/20220501/en/4.png
new file mode 100644
index 000000000..6cf165e15
Binary files /dev/null and b/static/image/20220501/en/4.png differ
diff --git a/static/image/20220501/en/5.png b/static/image/20220501/en/5.png
new file mode 100644
index 000000000..b6d6a1b1f
Binary files /dev/null and b/static/image/20220501/en/5.png differ
diff --git a/static/image/20220501/en/6.png b/static/image/20220501/en/6.png
new file mode 100644
index 000000000..7c2b25d5e
Binary files /dev/null and b/static/image/20220501/en/6.png differ
diff --git a/static/image/20220501/en/7.png b/static/image/20220501/en/7.png
new file mode 100644
index 000000000..13e41a6ef
Binary files /dev/null and b/static/image/20220501/en/7.png differ
diff --git a/static/image/20220501/en/8.png b/static/image/20220501/en/8.png
new file mode 100644
index 000000000..71c28edc4
Binary files /dev/null and b/static/image/20220501/en/8.png differ
diff --git a/static/image/20220501/en/9.png b/static/image/20220501/en/9.png
new file mode 100644
index 000000000..b32b9a8bd
Binary files /dev/null and b/static/image/20220501/en/9.png differ
diff --git a/static/image/20220510/ch/0-1.png b/static/image/20220510/ch/0-1.png
new file mode 100644
index 000000000..16d907493
Binary files /dev/null and b/static/image/20220510/ch/0-1.png differ
diff --git a/static/image/20220510/ch/0.jpg b/static/image/20220510/ch/0.jpg
new file mode 100644
index 000000000..1ce771ea6
Binary files /dev/null and b/static/image/20220510/ch/0.jpg differ
diff --git a/static/image/20220510/ch/1.png b/static/image/20220510/ch/1.png
new file mode 100644
index 000000000..558a848f0
Binary files /dev/null and b/static/image/20220510/ch/1.png differ
diff --git a/static/image/20220510/ch/2.png b/static/image/20220510/ch/2.png
new file mode 100644
index 000000000..93c670b27
Binary files /dev/null and b/static/image/20220510/ch/2.png differ
diff --git a/static/image/20220510/ch/3.png b/static/image/20220510/ch/3.png
new file mode 100644
index 000000000..51ab3ae4f
Binary files /dev/null and b/static/image/20220510/ch/3.png differ
diff --git a/static/image/20220510/ch/4.png b/static/image/20220510/ch/4.png
new file mode 100644
index 000000000..b26d4a6e4
Binary files /dev/null and b/static/image/20220510/ch/4.png differ
diff --git a/static/image/20220510/ch/5.png b/static/image/20220510/ch/5.png
new file mode 100644
index 000000000..11155492f
Binary files /dev/null and b/static/image/20220510/ch/5.png differ
diff --git a/static/image/20220510/ch/6.png b/static/image/20220510/ch/6.png
new file mode 100644
index 000000000..656866555
Binary files /dev/null and b/static/image/20220510/ch/6.png differ
diff --git a/static/image/20220510/ch/7.png b/static/image/20220510/ch/7.png
new file mode 100644
index 000000000..6e6cda2c5
Binary files /dev/null and b/static/image/20220510/ch/7.png differ
diff --git a/static/image/20220510/ch/8.png b/static/image/20220510/ch/8.png
new file mode 100644
index 000000000..dd352365a
Binary files /dev/null and b/static/image/20220510/ch/8.png differ
diff --git a/static/image/20220510/en/0-1.png b/static/image/20220510/en/0-1.png
new file mode 100644
index 000000000..16d907493
Binary files /dev/null and b/static/image/20220510/en/0-1.png differ
diff --git a/static/image/20220510/en/0.jpg b/static/image/20220510/en/0.jpg
new file mode 100644
index 000000000..1ce771ea6
Binary files /dev/null and b/static/image/20220510/en/0.jpg differ
diff --git a/static/image/20220510/en/1.png b/static/image/20220510/en/1.png
new file mode 100644
index 000000000..558a848f0
Binary files /dev/null and b/static/image/20220510/en/1.png differ
diff --git a/static/image/20220510/en/2.png b/static/image/20220510/en/2.png
new file mode 100644
index 000000000..93c670b27
Binary files /dev/null and b/static/image/20220510/en/2.png differ
diff --git a/static/image/20220510/en/3.png b/static/image/20220510/en/3.png
new file mode 100644
index 000000000..51ab3ae4f
Binary files /dev/null and b/static/image/20220510/en/3.png differ
diff --git a/static/image/20220510/en/4.png b/static/image/20220510/en/4.png
new file mode 100644
index 000000000..b26d4a6e4
Binary files /dev/null and b/static/image/20220510/en/4.png differ
diff --git a/static/image/20220510/en/5.png b/static/image/20220510/en/5.png
new file mode 100644
index 000000000..11155492f
Binary files /dev/null and b/static/image/20220510/en/5.png differ
diff --git a/static/image/20220510/en/6.png b/static/image/20220510/en/6.png
new file mode 100644
index 000000000..656866555
Binary files /dev/null and b/static/image/20220510/en/6.png differ
diff --git a/static/image/20220510/en/7.png b/static/image/20220510/en/7.png
new file mode 100644
index 000000000..6e6cda2c5
Binary files /dev/null and b/static/image/20220510/en/7.png differ
diff --git a/static/image/20220510/en/8.png b/static/image/20220510/en/8.png
new file mode 100644
index 000000000..dd352365a
Binary files /dev/null and b/static/image/20220510/en/8.png differ
diff --git a/static/image/20220531/ch/0.jpg b/static/image/20220531/ch/0.jpg
new file mode 100644
index 000000000..eb56c583e
Binary files /dev/null and b/static/image/20220531/ch/0.jpg differ
diff --git a/static/image/20220531/ch/1.jpg b/static/image/20220531/ch/1.jpg
new file mode 100644
index 000000000..e1369cd4c
Binary files /dev/null and b/static/image/20220531/ch/1.jpg differ
diff --git a/static/image/20220531/ch/2.jpg b/static/image/20220531/ch/2.jpg
new file mode 100644
index 000000000..0acb265a8
Binary files /dev/null and b/static/image/20220531/ch/2.jpg differ
diff --git a/static/image/20220531/ch/3.jpg b/static/image/20220531/ch/3.jpg
new file mode 100644
index 000000000..6652eec42
Binary files /dev/null and b/static/image/20220531/ch/3.jpg differ
diff --git a/static/image/20220531/ch/4.jpg b/static/image/20220531/ch/4.jpg
new file mode 100644
index 000000000..a62e89db9
Binary files /dev/null and b/static/image/20220531/ch/4.jpg differ
diff --git a/static/image/20220531/ch/5.jpg b/static/image/20220531/ch/5.jpg
new file mode 100644
index 000000000..73f0a8c73
Binary files /dev/null and b/static/image/20220531/ch/5.jpg differ
diff --git a/static/image/20220531/ch/6.jpg b/static/image/20220531/ch/6.jpg
new file mode 100644
index 000000000..83a54c97e
Binary files /dev/null and b/static/image/20220531/ch/6.jpg differ
diff --git a/static/image/20220531/ch/7.jpg b/static/image/20220531/ch/7.jpg
new file mode 100644
index 000000000..1755a3840
Binary files /dev/null and b/static/image/20220531/ch/7.jpg differ
diff --git a/static/image/20220531/ch/8.jpg b/static/image/20220531/ch/8.jpg
new file mode 100644
index 000000000..9f43371bc
Binary files /dev/null and b/static/image/20220531/ch/8.jpg differ
diff --git a/static/image/20220531/en/0-1.png b/static/image/20220531/en/0-1.png
new file mode 100644
index 000000000..5036d28f1
Binary files /dev/null and b/static/image/20220531/en/0-1.png differ
diff --git a/static/image/20220531/en/0.jpg b/static/image/20220531/en/0.jpg
new file mode 100644
index 000000000..eb56c583e
Binary files /dev/null and b/static/image/20220531/en/0.jpg differ
diff --git a/static/image/20220531/en/1.jpg b/static/image/20220531/en/1.jpg
new file mode 100644
index 000000000..e1369cd4c
Binary files /dev/null and b/static/image/20220531/en/1.jpg differ
diff --git a/static/image/20220531/en/2.jpg b/static/image/20220531/en/2.jpg
new file mode 100644
index 000000000..0acb265a8
Binary files /dev/null and b/static/image/20220531/en/2.jpg differ
diff --git a/static/image/20220531/en/3.jpg b/static/image/20220531/en/3.jpg
new file mode 100644
index 000000000..6652eec42
Binary files /dev/null and b/static/image/20220531/en/3.jpg differ
diff --git a/static/image/20220531/en/4.jpg b/static/image/20220531/en/4.jpg
new file mode 100644
index 000000000..a62e89db9
Binary files /dev/null and b/static/image/20220531/en/4.jpg differ
diff --git a/static/image/20220531/en/5.jpg b/static/image/20220531/en/5.jpg
new file mode 100644
index 000000000..73f0a8c73
Binary files /dev/null and b/static/image/20220531/en/5.jpg differ
diff --git a/static/image/20220531/en/6.jpg b/static/image/20220531/en/6.jpg
new file mode 100644
index 000000000..83a54c97e
Binary files /dev/null and b/static/image/20220531/en/6.jpg differ
diff --git a/static/image/20220531/en/7.jpg b/static/image/20220531/en/7.jpg
new file mode 100644
index 000000000..1755a3840
Binary files /dev/null and b/static/image/20220531/en/7.jpg differ
diff --git a/static/image/20220531/en/8.jpg b/static/image/20220531/en/8.jpg
new file mode 100644
index 000000000..9f43371bc
Binary files /dev/null and b/static/image/20220531/en/8.jpg differ