Posted to commits@seatunnel.apache.org by ty...@apache.org on 2022/09/26 09:25:28 UTC

[incubator-seatunnel] branch dev updated: [Improve][DOC] Perfect the connector v2 doc (#2800)

This is an automated email from the ASF dual-hosted git repository.

tyrantlucifer pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git


The following commit(s) were added to refs/heads/dev by this push:
     new 2af310ac8 [Improve][DOC] Perfect the connector v2 doc (#2800)
2af310ac8 is described below

commit 2af310ac8db1ac73bf9c0b8e6d71449306a2e3c2
Author: liugddx <80...@qq.com>
AuthorDate: Mon Sep 26 17:25:22 2022 +0800

    [Improve][DOC] Perfect the connector v2 doc (#2800)
    
    * [Improve][DOC] Perfect the connector v2 doc
    
    * Update seatunnel-connectors-v2/README.zh.md
    
    Co-authored-by: Hisoka <fa...@qq.com>
    
    * [Improve][DOC] A little tinkering
    
    * [Improve][DOC] A little tinkering
    
    * [Doc][connector] add Console sink doc
    
    close #2794
    
    * [Doc][connector] add Console sink doc
    
    close #2794
    
    * fix some problem
    
    * fix some problem
    
    * fine tuning
    
    Co-authored-by: Hisoka <fa...@qq.com>
---
 seatunnel-connectors-v2/README.md    | 170 +++++++++++++++++++++++++++--------
 seatunnel-connectors-v2/README.zh.md |  97 ++++++++++++++++----
 2 files changed, 213 insertions(+), 54 deletions(-)

diff --git a/seatunnel-connectors-v2/README.md b/seatunnel-connectors-v2/README.md
index b24122d94..e6a6c6006 100644
--- a/seatunnel-connectors-v2/README.md
+++ b/seatunnel-connectors-v2/README.md
@@ -1,84 +1,180 @@
 # Purpose
-This article introduces the new interface and the new code structure on account of the newly designed API for Connectors in Apache SeaTunnel. This helps developers with quick overview regarding API, translation layer improvement, and development of new Connector.
+
+This article introduces the new interface and the new code structure on account of the newly designed API for Connectors
+in Apache SeaTunnel. It helps developers quickly understand the API and the translation layer improvements, and it can
+guide contributors on how to use the new API to develop new connectors. See
+this [issue](https://github.com/apache/incubator-seatunnel/issues/1608) for details.
 
 ## **Code Structure**
-In order to separate from the old code, we have defined new modules for execution flow. This facilitates parallel development at the current stage, and reduces the difficulty of merging. All the relevant code at this stage is kept on the ``api-draft`` branch.
+
+In order to separate from the old code, we have defined new modules for execution flow. This facilitates parallel
+development at the current stage, and reduces the difficulty of merging.
 
 ### **Example**
-We have prepared a new version of the locally executable example program in ``seatunnel-examples``, which can be directly called using ``seatunnel-flink-connector-v2-example`` or ``seatunnel-spark-connector-v2-example`` in ``SeaTunnelApiExample``. This is also the debugging method that is often used in the local development of Connector. The corresponding configuration files are saved in the same module ``resources/examples`` folder as before.
 
+We have prepared two new versions of the locally executable example program in `seatunnel-examples`. One
+is `seatunnel-examples/seatunnel-flink-connector-v2-example/src/main/java/org/apache/seatunnel/example/flink/v2/SeaTunnelApiExample.java`,
+which runs on the Flink engine. The other one
+is `seatunnel-examples/seatunnel-spark-connector-v2-example/src/main/java/org/apache/seatunnel/example/spark/v2/SeaTunnelApiExample.java`,
+which runs on the Spark engine. This is also the debugging method that is often used in the local development of a
+Connector. You can debug these examples, which will help you better understand the running logic of the program. The
+configuration files used in the examples are saved in the `resources/examples` folder. If you want to add examples for
+your own connectors, you need to follow the steps below.
+
+1. Add the groupId, artifactId and version of the connector to be tested to
+   seatunnel-examples/seatunnel-flink-connector-v2-example/pom.xml (or to
+   seatunnel-examples/seatunnel-spark-connector-v2-example/pom.xml when you want to run it on the Spark engine) as a
+   dependency.
+2. Find the dependencies in your connector pom file whose scope is test or provided, add them to the
+   seatunnel-examples/seatunnel-flink-connector-v2-example/pom.xml (or
+   seatunnel-examples/seatunnel-spark-connector-v2-example/pom.xml) file, and change their scope to compile.
+3. Refer to the SeaTunnelApiExample class to develop your sample code.
 
 ### **Startup Class**
-Aside from the old startup class, we have created two new startup class projects, namely ``seatunnel-core/seatunnel-flink-starter`` and ``seatunnel-core/seatunnel-spark-starter``. You can find out how to parse the configuration file into an executable Flink/Spark process here.
+
+Aside from the old startup class, we have created two new startup modules,
+namely ``seatunnel-core/seatunnel-flink-starter`` and ``seatunnel-core/seatunnel-spark-starter``. You can find out how
+to parse the configuration file into an executable Flink/Spark process here.
 
 ### **SeaTunnel API**
-A new ``seatunnel-api`` (not ``seatunnel-apis``) module has been created to store the new interfaces defined by the SeaTunnel API. By implementing these interfaces, developers can complete the SeaTunnel Connector that supports multiple engines.
+
+A new ``seatunnel-api`` (not ``seatunnel-apis``) module has been created to store the new interfaces defined by the
+SeaTunnel API. By implementing these interfaces, developers can complete a SeaTunnel Connector that supports multiple
+engines.
 
 ### **Translation Layer**
-We realize the conversion between SeaTunnel API and Engine API by adapting the interfaces of different engines, so as to achieve the effect of translation, and let our SeaTunnel Connector support the operation of multiple different engines. The corresponding code address, ``seatunnel-translation``, this module has the corresponding translation layer implementation. If you are interested, you can view the code and help us improve the current code.
+
+We realize the conversion between the SeaTunnel API and the Engine API by adapting the interfaces of different engines,
+so as to achieve the effect of translation and let our SeaTunnel Connector run on multiple different engines. The
+corresponding code lives in the ``seatunnel-translation`` module, which contains the translation layer implementations.
+If you are interested, you can view the code and help us improve it.
 
 ## **API introduction**
+
 The API design of the current version of SeaTunnel draws on the design concept of Flink.
 
 ### **Source**
+
 #### **SeaTunnelSource.java**
-- The Source of SeaTunnel adopts the design of stream-batch integration, ``getBoundedness`` which determines whether the current Source is a stream Source or a batch Source, so you can specify a Source by dynamic configuration (refer to the default method), which can be either a stream or a batch.
-- ``getRowTypeInfo`` To get the schema of the data, the connector can choose to hard-code to implement a fixed schema, or run the user to customize the schema through config configuration. The latter is recommended.
-- SeaTunnelSource is a class executed on the driver side, through which objects such as SourceReader, SplitEnumerator and serializers are obtained.
+
+- The Source of SeaTunnel adopts a unified stream-batch design. ``getBoundedness`` determines whether the current
+  Source is a streaming Source or a batch Source, so you can specify a Source as either streaming or batch by dynamic
+  configuration (refer to the default method); a config-driven sketch follows this list.
+- ``getRowTypeInfo`` gets the schema of the data. The connector can choose to hard-code a fixed schema, or allow the
+  user to customize the schema through the config. The latter is recommended.
+- SeaTunnelSource is a class executed on the driver side, through which objects such as SourceReader, SplitEnumerator
+  and serializers are obtained.
 - Currently, the data type supported by SeaTunnelSource must be SeaTunnelRow.
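+
+As a rough illustration, the boundedness decision can be driven by the user's configuration. The following is a hedged
+sketch only: the class name, generics and the ``mode`` option are hypothetical, and the exact interface signatures
+should be checked against the ``seatunnel-api`` module.
+
+```java
+// Hypothetical source whose boundedness is chosen by the user's config.
+// MySource, MySourceSplit, MySourceState and the "mode" key are illustrative.
+public class MySource implements SeaTunnelSource<SeaTunnelRow, MySourceSplit, MySourceState> {
+
+    private Boundedness boundedness = Boundedness.UNBOUNDED;
+
+    @Override
+    public void prepare(Config pluginConfig) {
+        // Dynamic configuration: the same connector can run as batch or stream.
+        if (pluginConfig.hasPath("mode") && "batch".equalsIgnoreCase(pluginConfig.getString("mode"))) {
+            boundedness = Boundedness.BOUNDED;
+        }
+    }
+
+    @Override
+    public Boundedness getBoundedness() {
+        return boundedness;
+    }
+
+    // createReader, createEnumerator, restoreEnumerator, getRowTypeInfo ... omitted
+}
+```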
 
 #### **SourceSplitEnumerator.java**
-Use this enumerator to get the data read shard (SourceSplit) situation, different shards may be assigned to different SourceReaders to read data. Contains several key methods:
 
-- ``run``: Used to perform a spawn SourceSplit and call ``SourceSplitEnumerator.Context.assignSplit``: to distribute the shards to the SourceReader.
-- ``addSplitsBackSourceSplitEnumerator``: is required to redistribute these Splits when SourceSplit cannot be processed normally or restarted due to the exception of SourceReader.
-- ``registerReaderProcess``: some SourceReaders that are registered after the run is run. If there is no SourceSplit distributed at this time, it can be distributed to these new readers (yes, you need to maintain your SourceSplit distribution in SourceSplitEnumerator most of the time).
-- ``handleSplitRequest``: If some Readers actively request SourceSplit from SourceSplitEnumerator, this method can be called SourceSplitEnumerator.Context.assignSplit to sends shards to the corresponding Reader.
-- ``snapshotState``: It is used for stream processing to periodically return the current state that needs to be saved. If there is a state restoration, it will be called SeaTunnelSource.restoreEnumerator to constructs a SourceSplitEnumerator and restore the saved state to the SourceSplitEnumerator.
-- ``notifyCheckpointComplete``: It is used for subsequent processing after the state is successfully saved, and can be used to store the state or mark in third-party storage.
+This enumerator obtains the data-reading shards (SourceSplit); different shards may be assigned to different
+SourceReaders to read data. It contains several key methods:
+
+- ``run``: Used to generate the SourceSplits and call ``SourceSplitEnumerator.Context.assignSplit`` to distribute the
+  shards to the SourceReaders (see the sketch after this list).
+- ``addSplitsBack``: Called when a SourceSplit cannot be processed normally due to a SourceReader exception or restart;
+  the SourceSplitEnumerator is required to redistribute these splits.
+- ``registerReader``: Handles SourceReaders that register after ``run`` has already started. If there are SourceSplits
+  that have not been distributed yet, they can be distributed to these new readers (yes, most of the time you need to
+  maintain your SourceSplit distribution inside the SourceSplitEnumerator).
+- ``handleSplitRequest``: If some readers actively request SourceSplits from the SourceSplitEnumerator, this method can
+  call ``SourceSplitEnumerator.Context.assignSplit`` to send shards to the corresponding readers.
+- ``snapshotState``: Used in stream processing to periodically return the current state that needs to be saved. If the
+  state is restored, ``SeaTunnelSource.restoreEnumerator`` will be called to construct a SourceSplitEnumerator and
+  restore the saved state to it.
+- ``notifyCheckpointComplete``: Used for subsequent processing after the state is successfully saved; for example, the
+  state or a marker can be stored in third-party storage.
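+
+For instance, a bounded enumerator may generate all splits in ``run`` and assign them immediately. This is a hedged
+sketch using the method names described above; ``discoverSplits`` and ``chooseReader`` are hypothetical helpers, and
+the exact ``Context`` signatures should be checked against ``seatunnel-api``.
+
+```java
+// Hedged sketch of SourceSplitEnumerator.run() for a bounded source.
+@Override
+public void run() {
+    for (MySourceSplit split : discoverSplits()) {   // e.g. one split per partition
+        int subtaskId = chooseReader(split);         // e.g. hash of splitId modulo parallelism
+        // Distribute the shard, and record the assignment yourself so that
+        // addSplitsBack can redistribute it after a reader failure.
+        context.assignSplit(subtaskId, split);
+    }
+}
+```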
 
 #### **SourceSplit.java**
-The interface used to save shards. Different shards need to define different splitIds. You can implement this interface to save the data that shards need to save, such as kafka's partition and topic, hbase's columnfamily and other information, which are used by SourceReader to determine Which part of the total data should be read.
+
+The interface used to describe a shard. Different shards need to define different splitIds. By implementing this
+interface you can save whatever the shard needs to carry, such as Kafka's topic and partition or HBase's column family,
+which the SourceReader uses to determine which part of the total data it should read.
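+
+As a concrete illustration, a Kafka-style split only needs to remember which topic and partition it covers. A minimal
+sketch, assuming ``splitId()`` is the single method the interface requires:
+
+```java
+// Hypothetical split identifying one Kafka topic-partition.
+public class KafkaSourceSplit implements SourceSplit {
+    private final String topic;
+    private final int partition;
+
+    public KafkaSourceSplit(String topic, int partition) {
+        this.topic = topic;
+        this.partition = partition;
+    }
+
+    @Override
+    public String splitId() {
+        // Must be unique per shard so the framework can track assignments.
+        return topic + "-" + partition;
+    }
+}
+```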
 
 #### **SourceReader.java**
-The interface that directly interacts with the data source, and the action of reading data from the data source is completed by implementing this interface.
-- ``pollNext``: It is the core of Reader. Through this interface, the process of reading the data of the data source and returning it to SeaTunnel is realized. Whenever you are ready to pass data to SeaTunnel, you can call the ``Collector.collect`` method in the parameter, which can be called an infinite number of times to complete a large amount of data reading. But the data format supported at this stage can only be ``SeaTunnelRow``. Because our Source is a stream-batch integration, th [...]
 
-        if ( Boundedness . BOUNDED . equals ( context . getBoundedness ())) 
-        {
-        // signal to the source that we have reached the end of the data. context . signalNoMoreElement ();
-        break ;
-        }
+The interface that interacts directly with the data source; the action of reading data from the data source is
+completed by implementing this interface.
+
+- ``pollNext``: This is the core of the Reader. Through this interface, the data of the data source is read and
+  returned to SeaTunnel. Whenever you are ready to pass data to SeaTunnel, you can call the ``Collector.collect``
+  method in the parameter, which can be called any number of times to read a large amount of data. The only data format
+  supported at this stage is ``SeaTunnelRow``. Because our Source is stream-batch unified, the Connector has to decide
+  when to stop reading in batch mode. For example, if a batch reads 100 rows at a time, then after the reading is
+  completed ``pollNext`` needs to call ``SourceReader.Context.signalNoMoreElement`` to notify SeaTunnel that there is
+  no more data to read, and those 100 rows can then be used for batch processing. Stream processing does not have this
+  requirement, so most stream-batch unified SourceReaders contain the following code:
+
+```java
+if (Boundedness.BOUNDED.equals(context.getBoundedness())) {
+    // Signal to the source that we have reached the end of the data.
+    context.signalNoMoreElement();
+    break;
+}
+```
 
 It means that SeaTunnel will be notified only in batch mode.
 
-- ``addSplits``:  Used by the framework to assign SourceSplit to different SourceReaders, SourceReader should save the obtained shards, and then pollNextread the corresponding shard data in it, but there may be times when the Reader does not read shards (maybe SourceSplit has not been generated or The current Reader is indeed not allocated), at this time, pollNextcorresponding processing should be made, such as continuing to wait.
-- ``handleNoMoreSplits``: When triggered, it indicates that there are no more shards, and the Connector Source is required to optionally make corresponding feedback
-- ``snapshotStateIt``: is used for stream processing to periodically return the current state that needs to be saved, that is, the fragmentation information (SeaTunnel saves the fragmentation information and state together to achieve dynamic allocation).
-- ``notifyCheckpointComplete``: Like ``notifyCheckpointAborted`` the name, it is a callback for different states of checkpoint.
+- ``addSplits``: Used by the framework to assign SourceSplits to different SourceReaders. The SourceReader should save
+  the shards it receives and then read the corresponding shard data in ``pollNext``. There may be times when the Reader
+  has no shard to read (maybe the SourceSplits have not been generated yet, or none are allocated to the current
+  Reader); at this time ``pollNext`` should react accordingly, for example by continuing to wait (see the sketch after
+  this list).
+- ``handleNoMoreSplits``: When triggered, it indicates that there are no more shards; the Connector Source can
+  optionally react to this.
+- ``snapshotState``: Used in stream processing to periodically return the current state that needs to be saved, that
+  is, the shard information (SeaTunnel saves the shard information and the state together to achieve dynamic
+  allocation).
+- ``notifyCheckpointComplete`` and ``notifyCheckpointAborted``: As their names suggest, they are callbacks for the
+  different states of a checkpoint.
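+
+A hedged sketch of how ``addSplits`` and ``pollNext`` typically cooperate inside a reader; the queue and the
+``readSplit`` helper are illustrative only:
+
+```java
+// Reader-local shard bookkeeping: addSplits stores shards, pollNext consumes them.
+private final Deque<KafkaSourceSplit> pendingSplits = new ConcurrentLinkedDeque<>();
+
+@Override
+public void addSplits(List<KafkaSourceSplit> splits) {
+    pendingSplits.addAll(splits);
+}
+
+@Override
+public void pollNext(Collector<SeaTunnelRow> output) throws Exception {
+    KafkaSourceSplit split = pendingSplits.poll();
+    if (split == null) {
+        // No shard assigned yet (none generated, or none for this reader):
+        // wait instead of busy-spinning.
+        Thread.sleep(100L);
+        return;
+    }
+    readSplit(split, output);   // hypothetical: emits rows via output.collect(row)
+}
+```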
 
 ### **Sink**
+
 #### **SeaTunnelSink.java**
-It is used to define the way to write data to the destination, and obtain instances such as ``SinkWriter`` and ``SinkCommitter`` through this interface. An important feature of the sink side is the processing of distributed transactions. SeaTunnel defines two different Committers: ``SinkCommitter`` used to process transactions for different subTasks ``SinkAggregatedCommitter``. Process transaction results for all nodes. Different Connector Sinks can be selected according to component pro [...]
+
+It is used to define how data is written to the destination, and instances such as ``SinkWriter``
+and ``SinkCommitter`` are obtained through this interface. An important feature of the sink side is the handling of
+distributed transactions. SeaTunnel defines two different committers: ``SinkCommitter``, where each subTask handles its
+own transaction, and ``SinkAggregatedCommitter``, which processes the transaction results of all nodes in a single
+thread after the subTasks succeed. Depending on the component's properties, a Connector Sink can choose to implement
+only ``SinkCommitter`` or ``SinkAggregatedCommitter``, or both.
 
 #### **SinkWriter.java**
-It is used to directly interact with the output source, and provide the data obtained by SeaTunnel through the data source to the Writer for data writing.
 
-- ``write``: Responsible for transferring data to ``SinkWriter``, you can choose to write it directly, or write it after buffering a certain amount of data. Currently, only the data type is supported ``SeaTunnelRow``.
-- ``prepareCommit``: Executed before commit, you can write data directly here, or you can implement phase one in 2pc, and then implement phase two in ``SinkCommitter`` or ``SinkAggregatedCommitter``. What this method returns is the commit information, which will be provided ``SinkCommitter`` and ``SinkAggregatedCommitter`` used for the next stage of transaction processing.
+It is used to interact directly with the output destination; the data SeaTunnel obtained from the data source is
+handed to the Writer for writing.
+
+- ``write``: Responsible for transferring data into the ``SinkWriter``. You can choose to write it out directly, or
+  buffer a certain amount of data before writing. Currently, the only supported data type is ``SeaTunnelRow``.
+- ``prepareCommit``: Executed before commit. You can write data directly here, or you can implement phase one of 2pc
+  here and then implement phase two in ``SinkCommitter`` or ``SinkAggregatedCommitter``. This method returns the commit
+  information, which will be provided to ``SinkCommitter`` and ``SinkAggregatedCommitter`` for the next stage of
+  transaction processing (see the sketch after this list).
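+
+A minimal sketch of a buffering writer, assuming ``prepareCommit`` returns the commit information as an ``Optional``;
+``MyCommitInfo`` and ``flushAsPreparedTransaction`` are hypothetical:
+
+```java
+// Buffer rows in write(), flush them as 2pc phase one in prepareCommit().
+private final List<SeaTunnelRow> buffer = new ArrayList<>();
+
+@Override
+public void write(SeaTunnelRow element) {
+    buffer.add(element);   // or write through immediately
+}
+
+@Override
+public Optional<MyCommitInfo> prepareCommit() {
+    // Phase one: stage the buffered rows in a prepared, not-yet-visible transaction.
+    String transactionId = flushAsPreparedTransaction(buffer);
+    buffer.clear();
+    // Phase two is performed by SinkCommitter / SinkAggregatedCommitter.
+    return Optional.of(new MyCommitInfo(transactionId));
+}
+```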
 
 #### **SinkCommitter.java**
-It is used to process ``SinkWriter.prepareCommit`` the returned data information, including transaction information that needs to be submitted.
+
+It is used to process the commit information returned by ``SinkWriter.prepareCommit``, including the transaction
+information that needs to be committed.
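+
+A hedged sketch of a committer finishing phase two. Returning the commit infos that failed (so that the engine can
+retry them) is an assumption here, and ``commitTransaction`` is a hypothetical helper:
+
+```java
+// Finish the transactions prepared by SinkWriter.prepareCommit().
+@Override
+public List<MyCommitInfo> commit(List<MyCommitInfo> commitInfos) {
+    List<MyCommitInfo> needRetry = new ArrayList<>();
+    for (MyCommitInfo info : commitInfos) {
+        // Commit must be idempotent: retrying an already-committed
+        // transaction should be a harmless no-op.
+        if (!commitTransaction(info.getTransactionId())) {
+            needRetry.add(info);
+        }
+    }
+    return needRetry;
+}
+```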
 
 #### **SinkAggregatedCommitter.java**
-It is used to process ``SinkWriter.prepareCommit`` the returned data information, including transaction information that needs to be submitted, etc., but it will be processed together on a single node, which can avoid the problem of inconsistency of the state caused by the failure of the second part of the stage.
 
-- ``combine``: Used ``SinkWriter.prepareCommit`` to aggregate the returned transaction information, and then generate aggregated transaction information.
+It is also used to process the commit information returned by ``SinkWriter.prepareCommit``, including the transaction
+information that needs to be committed, but it is processed together on a single node, which avoids the state
+inconsistencies that partial failures of phase two could otherwise cause.
+
+- ``combine``: Used to aggregate the transaction information returned by ``SinkWriter.prepareCommit`` into a single
+  piece of aggregated transaction information (see the sketch after this list).
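+
+A sketch of ``combine``, folding the per-subTask commit information into one aggregated value that a single node can
+then process; ``MyAggregatedCommitInfo`` is hypothetical:
+
+```java
+// Aggregate the commit info of every subTask into one object.
+@Override
+public MyAggregatedCommitInfo combine(List<MyCommitInfo> commitInfos) {
+    List<String> transactionIds = new ArrayList<>();
+    for (MyCommitInfo info : commitInfos) {
+        transactionIds.add(info.getTransactionId());
+    }
+    return new MyAggregatedCommitInfo(transactionIds);
+}
+```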
 
 #### **Implement SinkCommitter or SinkAggregatedCommitter?**
-In the current version, it is recommended to implement ``SinkAggregatedCommitter`` as the first choice, which can provide strong consistency guarantee in Flink/Spark. At the same time, commit should be idempotent, and save engine retry can work normally.
+
+In the current version, it is recommended to implement ``SinkAggregatedCommitter`` as the first choice, since it can
+provide a strong consistency guarantee in Flink/Spark. At the same time, the commit should be idempotent so that engine
+retries can work normally.
 
 ## **Result**
-All Connector implementations should be under the ``seatunnel-connectors/seatunnel-connectors-seatuunelmodule``, and the examples that can be referred to at this stage are under this module.
+
+All Connector implementations should be under the ``seatunnel-connectors-v2`` module, and the examples that can be
+referred to at this stage are under this module as well.
 
 
diff --git a/seatunnel-connectors-v2/README.zh.md b/seatunnel-connectors-v2/README.zh.md
index 017b19907..02a59af2e 100644
--- a/seatunnel-connectors-v2/README.zh.md
+++ b/seatunnel-connectors-v2/README.zh.md
@@ -1,64 +1,127 @@
 ## Purpose
-Because SeaTunnel has designed a new API for connectors, this article introduces the new interfaces and the new code structure, helping developers quickly get familiar with and improve the new API and the translation layer, and develop new Connectors.
+
+To decouple from the compute engines, SeaTunnel has designed a new connector API. This article introduces the new interfaces and the new code structure, so that developers can quickly get started developing connectors with the new API and understand how the new API works.
+See this [proposal](https://github.com/apache/incubator-seatunnel/issues/1608) for the detailed design.
+
 ## Code Structure
-At this stage, all the relevant code is kept on the `api-draft` branch.
+
 In order to separate from the old code, facilitate parallel development at the current stage, and reduce the difficulty of merging, we have defined new modules for the new execution flow.
+
 ### Example
-We have prepared a new version of the locally executable example program in `seatunnel-examples`; simply call `SeaTunnelApiExample` in `seatunnel-flink-connector-v2-example` or `seatunnel-spark-connector-v2-example`. This is also the debugging method often used in local Connector development.
-The corresponding configuration files are saved in the `resources/examples` folder of the same module, as before.
+
+We have prepared two locally executable example programs in `seatunnel-examples`. One
+is `seatunnel-examples/seatunnel-flink-connector-v2-example/src/main/java/org/apache/seatunnel/example/flink/v2/SeaTunnelApiExample.java`,
+which runs on the Flink engine. The other is `seatunnel-examples/seatunnel-spark-connector-v2-example/src/main/java/org/apache/seatunnel/example/spark/v2/SeaTunnelApiExample.java`,
+which runs on the Spark engine. Debugging these examples will help you better understand the running logic of the program. The configuration files used are saved in the `resources/examples` folder. If you want to add examples for your own connectors, follow the steps below.
+
+1. Add the groupId, artifactId and version of the connector to be tested as a dependency in
+   `seatunnel-examples/seatunnel-flink-connector-v2-example/pom.xml` (or add the dependency to
+   `seatunnel-examples/seatunnel-spark-connector-v2-example/pom.xml` when you want to run it on the Spark engine).
+2. If your connector has dependencies whose scope is test or provided, add them to seatunnel-examples/seatunnel-flink-connector-v2-example/pom.xml (or
+   to seatunnel-examples/seatunnel-spark-connector-v2-example/pom.xml) and change their scope to compile.
+3. Refer to `SeaTunnelApiExample` to develop your own example program.
+
 ### Startup Class
-Separate from the old startup classes, we have created two new startup projects, namely `seatunnel-core/seatunnel-flink-starter` and `seatunnel-core/seatunnel-spark-starter`. There you can find out how the configuration file is parsed into an executable Flink/Spark process.
+
+Separate from the old startup classes, we have created two new startup projects, namely `seatunnel-core/seatunnel-flink-starter` and `seatunnel-core/seatunnel-spark-starter`.
+There you can find out how the configuration file is parsed into an executable Flink/Spark process.
+
 ### SeaTunnel API
+
 A new `seatunnel-api` (not `seatunnel-apis`) module has been created to store the new interfaces defined by the SeaTunnel API. By implementing these interfaces, developers can complete a SeaTunnel Connector that supports multiple engines.
+
 ### Translation Layer
-We realize the conversion between the SeaTunnel API and the Engine API by adapting the interfaces of different engines, so as to achieve the effect of translation and let our SeaTunnel Connector run on multiple different engines.
-The corresponding code is under `seatunnel-translation`, and this module contains the corresponding translation layer implementations. If you are interested, you can view the code and help us improve it.
+
+We realize the conversion between the SeaTunnel API and the Engine API by adapting the interfaces of different engines, so as to achieve the effect of translation and let our SeaTunnel Connector run on multiple different engines. The corresponding code is under `seatunnel-translation`,
+and this module contains the corresponding translation layer implementations. If you are interested, you can view the code and help us improve it.
+
 ## API Introduction
+
 `The API design of the current version of SeaTunnel draws on the design concepts of Flink`
+
 ### Source
+
 #### SeaTunnelSource.java
-SeaTunnel's Source adopts a unified stream-batch design; `getBoundedness` determines whether the current Source is a streaming Source or a batch Source, so a Source can be specified as either streaming or batch by dynamic configuration (refer to the default method).
+
+- SeaTunnel's Source adopts a unified stream-batch design; `getBoundedness`
+  determines whether the current Source is a streaming Source or a batch Source, so a Source can be specified as either streaming or batch by dynamic configuration (refer to the default method).
 - `getRowTypeInfo` gets the schema of the data. The connector can choose to hard-code a fixed schema, or allow the user to customize the schema through the config; the latter is recommended.
 - SeaTunnelSource is a class executed on the driver side; through it, objects such as the SourceReader and SplitEnumerator, as well as serializers, are obtained.
 - Currently, the data type produced by SeaTunnelSource must be SeaTunnelRow.
+
 #### SourceSplitEnumerator.java
+
 This enumerator obtains the data-reading shards (SourceSplit); different shards may be assigned to different SourceReaders to read data. It contains several key methods:
+
 - `run` is used to generate the SourceSplits and call `SourceSplitEnumerator.Context.assignSplit` to distribute the shards to the SourceReaders.
 - `addSplitsBack` is used when a SourceSplit cannot be processed normally due to a SourceReader exception, or on restart; the SourceSplitEnumerator needs to redistribute these splits.
-- `registerReader` handles SourceReaders that register after `run` has already started; if there are SourceSplits that have not been distributed yet, they can be distributed to these new readers (yes, most of the time you need to maintain your SourceSplit distribution inside the SourceSplitEnumerator).
-- `handleSplitRequest`: if some readers actively request SourceSplits from the SourceSplitEnumerator, this method can call `SourceSplitEnumerator.Context.assignSplit` to send shards to the corresponding readers.
-- `snapshotState` is used in stream processing to periodically return the current state that needs to be saved; if the state is restored, `SeaTunnelSource.restoreEnumerator` will be called to construct a SourceSplitEnumerator and restore the saved state to it.
+- `registerReader` handles SourceReaders that register after `run` has already started; if there are SourceSplits that
+  have not been distributed yet, they can be distributed to these new readers (yes, most of the time you need to maintain your SourceSplit distribution inside the SourceSplitEnumerator).
+- `handleSplitRequest`: if some readers actively request SourceSplits from the SourceSplitEnumerator, this method can
+  call `SourceSplitEnumerator.Context.assignSplit` to send shards to the corresponding readers.
+- `snapshotState` is used in stream processing to periodically return the current state that needs to be saved; if the state is restored, `SeaTunnelSource.restoreEnumerator`
+  will be called to construct a SourceSplitEnumerator and restore the saved state to it.
 - `notifyCheckpointComplete` is used for follow-up processing after the state is successfully saved; it can be used to store the state or a marker in third-party storage.
+
 #### SourceSplit.java
+
 The interface used to describe a shard. Different shards need to define different splitIds. By implementing this interface you can save whatever the shard needs to carry, such as Kafka's topic and partition or HBase's column family, which the SourceReader uses to determine which part of the total data it should read.
+
 #### SourceReader.java
+
 The interface that interacts directly with the data source; the action of reading data from the data source is completed by implementing this interface.
-`pollNext` is the core of the Reader. Through this interface, the data of the data source is read and returned to SeaTunnel. Whenever you are ready to pass data to SeaTunnel, you can call the `Collector.collect` method in the parameter, which can be called any number of times to read a large amount of data. The only data format supported at this stage is `SeaTunnelRow`. Because our Source is stream-batch unified, in batch mode the Connector has to decide when to stop reading; for example, if a batch reads 100 rows at a time, then after the reading is completed `pollNext` needs to call `SourceReader.Context.signalNoMoreElement` to notify SeaTunnel that there is no more data to read, and those 100 rows can then be used for batch processing. Stream processing has no such requirement, so most stream-batch unified SourceReaders contain the following code:
+
+- `pollNext` is the core of the Reader. Through this interface, the data of the data source is read and returned to SeaTunnel. Whenever you are ready to pass data to SeaTunnel, you can call the `Collector.collect`
+  method in the parameter, which can be called any number of times to read a large amount of data. The only data format supported at this stage is `SeaTunnelRow`.
+  Because our Source is stream-batch unified, in batch mode the Connector has to decide when to stop reading; for example, if a batch reads 100 rows at a time, then after the reading is completed `pollNext`
+  needs to call `SourceReader.Context.signalNoMoreElement`
+  to notify SeaTunnel that there is no more data to read, and those 100 rows can then be used for batch processing. Stream processing has no such requirement, so most stream-batch unified SourceReaders contain the following code:
+
 ```java
 if (Boundedness.BOUNDED.equals(context.getBoundedness())) {
     // signal to the source that we have reached the end of the data.
     context.signalNoMoreElement();
     break;
-}
+}
 ```
+
 It means that SeaTunnel will be notified only in batch mode.
-`addSplits` is used by the framework to assign SourceSplits to different SourceReaders. The SourceReader should save the shards it receives and then read the corresponding shard data in `pollNext`; there may be times when the Reader has no shard to read (maybe the SourceSplits have not been generated yet, or none are allocated to the current Reader), and at this time `pollNext` should react accordingly, for example by continuing to wait.
+
+- `addSplits` is used by the framework to assign SourceSplits to different SourceReaders. The SourceReader should save the shards it receives and then read the corresponding shard data in `pollNext`;
+  there may be times when the Reader has no shard to read (maybe the SourceSplits have not been generated yet, or none are allocated to the current Reader), and at this time `pollNext` should react accordingly, for example by continuing to wait.
 - `handleNoMoreSplits`: when triggered, it indicates that there are no more shards; the Connector Source can optionally react to this.
 - `snapshotState` is used in stream processing to periodically return the current state that needs to be saved, that is, the shard information (SeaTunnel saves the shard information and the state together to achieve dynamic allocation).
 - `notifyCheckpointComplete` and `notifyCheckpointAborted`, as their names suggest, are callbacks for the different states of a checkpoint.
+
 ### Sink
+
 #### SeaTunnelSink.java
-It is used to define how data is written to the destination, and instances such as SinkWriter and SinkCommitter are obtained through this interface. An important feature of the sink side is the handling of distributed transactions. SeaTunnel defines two different committers: `SinkCommitter`, where each subTask handles its own transaction and, after success, `SinkAggregatedCommitter` processes the transaction results of all nodes in a single thread. Depending on the component's properties, a Connector Sink can choose to implement only `SinkCommitter` or `SinkAggregatedCommitter`, or both.
+
+It is used to define how data is written to the destination, and instances such as SinkWriter and SinkCommitter are obtained through this interface. An important feature of the sink side is the handling of distributed transactions. SeaTunnel defines two different committers: `SinkCommitter`,
+where each subTask handles its own transaction and, after success, `SinkAggregatedCommitter` processes the transaction results of all nodes in a single thread. Depending on the component's properties, a Connector
+Sink can choose to implement only `SinkCommitter` or `SinkAggregatedCommitter`, or both.
+
 #### SinkWriter.java
+
 It is used to interact directly with the output destination; the data SeaTunnel obtained from the data source is handed to the Writer for writing.
+
 - `write`: responsible for passing data into the SinkWriter. You can choose to write it out directly, or buffer a certain amount of data before writing; currently the only supported data type is `SeaTunnelRow`.
-- `prepareCommit`: executed before commit. You can write data directly here, or implement phase one of 2pc here and then implement phase two in `SinkCommitter` or `SinkAggregatedCommitter`. This method returns the commit information, which will be provided to `SinkCommitter` and `SinkAggregatedCommitter` for the next stage of transaction processing.
+- `prepareCommit`: executed before commit. You can write data directly here, or implement phase one of 2pc here and then implement phase two in `SinkCommitter` or `SinkAggregatedCommitter`.
+  This method returns the commit information, which will be provided to `SinkCommitter` and `SinkAggregatedCommitter` for the next stage of transaction processing.
+
 #### SinkCommitter.java
+
 It is used to process the commit information returned by `SinkWriter.prepareCommit`, including the transaction information that needs to be committed.
+
 #### SinkAggregatedCommitter.java
+
 It is also used to process the commit information returned by `SinkWriter.prepareCommit`, including the transaction information that needs to be committed, but it is processed together on a single node, which avoids the state inconsistencies that partial failures of phase two could otherwise cause.
+
 - `combine`: used to aggregate the transaction information returned by `SinkWriter.prepareCommit` into a single piece of aggregated transaction information.
+
 #### Should I implement SinkCommitter or SinkAggregatedCommitter?
+
 In the current version, it is recommended to implement SinkAggregatedCommitter as the first choice; it can provide a strong consistency guarantee in Flink/Spark. At the same time, the commit should be idempotent so that engine retries can work normally.
+
 ## Implementation
-All Connector implementations should be under the `seatunnel-connectors/seatunnel-connectors-seatuunel` module, and the examples that can be referred to at this stage are under this module.
\ No newline at end of file
+
+At this stage, all connector implementations and the examples that can be referred to are under seatunnel-connectors-v2, and you can consult them as needed.