Posted to commits@inlong.apache.org by GitBox <gi...@apache.org> on 2022/06/08 11:39:51 UTC

[GitHub] [incubator-inlong-website] kipshi opened a new pull request, #400: [INLONG-397][Release] Add blog for extend data source in manager

kipshi opened a new pull request, #400:
URL: https://github.com/apache/incubator-inlong-website/pull/400

   Fixes #397 
   
   
   ### Motivation
   
   ### Modifications
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
     - If a feature is not applicable for documentation, explain why?
     - If a feature is not documented yet in this PR, please create a followup issue for adding the documentation
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@inlong.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-inlong-website] dockerzhang commented on a diff in pull request #400: [INLONG-397][Doc] Add guide for extend data source in manager

Posted by GitBox <gi...@apache.org>.
dockerzhang commented on code in PR #400:
URL: https://github.com/apache/incubator-inlong-website/pull/400#discussion_r892258457


##########
docs/design_and_concept/how_to_extend_data_source.md:
##########
@@ -0,0 +1,141 @@
+---
+title: Extended Data Source

Review Comment:
   ->
   Data Node Plugin



##########
docs/design_and_concept/how_to_extend_data_source.md:
##########
@@ -0,0 +1,141 @@
+---
+title: Extended Data Source
+sidebar_position: 6
+---
+
+## Overview
+
+Inlong aims to create data flows between different data sources. It already supports several common data sources as inputs or outputs, such as **MySQL**, **Apache Kafka**, and **ClickHouse**.
+For details, refer to [data_node](https://inlong.apache.org/docs/next/data_node/extract_node/auto_push).
+We plan to support more data sources in the future, and this article is a development guide to extend data sources.

Review Comment:
   data sources
   ->
   data nodes



##########
docs/design_and_concept/how_to_extend_data_source.md:
##########
@@ -0,0 +1,141 @@
+---
+title: Extended Data Source
+sidebar_position: 6
+---
+
+## Overview
+
+Inlong aims to create data flows between different data sources. It already supports several common data sources as inputs or outputs, such as **MySQL**, **Apache Kafka**, and **ClickHouse**.
+For details, refer to [data_node](https://inlong.apache.org/docs/next/data_node/extract_node/auto_push).
+We plan to support more data sources in the future, and this article is a development guide to extend data sources.
+
+## Extend Data Extract Node

Review Comment:
   Extend Extract Node
   



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_extend_data_source.md:
##########
@@ -0,0 +1,141 @@
+---
+title: Data Source Extension

Review Comment:
   Data Node Plugin



##########
docs/design_and_concept/how_to_extend_data_source.md:
##########
@@ -0,0 +1,141 @@
+---
+title: Extended Data Source
+sidebar_position: 6
+---
+
+## Overview
+
+Inlong aims to create data flows between different data sources. It already supports several common data sources as inputs or outputs, such as **MySQL**, **Apache Kafka**, and **ClickHouse**.
+For details, refer to [data_node](https://inlong.apache.org/docs/next/data_node/extract_node/auto_push).
+We plan to support more data sources in the future, and this article is a development guide to extend data sources.
+
+## Extend Data Extract Node
+
+To extend an input data source, which Inlong calls an **extract node**, follow the steps below. We take **MySQL_BINLOG** as an example.
+
+- Develop the extract node plugin in Sort; refer to [how_to_write_plugin_sort](https://inlong.apache.org/docs/next/design_and_concept/how_to_write_plugin_sort)
+- Add a **TaskType** to `org.apache.inlong.common.enums.TaskTypeEnum`
+```java
+public enum TaskTypeEnum {
+
+    DATABASE_MIGRATION(0),
+    SQL(1),
+    BINLOG(2),
+    FILE(3),
+    KAFKA(4),
+    PULSAR(5),
+    POSTGRES(6),
+    ORACLE(7),
+    SQLSERVER(8),
+    MONGODB(9),
+    ...
+```
+- Add a **SourceType** to `org.apache.inlong.manager.common.enums.SourceType`
+```java
+public enum SourceType {
+
+    AUTO_PUSH("AUTO_PUSH", null),
+    FILE("FILE", TaskTypeEnum.FILE),
+    SQL("SQL", TaskTypeEnum.SQL),
+    BINLOG("BINLOG", TaskTypeEnum.BINLOG),
+    KAFKA("KAFKA", TaskTypeEnum.KAFKA),
+    PULSAR("PULSAR", TaskTypeEnum.PULSAR),
+    POSTGRES("POSTGRES", TaskTypeEnum.POSTGRES),
+    ORACLE("ORACLE", TaskTypeEnum.ORACLE),
+    SQLSERVER("SQLSERVER", TaskTypeEnum.SQLSERVER),
+    MONGODB("MONGO", TaskTypeEnum.MONGODB),
+    ...
+```
+- Create a new package under `org.apache.inlong.manager.common.pojo.source` and develop every entity class needed.
+  ![](img/Binlog_Entity_Class.png)
+- Create an operation class for the new data source under `org.apache.inlong.manager.service.source`.
+  ![](img/Binlog_Operation.png)
+- Convert the data source into an **ExtractNode** supported by **Sort**:
+```java
+public class ExtractNodeUtils {
+    
+    public static ExtractNode createExtractNode(StreamSource sourceInfo) {
+        SourceType sourceType = SourceType.forType(sourceInfo.getSourceType());
+        switch (sourceType) {
+            case BINLOG:
+                return createExtractNode((MySQLBinlogSource) sourceInfo);
+            case KAFKA:
+                return createExtractNode((KafkaSource) sourceInfo);
+            case PULSAR:
+                return createExtractNode((PulsarSource) sourceInfo);
+            case POSTGRES:
+                return createExtractNode((PostgresSource) sourceInfo);
+            case ORACLE:
+                return createExtractNode((OracleSource) sourceInfo);
+            case SQLSERVER:
+                return createExtractNode((SqlServerSource) sourceInfo);
+            case MONGODB:
+                return createExtractNode((MongoDBSource) sourceInfo);
+            default:
+                throw new IllegalArgumentException(
+                        String.format("Unsupported sourceType=%s to create extractNode", sourceType));
+        }
+    }
+    ...
+```
+## Extend Data Load Node

Review Comment:
   need one more line.
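The enum registration steps in the diff above (adding a **TaskType** and a matching **SourceType**) can be illustrated with a minimal, self-contained Java sketch. It is not the real Inlong code: `TaskTypeEnum` and `SourceType` here are simplified stand-ins with only a few constants, and the behavior of `forType` (case-insensitive match on the type name, `IllegalArgumentException` for unknown names) is an assumption. It shows the pairing the guide relies on: each `SourceType` carries the type name stored on a stream source plus the `TaskTypeEnum` it maps to.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the TaskTypeEnum / SourceType pairing.
 * Only a few constants are kept, and the forType lookup rule is an assumption.
 */
public class SourceTypeSketch {

    // Stand-in for org.apache.inlong.common.enums.TaskTypeEnum: each task type has a numeric code.
    enum TaskTypeEnum {
        SQL(1), BINLOG(2), FILE(3), KAFKA(4), PULSAR(5), MONGODB(9);

        private final int type;

        TaskTypeEnum(int type) {
            this.type = type;
        }

        int getType() {
            return type;
        }
    }

    // Stand-in for org.apache.inlong.manager.common.enums.SourceType:
    // a type name stored on the stream source plus the TaskTypeEnum it maps to (null if none).
    enum SourceType {
        AUTO_PUSH("AUTO_PUSH", null),
        FILE("FILE", TaskTypeEnum.FILE),
        SQL("SQL", TaskTypeEnum.SQL),
        BINLOG("BINLOG", TaskTypeEnum.BINLOG),
        KAFKA("KAFKA", TaskTypeEnum.KAFKA),
        PULSAR("PULSAR", TaskTypeEnum.PULSAR),
        MONGODB("MONGO", TaskTypeEnum.MONGODB);

        private final String type;
        private final TaskTypeEnum taskType;

        SourceType(String type, TaskTypeEnum taskType) {
            this.type = type;
            this.taskType = taskType;
        }

        // Assumed semantics: case-insensitive match on the type name, exception if nothing matches.
        static SourceType forType(String type) {
            return Arrays.stream(values())
                    .filter(s -> s.type.equalsIgnoreCase(type))
                    .findFirst()
                    .orElseThrow(() -> new IllegalArgumentException("Unsupported source type: " + type));
        }

        TaskTypeEnum getTaskType() {
            return taskType;
        }
    }

    public static void main(String[] args) {
        System.out.println(SourceType.forType("binlog").getTaskType().getType()); // 2
        // MONGODB is found through its type name "MONGO", not through the constant name.
        System.out.println(SourceType.forType("MONGO"));                          // MONGODB
        System.out.println(SourceType.forType("AUTO_PUSH").getTaskType());       // null
    }
}
```

Because lookup goes through the type name, a new source needs a distinct name in `SourceType`; `AUTO_PUSH` shows that a source type may have no task type at all.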



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_extend_data_source.md:
##########
@@ -0,0 +1,141 @@
+---
+title: Data Source Extension
+sidebar_position: 6
+---
+
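+## Overview
+
+Inlong was originally designed to create data flows between different data sources. So far, Inlong supports reading from and writing to a variety of common data sources, such as **MySQL**, **Apache Kafka**, and **ClickHouse**.
+For details, refer to [Data Node](https://inlong.apache.org/zh-CN/docs/next/data_node/extract_node/auto_push).
+We expect to support more common data sources in the future, so this article briefly describes how to extend data sources within the existing framework.
+
+## Extend Extract Node
+
+Taking **MySQL_BINLOG** as an example, the following describes how to extend an extract node in the Inlong framework.
+
+- First, add support for the data source in the Sort component; for details, refer to [Sort Plugin](https://inlong.apache.org/zh-CN/docs/next/design_and_concept/how_to_write_plugin_sort)
+- Add the corresponding constant to the enum class `org.apache.inlong.common.enums.TaskTypeEnum`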
+```java
+public enum TaskTypeEnum {
+
+    DATABASE_MIGRATION(0),
+    SQL(1),
+    BINLOG(2),
+    FILE(3),
+    KAFKA(4),
+    PULSAR(5),
+    POSTGRES(6),
+    ORACLE(7),
+    SQLSERVER(8),
+    MONGODB(9),
+    ...
+```
+- Likewise, add the corresponding constant to the enum class `org.apache.inlong.manager.common.enums.SourceType`
+```java
+public enum SourceType {
+
+    AUTO_PUSH("AUTO_PUSH", null),
+    FILE("FILE", TaskTypeEnum.FILE),
+    SQL("SQL", TaskTypeEnum.SQL),
+    BINLOG("BINLOG", TaskTypeEnum.BINLOG),
+    KAFKA("KAFKA", TaskTypeEnum.KAFKA),
+    PULSAR("PULSAR", TaskTypeEnum.PULSAR),
+    POSTGRES("POSTGRES", TaskTypeEnum.POSTGRES),
+    ORACLE("ORACLE", TaskTypeEnum.ORACLE),
+    SQLSERVER("SQLSERVER", TaskTypeEnum.SQLSERVER),
+    MONGODB("MONGO", TaskTypeEnum.MONGODB),
+    ...
+```
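+- Create a folder under `org.apache.inlong.manager.common.pojo.source` and create the corresponding entity classes.
+  ![](img/Binlog_Entity_Class.png)
+- Under `org.apache.inlong.manager.service.source`, create the corresponding utility class.
+  ![](img/Binlog_Operation.png)
+- Implement the function that converts the data source into an **ExtractNode**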
+```java
+public class ExtractNodeUtils {
+    
+    public static ExtractNode createExtractNode(StreamSource sourceInfo) {
+        SourceType sourceType = SourceType.forType(sourceInfo.getSourceType());
+        switch (sourceType) {
+            case BINLOG:
+                return createExtractNode((MySQLBinlogSource) sourceInfo);
+            case KAFKA:
+                return createExtractNode((KafkaSource) sourceInfo);
+            case PULSAR:
+                return createExtractNode((PulsarSource) sourceInfo);
+            case POSTGRES:
+                return createExtractNode((PostgresSource) sourceInfo);
+            case ORACLE:
+                return createExtractNode((OracleSource) sourceInfo);
+            case SQLSERVER:
+                return createExtractNode((SqlServerSource) sourceInfo);
+            case MONGODB:
+                return createExtractNode((MongoDBSource) sourceInfo);
+            default:
+                throw new IllegalArgumentException(
+                        String.format("Unsupported sourceType=%s to create extractNode", sourceType));
+        }
+    }
+    ...
+```
+## Extend Load Node

Review Comment:
   need one more line.
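The conversion step in the diffs above, extending `ExtractNodeUtils.createExtractNode`, boils down to adding one `case` that casts the generic source to the new entity class and delegates to a type-specific overload. The sketch below is hypothetical and self-contained: `StreamSource`, `ExtractNode`, `MyNewSource` and the connector strings are simplified stand-ins, not Inlong or Sort classes, and `getSourceType()` returns the enum directly instead of a string passed through `SourceType.forType`.

```java
/**
 * Hypothetical sketch of the source-to-ExtractNode dispatch.
 * All types are simplified stand-ins, not the real Inlong or Sort classes.
 */
public class ExtractNodeDispatchSketch {

    enum SourceType { BINLOG, KAFKA, MY_NEW_SOURCE }

    // Stand-in for the manager-side stream source entity.
    abstract static class StreamSource {
        abstract SourceType getSourceType();
    }

    static class MySQLBinlogSource extends StreamSource {
        @Override
        SourceType getSourceType() {
            return SourceType.BINLOG;
        }
    }

    static class KafkaSource extends StreamSource {
        final String topic;

        KafkaSource(String topic) {
            this.topic = topic;
        }

        @Override
        SourceType getSourceType() {
            return SourceType.KAFKA;
        }
    }

    // Hypothetical entity class for the newly added data source.
    static class MyNewSource extends StreamSource {
        @Override
        SourceType getSourceType() {
            return SourceType.MY_NEW_SOURCE;
        }
    }

    // Stand-in for the Sort-side ExtractNode; it only records a connector label.
    static final class ExtractNode {
        final String connector;

        ExtractNode(String connector) {
            this.connector = connector;
        }
    }

    // Generic entry point: dispatch on the source type, cast, and delegate to a typed overload.
    static ExtractNode createExtractNode(StreamSource source) {
        switch (source.getSourceType()) {
            case BINLOG:
                return createExtractNode((MySQLBinlogSource) source);
            case KAFKA:
                return createExtractNode((KafkaSource) source);
            case MY_NEW_SOURCE:
                // The case added for the new data source.
                return createExtractNode((MyNewSource) source);
            default:
                throw new IllegalArgumentException(String.format(
                        "Unsupported sourceType=%s to create extractNode", source.getSourceType()));
        }
    }

    static ExtractNode createExtractNode(MySQLBinlogSource source) {
        return new ExtractNode("mysql-binlog");
    }

    static ExtractNode createExtractNode(KafkaSource source) {
        return new ExtractNode("kafka:" + source.topic);
    }

    static ExtractNode createExtractNode(MyNewSource source) {
        return new ExtractNode("my-new-source");
    }

    public static void main(String[] args) {
        // Declared as StreamSource so the generic dispatcher (and its switch) is used.
        StreamSource kafka = new KafkaSource("orders");
        StreamSource custom = new MyNewSource();
        System.out.println(createExtractNode(kafka).connector);  // kafka:orders
        System.out.println(createExtractNode(custom).connector); // my-new-source
    }
}
```

The dispatcher only runs when it is called through a `StreamSource`-typed reference, as in `main`; calling `createExtractNode(new KafkaSource("orders"))` directly lets the compiler pick the `KafkaSource` overload and skip the `switch`.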





[GitHub] [incubator-inlong-website] dockerzhang commented on pull request #400: [INLONG-397][Doc] Add guide for extend data source in manager

Posted by GitBox <gi...@apache.org>.
dockerzhang commented on PR #400:
URL: https://github.com/apache/incubator-inlong-website/pull/400#issuecomment-1149821803

   how_to_extend_data_source.md
   ->
   inlong_data_node_plugin.md




[GitHub] [incubator-inlong-website] dockerzhang merged pull request #400: [INLONG-397][Doc] Add guide for extend data source in manager

Posted by GitBox <gi...@apache.org>.
dockerzhang merged PR #400:
URL: https://github.com/apache/incubator-inlong-website/pull/400

