
Posted to commits@inlong.apache.org by GitBox <gi...@apache.org> on 2022/06/14 09:04:34 UTC

[GitHub] [incubator-inlong-website] gong commented on a diff in pull request #404: [INLONG-403][Sort] Add doc about how to extend Extract or Load node

gong commented on code in PR #404:
URL: https://github.com/apache/incubator-inlong-website/pull/404#discussion_r896567043


##########
docs/design_and_concept/how_to_extend_extract_or_load_node_en.md:
##########
@@ -0,0 +1,216 @@
+---
+title: Sort Plugin
+sidebar_position: 3
+---
+
+# Overview
+
+InLong-Sort is a real-time ETL system. Currently supported extract and load nodes include Elasticsearch, HBase, Hive, Iceberg, JDBC, Kafka, MongoDB, MySQL, Oracle, PostgreSQL, Pulsar, etc. InLong-Sort is an ETL solution based on Flink SQL, whose expressive power gives it high scalability and flexibility: essentially, any semantics supported by Flink SQL are also supported by InLong-Sort. When the built-in functions of Flink SQL do not meet the requirements of a scenario, they can be extended through UDFs in InLong-Sort. Users who have worked with SQL, especially Flink SQL, will also find it easy to get started.
+
+This article describes how to add a new source (abstracted as an extract node in InLong) or a new sink (abstracted as a load node in InLong) to InLong-Sort. Once you understand the InLong-Sort architecture, it becomes clear how a source maps to an extract node and how a sink maps to a load node. The architecture of InLong-Sort can be represented by the following UML object relationship diagram:
+
+![sort_UML](img/sort_uml.png)
+
+The concepts of each component are:
+
+**Group**: a data flow group containing multiple data flows; each group represents one data access
+
+**Stream**: a data flow; each data flow has a specific flow direction
+
+**GroupInfo**: encapsulation of a data flow group in Sort; a GroupInfo can contain multiple StreamInfo instances
+
+**StreamInfo**: abstraction of a data flow in Sort, including the sources, transformations, destinations, etc. of that data flow
+
+**Node**: abstraction of the data source, data transformation, and data destination in data synchronization
+
+**ExtractNode**: source-side abstraction for data synchronization
+
+**TransformNode**: abstraction of the transformation process in data synchronization
+
+**LoadNode**: destination-side abstraction for data synchronization
+
+**NodeRelationShip**: abstraction of the relationships between nodes in data synchronization
+
+**FieldRelationShip**: abstraction of the relationship between the fields of upstream and downstream nodes in data synchronization
+
+**FieldInfo**: a node field
+
+**BuiltInFieldInfo**: node built-in fields

Review Comment:
   It should be `MetaFieldInfo` now
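
The component list in the quoted doc describes a nested data model: a group holds streams, and each stream holds nodes plus the relationships between nodes and between their fields. A minimal Java sketch of that shape follows, assuming simplified stand-in records named after the concepts above. These records are hypothetical and are not the actual InLong-Sort protocol classes, whose fields and constructors differ; the connector names `mysql` and `kafka`, the node ids, and the helpers `buildDemoGroup` and `describe` are placeholders for illustration. Built-in (meta) fields, which the review comment says are now modelled as `MetaFieldInfo`, are left out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of the InLong-Sort concepts listed in the doc.
// Illustrative only: these are NOT the real InLong-Sort classes.
public class SortModelSketch {

    // FieldInfo: a field that belongs to a node.
    record FieldInfo(String nodeId, String name, String type) {}

    // FieldRelationShip: maps an upstream field to a downstream field.
    record FieldRelationShip(FieldInfo input, FieldInfo output) {}

    // Node: common abstraction for source, transformation and destination.
    interface Node {
        String id();
        List<FieldInfo> fields();
    }

    // ExtractNode: source side; "connector" is a placeholder connector name.
    record ExtractNode(String id, String connector, List<FieldInfo> fields) implements Node {}

    // TransformNode: a transformation step between extract and load nodes.
    record TransformNode(String id, List<FieldInfo> fields) implements Node {}

    // LoadNode: destination side; carries the field mapping from its upstream node.
    record LoadNode(String id, String connector, List<FieldInfo> fields,
                    List<FieldRelationShip> fieldRelations) implements Node {}

    // NodeRelationShip: which upstream node ids feed which downstream node ids.
    record NodeRelationShip(List<String> inputs, List<String> outputs) {}

    // StreamInfo: one data flow made of nodes and the relationships between them.
    record StreamInfo(String streamId, List<Node> nodes, List<NodeRelationShip> relations) {}

    // GroupInfo: a data flow group holding several streams.
    record GroupInfo(String groupId, List<StreamInfo> streams) {}

    // Builds a one-stream group: a MySQL-style source loaded into a Kafka-style sink.
    static GroupInfo buildDemoGroup() {
        FieldInfo srcId = new FieldInfo("src", "id", "BIGINT");
        FieldInfo srcName = new FieldInfo("src", "name", "STRING");
        ExtractNode source = new ExtractNode("src", "mysql", List.of(srcId, srcName));

        FieldInfo dstId = new FieldInfo("dst", "id", "BIGINT");
        FieldInfo dstName = new FieldInfo("dst", "name", "STRING");
        LoadNode sink = new LoadNode("dst", "kafka", List.of(dstId, dstName),
                List.of(new FieldRelationShip(srcId, dstId),
                        new FieldRelationShip(srcName, dstName)));

        StreamInfo stream = new StreamInfo("stream-1", List.of(source, sink),
                List.of(new NodeRelationShip(List.of("src"), List.of("dst"))));
        return new GroupInfo("group-1", List.of(stream));
    }

    // Renders each node relationship as "in1,in2 -> out1", joined by "; ".
    static String describe(StreamInfo stream) {
        return stream.relations().stream()
                .map(r -> String.join(",", r.inputs()) + " -> " + String.join(",", r.outputs()))
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        GroupInfo group = buildDemoGroup();
        for (StreamInfo s : group.streams()) {
            // prints "group-1/stream-1: src -> dst"
            System.out.println(group.groupId() + "/" + s.streamId() + ": " + describe(s));
        }
    }
}
```

In this sketch a new source or sink would be a new `ExtractNode` or `LoadNode` value wired in through a `NodeRelationShip`, which mirrors the source-to-extract-node and sink-to-load-node mapping the doc describes; the real extension steps in InLong-Sort may involve more than this.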



-- 
This is an automated message from the Apache Git Service.