Posted to commits@inlong.apache.org by GitBox <gi...@apache.org> on 2022/06/16 03:29:28 UTC

[GitHub] [incubator-inlong-website] Oneal65 opened a new pull request, #418: [INLONG-409][Sort]add md for mongodb-cdc

Oneal65 opened a new pull request, #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418

   Fixes #409 
   
   ### Motivation
   
   Add a Markdown guide for the mongodb-cdc connector.
   
   ### Modifications
   Add documentation for the MongoDB extract node.
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
     - If a feature is not applicable for documentation, explain why?
     - If a feature is not documented yet in this PR, please create a followup issue for adding the documentation
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@inlong.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-inlong-website] gong commented on a diff in pull request #418: [INLONG-409][Sort]add md for mongodb-cdc

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r899079239


##########
docs/data_node/extract_node/mongodb-cdc-en.md:
##########
@@ -0,0 +1,197 @@
+---
+title: MongoDB-CDC
+sidebar_position: 7
+---
+
+## MongoDB-CDC Extract Node
+
+The MongoDB CDC connector allows for reading snapshot data and incremental data from MongoDB. This document describes how to setup the MongoDB CDC connector to run SQL queries against MongoDB.
+
+## Supported Version
+| Extract Node                    | Version                                      |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## Dependencies
+
+In order to setup the MongoDB CDC connector, the following table provides dependency information for both projects using a build automation tool (such as Maven or SBT) and SQL Client with SQL JAR bundles.
+
+### Maven dependency
+
+```
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## Setup MongoDB
+
+### Availability
+
+- MongoDB version
+
+  MongoDB version >= 3.6
+  We use [change streams](https://docs.mongodb.com/manual/changeStreams/) feature (new in version 3.6) to capture change data.
+
+- Cluster Deployment
+
+  [replica sets](https://docs.mongodb.com/manual/replication/) or [sharded clusters](https://docs.mongodb.com/manual/sharding/) is required.
+
+- Storage Engine
+
+  [WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger) storage engine is required.
+
+- [Replica set protocol version](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  Replica set protocol version 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion) is required.
+  Starting in version 4.0, MongoDB only supports pv1. pv1 is the default for all new replica sets created with MongoDB 3.2 or later.
+
+- Privileges
+
+  `changeStream` and `read` privileges are required by MongoDB Kafka Connector.
+
+  You can use the following example for simple authorization.
+  For more detailed authorization, please refer to [MongoDB Database User Roles](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles).
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## How to create a MongoDB Extract Node
+
+### Usage for SQL API
+
+The example below shows how to create an MongoDB Extract Node with `Flink SQL` :
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, -- must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read the snapshot and change stream events from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**Note that**
+
+MongoDB's change event records do not carry an update-before image, so they can only be converted to Flink's UPSERT changelog stream. An upsert stream requires a unique key, so `_id` must be declared as the primary key. No other column can be declared as the primary key, because a delete operation contains no keys or values other than `_id` and the `sharding key`.
+
+### Usage for InLong Dashboard
+
+TODO: It will be supported in the future.
+
+### Usage for InLong Manager Client
+
+TODO: It will be supported in the future.
+
+## MongoDB Extract Node Options
+
+| **Option**                | **Required** | **Default**      | **Type** | **Description**                                              |
+| ------------------------- | ------------ | ---------------- | -------- | ------------------------------------------------------------ |
+| connector                 | required     | (none)           | String   | Specify what connector to use, here should be `mongodb-cdc`. |
+| hosts                     | required     | (none)           | String   | The comma-separated list of hostname and port pairs of the MongoDB servers. eg. `localhost:27017,localhost:27018` |
+| username                  | optional     | (none)           | String   | Name of the database user to be used when connecting to MongoDB. This is required only when MongoDB is configured to use authentication. |
+| password                  | optional     | (none)           | String   | Password to be used when connecting to MongoDB. This is required only when MongoDB is configured to use authentication. |
+| database                  | required     | (none)           | String   | Name of the database to watch for changes.                   |
+| collection                | required     | (none)           | String   | Name of the collection in the database to watch for changes. |
+| connection.options        | optional     | (none)           | String   | The ampersand-separated [connection options](https://docs.mongodb.com/manual/reference/connection-string/#std-label-connections-connection-options) of MongoDB. eg. `replicaSet=test&connectTimeoutMS=300000` |
+| errors.tolerance          | optional     | none             | String   | Whether to continue processing messages if an error is encountered. Accept `none` or `all`. When set to `none`, the connector reports an error and blocks further processing of the rest of the records when it encounters an error. When set to `all`, the connector silently ignores any bad messages. |
+| errors.log.enable         | optional     | true             | Boolean  | Whether details of failed operations should be written to the log file. |
+| copy.existing             | optional     | true             | Boolean  | Whether copy existing data from source collections.          |
+| copy.existing.pipeline    | optional     | (none)           | String   | An array of JSON objects describing the pipeline operations to run when copying existing data. This can improve the use of indexes by the copying manager and make copying more efficient. eg. `[{"$match": {"closed": "false"}}]` ensures that only documents in which the closed field is set to false are copied. |
+| copy.existing.max.threads | optional     | Processors Count | Integer  | The number of threads to use when performing the data copy.  |
+| copy.existing.queue.size  | optional     | 16000            | Integer  | The max size of the queue to use when copying data.          |
+| poll.max.batch.size       | optional     | 1000             | Integer  | Maximum number of change stream documents to include in a single batch when polling for new data. |
+| poll.await.time.ms        | optional     | 1500             | Integer  | The amount of time to wait before checking for new results on the change stream. |
+| heartbeat.interval.ms     | optional     | 0                | Integer  | The length of time in milliseconds between sending heartbeat messages. Use 0 to disable. |
+
+
+## Available Metadata
+
+The following format metadata can be exposed as read-only (VIRTUAL) columns in a table definition.
+
+| Key             | DataType                  | Description                                                  |
+| --------------- | ------------------------- | ------------------------------------------------------------ |
+| database_name   | STRING NOT NULL           | Name of the database that contains the row.                  |
+| collection_name | STRING NOT NULL           | Name of the collection that contains the row.                |
+| op_ts           | TIMESTAMP_LTZ(3) NOT NULL | It indicates the time that the change was made in the database. If the record is read from snapshot of the table instead of the change stream, the value is always 0. |
+
+
+The extended CREATE TABLE example demonstrates the syntax for exposing these metadata fields:
+```sql
+CREATE TABLE `mysql_extract_node` (
+    db_name STRING METADATA FROM 'database_name' VIRTUAL,
+    table_name STRING METADATA  FROM 'table_name' VIRTUAL,

Review Comment:
   `table_name` should be `collection_name`
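
   A minimal sketch of the corrected declaration, assuming the metadata keys listed in the table above (`database_name`, `collection_name`, `op_ts`). The table name `mongodb_extract_node`, the column aliases, and the connection values are reused from the earlier SQL example for illustration only:

   ```sql
   CREATE TABLE mongodb_extract_node (
       -- metadata columns are read-only, so they are declared VIRTUAL
       db_name STRING METADATA FROM 'database_name' VIRTUAL,
       collection_name STRING METADATA FROM 'collection_name' VIRTUAL,
       operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
       _id STRING, -- must be declared
       name STRING,
       PRIMARY KEY (_id) NOT ENFORCED
   ) WITH (
       'connector' = 'mongodb-cdc',
       'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
       'username' = 'flinkuser',
       'password' = 'flinkpw',
       'database' = 'inventory',
       'collection' = 'mongodb_extract_node'
   );
   ```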





[GitHub] [incubator-inlong-website] gong commented on a diff in pull request #418: [INLONG-409][Sort]add md for mongodb-cdc

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r899107906


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -20,10 +20,10 @@ I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工
 
 ```xml
 <dependency>
-  <groupId>com.ververica</groupId>
-  <artifactId>flink-connector-mongodb-cdc</artifactId>
-  <!-- the dependency is available only for stable releases. -->
-  <version>2.1.1</version>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>sort-connector-mongodb-cdc</artifactId>
+    <!-- Choose the version that suits your application -->

Review Comment:
   Translate this comment.





[GitHub] [incubator-inlong-website] dockerzhang merged pull request #418: [INLONG-409][Sort] Add guide doc for the mongodb-cdc connector

Posted by GitBox <gi...@apache.org>.
dockerzhang merged PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418




[GitHub] [incubator-inlong-website] gong commented on a diff in pull request #418: [INLONG-409][Sort] Add guide doc for the mongodb-cdc connector

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r899714971


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -3,14 +3,14 @@ title: MongoDB-CDC
 sidebar_position: 7
 ---
 
-## MongoDB-CDC Extract节点
+## MongoDB-CDC Extract 节点
 
 MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
 
 ## 支持的版本
-| Extract节点                     | 版本                                         |
-| ------------------------------- | -------------------------------------------- |
-| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+| Extract节点                     | 版本                                          |
+| ------------------------------- | --------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): `>=` 3.6 |

Review Comment:
   Change `mongodb-cdc` to `MongoDB-CDC`.





[GitHub] [incubator-inlong-website] yunqingmoswu commented on a diff in pull request #418: [INLONG-409][Sort] Add guide doc for the mongodb-cdc connector

Posted by GitBox <gi...@apache.org>.
yunqingmoswu commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r899712806


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract 节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract 节点                     | 版本                                          |
+| ------------------------------- | --------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): `>=` 3.6 |

Review Comment:
   [mongodb-cdc] -> [MongoDB-CDC]



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract 节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract 节点                     | 版本                                          |
+| ------------------------------- | --------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): `>=` 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>sort-connector-mongodb-cdc</artifactId>
+    <!-- 选择你使用的 inlong 的版本 -->
+    <version>inlong_version</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用 [更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  需要 [副本集](https://docs.mongodb.com/manual/replication/)或 [分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  需要 [WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream` MongoDB Kafka 连接器 `read` 需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考 [MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```shell
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, // read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } // for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract 节点
+
+### SQL API 用法
+
+这个例子展示了如何使用 `Flink SQL` 创建一个 MongoDB Extract 节点:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, -- must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read the snapshot and change stream events from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**注意**
+
+MongoDB 的更改事件记录在消息之前没有更新。所以,我们只能将其转换为 Flink 的 UPSERT 变更日志流。UPSERT 流需要唯一键,因此我们必须声明 `_id` 为主键。我们不能将其他列声明为主键,因为删除操作不包含除 `_id` 和 `sharding key` 之外的键和值。
+
+### InLong Dashboard 用法
+
+TODO: 未来会支持
+
+### InLong Manager 用法
+
+TODO: 未来会支持
+
+## MongoDB Extract 节点选项

Review Comment:
   选项 ("options") -> 参数 ("parameters")



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6

Review Comment:
   \>  -> `&gt; `



##########
docs/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,197 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract Node
+
+The MongoDB CDC connector allows for reading snapshot data and incremental data from MongoDB. This document describes how to setup the MongoDB CDC connector to run SQL queries against MongoDB.
+
+## Supported Version
+| Extract Node                    | Version                                      |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): `>=` 3.6 |
+
+## Dependencies
+
+In order to setup the MongoDB CDC connector, the following table provides dependency information for both projects using a build automation tool (such as Maven or SBT) and SQL Client with SQL JAR bundles.
+
+### Maven dependency
+
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>sort-connector-mongodb-cdc</artifactId>
+    <!-- select inlong version -->
+    <version>inlong_version</version>
+</dependency>
+```
+
+## Setup MongoDB
+
+### Availability
+
+- MongoDB version
+
+  MongoDB version \>= 3.6
+  We use [change streams](https://docs.mongodb.com/manual/changeStreams/) feature (new in version 3.6) to capture change data.
+
+- Cluster Deployment
+
+  [replica sets](https://docs.mongodb.com/manual/replication/) or [sharded clusters](https://docs.mongodb.com/manual/sharding/) is required.
+
+- Storage Engine
+
+  [WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger) storage engine is required.
+
+- [Replica set protocol version](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  Replica set protocol version 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion) is required.
+  Starting in version 4.0, MongoDB only supports pv1. pv1 is the default for all new replica sets created with MongoDB 3.2 or later.
+
+- Privileges
+
+  `changeStream` and `read` privileges are required by MongoDB Kafka Connector.
+
+  You can use the following example for simple authorization.
+  For more detailed authorization, please refer to [MongoDB Database User Roles](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles).
+
+  ```shell
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, // read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } // for snapshot reading
+    ]
+  });
+  ```
+
+## How to create a MongoDB Extract Node
+
+### Usage for SQL API
+
+The example below shows how to create an MongoDB Extract Node with `Flink SQL` :
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, -- must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read the snapshot and change stream events from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**Note that**

Review Comment:
   Note that -> Note



##########
docs/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,197 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract Node
+
+The MongoDB CDC connector allows for reading snapshot data and incremental data from MongoDB. This document describes how to setup the MongoDB CDC connector to run SQL queries against MongoDB.
+
+## Supported Version
+| Extract Node                    | Version                                      |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): `>=` 3.6 |

Review Comment:
   `>=` -> `&gt;=`



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract 节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract 节点                     | 版本                                          |
+| ------------------------------- | --------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): `>=` 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>sort-connector-mongodb-cdc</artifactId>
+    <!-- 选择你使用的 inlong 的版本 -->
+    <version>inlong_version</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本

Review Comment:
   Add a space after `MongoDB`.





[GitHub] [incubator-inlong-website] gong commented on a diff in pull request #418: [INLONG-409][Sort]add md for mongodb-cdc

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r899080370


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc-ch.md:
##########
@@ -0,0 +1,193 @@
+---
+title: MongoDB-CDC
+sidebar_position: 7
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 >= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法
+
+这个例子展示了如何使用`Flink SQL` 创建一个MongoDB Extract节点:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, -- must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read the snapshot and change stream events from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**注意**
+
+MongoDB 的更改事件记录在消息之前没有更新。所以,我们只能将其转换为 Flink 的 UPSERT 变更日志流。upsert 流需要唯一键,因此我们必须声明`_id`为主键。我们不能将其他列声明为主键,因为删除操作不包含除`_id`和`sharding key` 之外的键和值。
+
+### 在 InLong Dashboard的使用方法
+
+TODO: 将会支持
+
+### 在InLong Manager客户端的使用方法
+
+TODO:将会支持
+
+## MongoDB Extract节点选项
+
+| **选项**                  | **是否必须** | **默认**   | **类型** | **描述**                                                     |
+| ------------------------- | ------------ | ---------- | -------- | ------------------------------------------------------------ |
+| connector                 | 必须         | (none)     | String   | 指定要使用的连接器,这里应该是`mongodb-cdc`.                 |
+| hosts                     | 必须         | (none)     | String   | MongoDB 服务器的主机名和端口对的逗号分隔列表。例如。`localhost:27017,localhost:27018` |
+| username                  | 可选         | (none)     | String   | 连接到 MongoDB 时要使用的数据库用户的名称。<br/>仅当 MongoDB 配置为使用身份验证时才需要这样做。 |
+| password                  | 可选         | (none)     | String   | 连接 MongoDB 时使用的密码。<br/>仅当 MongoDB 配置为使用身份验证时才需要这样做。 |
+| database                  | 必须         | (none)     | String   | 要监视更改的数据库的名称。                                   |
+| collection                | 必须         | (none)     | String   | 数据库中要监视更改的集合的名称。                             |
+| connection.options        | 可选         | (none)     | String   | MongoDB的 & 分隔[连接选项](https://docs.mongodb.com/manual/reference/connection-string/#std-label-connections-connection-options)。例如。<br/>`replicaSet=test&connectTimeoutMS=300000` |
+| errors.tolerance          | 可选         | none       | String   | 如果遇到错误,是否继续处理消息。接受`none`或`all`。设置为`none`时,连接器会报告错误并在遇到错误时阻止对其余记录的进一步处理。设置为`all`时,连接器会静默忽略任何错误消息。 |
+| errors.log.enable         | 可选         | true       | Boolean  | 是否应将失败操作的详细信息写入日志文件。                     |
+| copy.existing             | 可选         | true       | Boolean  | 是否从源集合中复制现有数据。                                 |
+| copy.existing.pipeline    | 可选         | (none)     | String   | 一组 JSON 对象,描述在复制现有数据时要运行的管道操作。<br/>这可以提高复制管理器对索引的使用,并使复制更有效。例如。`[{"$match": {"closed": "false"}}]`确保仅复制已关闭字段设置为 false 的文档。 |
+| copy.existing.max.threads | 可选         | 处理器数量 | Integer  | 执行数据复制时使用的线程数。                                 |
+| copy.existing.queue.size  | 可选         | 16000      | Integer  | 执行数据复制时使用的线程数。                                 |
+| poll.max.batch.size       | 可选         | 1000       | Integer  | 轮询新数据时,单个批次中包含的最大更改流文档数。             |
+| poll.await.time.ms        | 可选         | 1500       | Integer  | 在更改流上检查新结果之前等待的时间量。                       |
+| heartbeat.interval.ms     | 可选         | 0          | Integer  | 发送心跳消息之间的时间长度(以毫秒为单位)。使用 0 禁用。    |
+
+## 可用元数据
+
+以下格式元数据可以作为表定义中的只读 (VIRTUAL) 列公开。
+
+| Key             | 数据类型                  | 描述                                                         |
+| --------------- | ------------------------- | ------------------------------------------------------------ |
+| database_name   | STRING NOT NULL           | 包含该行的数据库的名称。                                     |
+| collection_name | STRING NOT NULL           | 包含该行的集合的名称。                                       |
+| op_ts           | TIMESTAMP_LTZ(3) NOT NULL | 它指示在数据库中进行更改的时间。<br/>如果记录是从表的快照而不是更改流中读取的,则该值始终为 0。 |
+
+
+扩展的 CREATE TABLE 示例演示了公开这些元数据字段的语法:
+```sql
+CREATE TABLE `mysql_extract_node` (
+    db_name STRING METADATA FROM 'database_name' VIRTUAL,
+    table_name STRING METADATA  FROM 'table_name' VIRTUAL,

Review Comment:
   `table_name` should be `collection_name`





[GitHub] [incubator-inlong-website] gong commented on a diff in pull request #418: [INLONG-409][Sort] Add guide doc for the mongodb-cdc connector

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r899714631


##########
docs/data_node/extract_node/mongodb-cdc.md:
##########
@@ -10,20 +10,20 @@ The MongoDB CDC connector allows for reading snapshot data and incremental data
 ## Supported Version
 | Extract Node                    | Version                                      |
 | ------------------------------- | -------------------------------------------- |
-| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): `>=` 3.6 |

Review Comment:
   Change `mongodb-cdc` to `MongoDB-CDC` to keep it consistent with the title.





[GitHub] [incubator-inlong-website] gong commented on a diff in pull request #418: [INLONG-409][Sort]add md for mongodb-cdc

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r899078719


##########
docs/data_node/extract_node/mongodb-cdc-en.md:
##########
@@ -0,0 +1,197 @@
+---
+title: MongoDB-CDC
+sidebar_position: 7
+---
+
+## MongoDB-CDC Extract Node
+
+The MongoDB CDC connector allows for reading snapshot data and incremental data from MongoDB. This document describes how to set up the MongoDB CDC connector to run SQL queries against MongoDB.
+
+## Supported Version
+| Extract Node                    | Version                                      |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## Dependencies
+
+In order to setup the MongoDB CDC connector, the following table provides dependency information for both projects using a build automation tool (such as Maven or SBT) and SQL Client with SQL JAR bundles.
+
+### Maven dependency
+
+```
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## Setup MongoDB
+
+### Availability
+
+- MongoDB version
+
+  MongoDB version >= 3.6
+  We use [change streams](https://docs.mongodb.com/manual/changeStreams/) feature (new in version 3.6) to capture change data.
+
+- Cluster Deployment
+
+  A [replica set](https://docs.mongodb.com/manual/replication/) or [sharded cluster](https://docs.mongodb.com/manual/sharding/) is required.
+
+- Storage Engine
+
+  [WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger) storage engine is required.
+
+- [Replica set protocol version](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  Replica set protocol version 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion) is required.
+  Starting in version 4.0, MongoDB only supports pv1. pv1 is the default for all new replica sets created with MongoDB 3.2 or later.
+
+- Privileges
+
+  `changeStream` and `read` privileges are required by MongoDB Kafka Connector.
+
+  You can use the following example for simple authorization.
+  For more detailed authorization, please refer to [MongoDB Database User Roles](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles).
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## How to create a MongoDB Extract Node
+
+### Usage for SQL API
+
+The example below shows how to create a MongoDB Extract Node with `Flink SQL`:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a table 'mongodb_extract_node' in Flink SQL, backed by a MongoDB collection
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, -- must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read snapshot and change stream events from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**Note that**
+
+MongoDB's change event records do not carry an update-before image, so they can only be converted to Flink's upsert changelog stream. An upsert stream requires a unique key, so `_id` must be declared as the primary key. No other column can be declared as the primary key, because a delete event contains no keys or values other than `_id` and the sharding key.
+
+### Usage for InLong Dashboard
+
+TODO: It will be supported in the future.
+
+### Usage for InLong Manager Client
+
+TODO: It will be supported in the future.
+
+## MongoDB Extract Node Options
+
+| **Option**                | **Required** | **Default**      | **Type** | **Description**                                              |
+| ------------------------- | ------------ | ---------------- | -------- | ------------------------------------------------------------ |
+| connector                 | required     | (none)           | String   | Specify what connector to use, here should be `mongodb-cdc`. |
+| hosts                     | required     | (none)           | String   | The comma-separated list of hostname and port pairs of the MongoDB servers. eg. `localhost:27017,localhost:27018` |
+| username                  | optional     | (none)           | String   | Name of the database user to be used when connecting to MongoDB. This is required only when MongoDB is configured to use authentication. |
+| password                  | optional     | (none)           | String   | Password to be used when connecting to MongoDB. This is required only when MongoDB is configured to use authentication. |
+| database                  | required     | (none)           | String   | Name of the database to watch for changes.                   |
+| collection                | required     | (none)           | String   | Name of the collection in the database to watch for changes. |
+| connection.options        | optional     | (none)           | String   | The ampersand-separated [connection options](https://docs.mongodb.com/manual/reference/connection-string/#std-label-connections-connection-options) of MongoDB. eg. `replicaSet=test&connectTimeoutMS=300000` |
+| errors.tolerance          | optional     | none             | String   | Whether to continue processing messages if an error is encountered. Accept `none` or `all`. When set to `none`, the connector reports an error and blocks further processing of the rest of the records when it encounters an error. When set to `all`, the connector silently ignores any bad messages. |
+| errors.log.enable         | optional     | true             | Boolean  | Whether details of failed operations should be written to the log file. |
+| copy.existing             | optional     | true             | Boolean  | Whether to copy existing data from source collections.       |
+| copy.existing.pipeline    | optional     | (none)           | String   | An array of JSON objects describing the pipeline operations to run when copying existing data. This can improve the use of indexes by the copying manager and make copying more efficient. eg. `[{"$match": {"closed": "false"}}]` ensures that only documents in which the closed field is set to false are copied. |
+| copy.existing.max.threads | optional     | Processors Count | Integer  | The number of threads to use when performing the data copy.  |
+| copy.existing.queue.size  | optional     | 16000            | Integer  | The max size of the queue to use when copying data.          |
+| poll.max.batch.size       | optional     | 1000             | Integer  | Maximum number of change stream documents to include in a single batch when polling for new data. |
+| poll.await.time.ms        | optional     | 1500             | Integer  | The amount of time to wait before checking for new results on the change stream. |
+| heartbeat.interval.ms     | optional     | 0                | Integer  | The length of time in milliseconds between sending heartbeat messages. Use 0 to disable. |
+
+
+## Available Metadata
+
+The following format metadata can be exposed as read-only (VIRTUAL) columns in a table definition.
+
+| Key             | DataType                  | Description                                                  |
+| --------------- | ------------------------- | ------------------------------------------------------------ |
+| database_name   | STRING NOT NULL           | Name of the database that contains the row.                  |
+| collection_name | STRING NOT NULL           | Name of the collection that contains the row.                |
+| op_ts           | TIMESTAMP_LTZ(3) NOT NULL | It indicates the time that the change was made in the database. If the record is read from snapshot of the table instead of the change stream, the value is always 0. |
+
+
+The extended CREATE TABLE example demonstrates the syntax for exposing these metadata fields:
+```sql
+CREATE TABLE `mysql_extract_node` (
+    db_name STRING METADATA FROM 'database_name' VIRTUAL,
+    table_name STRING METADATA  FROM 'table_name' VIRTUAL,
+    operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
+    _id STRING, // must be declared
+    name STRING,
+    weight DECIMAL(10,3),
+    tags ARRAY<STRING>, -- array
+    price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+    suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+    PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+      'connector' = 'mongodb-cdc', 
+      'hostname' = 'YourHostname',
+      'username' = 'YourUsername',
+      'password' = 'YourPassword',
+      'database-name' = 'YourDatabase',
+      'table-name' = 'YourTable' 

Review Comment:
   `database-name` should be `database`.
   `table-name` should be `collection`
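
   For reference, a sketch of how the example could read once these option keys are corrected. It also assumes the keys documented elsewhere in this file (`hosts` in the options table, `collection_name` in the metadata table). The table name, host list, and `Your...` values are placeholders, not values taken from the PR.

   ```sql
   -- Sketch only: option and metadata keys follow the tables in this document;
   -- all connection values below are placeholders.
   CREATE TABLE mongodb_extract_node (
       db_name STRING METADATA FROM 'database_name' VIRTUAL,
       collection_name STRING METADATA FROM 'collection_name' VIRTUAL,
       operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
       _id STRING, -- must be declared
       name STRING,
       weight DECIMAL(10,3),
       tags ARRAY<STRING>, -- array
       price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
       suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
       PRIMARY KEY(_id) NOT ENFORCED
   ) WITH (
       'connector' = 'mongodb-cdc',
       'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
       'username' = 'YourUsername',
       'password' = 'YourPassword',
       'database' = 'YourDatabase',
       'collection' = 'YourCollection'
   );
   ```

   The inline comments use `--` rather than `//`, since `//` is not a SQL comment marker.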





[GitHub] [incubator-inlong-website] yunqingmoswu commented on a diff in pull request #418: [INLONG-409][Sort]add md for mongodb-cdc

Posted by GitBox <gi...@apache.org>.
yunqingmoswu commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r898796924


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点

Review Comment:
   Add a space before and after English words.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6

Review Comment:
   Fix the escaping of `\>`.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法
+
+这个例子展示了如何使用`Flink SQL` 创建一个MongoDB Extract节点:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a MySQL table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, // must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read snapshot and binlogs from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**注意**
+
+MongoDB 的更改事件记录在消息之前没有更新。所以,我们只能将其转换为 Flink 的 UPSERT 变更日志流。upsert 流需要唯一键,因此我们必须声明`_id`为主键。我们不能将其他列声明为主键,因为删除操作不包含除`_id`和`sharding key` 之外的键和值。
+
+### 在 InLong Dashboard的使用方法
+
+TODO: 将会支持
+
+### 在InLong Manager客户端的使用方法

Review Comment:
   Rename the heading `在InLong Manager客户端的使用方法` to `InLong Manager 用法` ("InLong Manager Usage").



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法
+
+这个例子展示了如何使用`Flink SQL` 创建一个MongoDB Extract节点:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a MySQL table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, // must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read snapshot and binlogs from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**注意**
+
+MongoDB 的更改事件记录在消息之前没有更新。所以,我们只能将其转换为 Flink 的 UPSERT 变更日志流。upsert 流需要唯一键,因此我们必须声明`_id`为主键。我们不能将其他列声明为主键,因为删除操作不包含除`_id`和`sharding key` 之外的键和值。
+
+### 在 InLong Dashboard的使用方法
+
+TODO: 将会支持
+
+### 在InLong Manager客户端的使用方法
+
+TODO:将会支持
+
+## MongoDB Extract节点选项
+
+| **选项**                  | **是否必须** | **默认**   | **类型** | **描述**                                                     |
+| ------------------------- | ------------ | ---------- | -------- | ------------------------------------------------------------ |
+| connector                 | 必须         | (none)     | String   | 指定要使用的连接器,这里应该是`mongodb-cdc`.                 |
+| hosts                     | 必须         | (none)     | String   | MongoDB 服务器的主机名和端口对的逗号分隔列表。例如。`localhost:27017,localhost:27018` |
+| username                  | 可选         | (none)     | String   | 连接到 MongoDB 时要使用的数据库用户的名称。仅当 MongoDB 配置为使用身份验证时才需要这样做。 |
+| password                  | 可选         | (none)     | String   | 连接 MongoDB 时使用的密码。仅当 MongoDB 配置为使用身份验证时才需要这样做。 |
+| database                  | 必须         | (none)     | String   | 要监视更改的数据库的名称。                                   |
+| collection                | 必须         | (none)     | String   | 数据库中要监视更改的集合的名称。                             |
+| connection.options        | 可选         | (none)     | String   | MongoDB的 & 分隔[连接选项](https://docs.mongodb.com/manual/reference/connection-string/#std-label-connections-connection-options)。例如。`replicaSet=test&connectTimeoutMS=300000` |
+| errors.tolerance          | 可选         | none       | String   | 如果遇到错误,是否继续处理消息。接受`none`或`all`。设置为`none`时,连接器会报告错误并在遇到错误时阻止对其余记录的进一步处理。设置为`all`时,连接器会静默忽略任何错误消息。 |
+| errors.log.enable         | 可选         | true       | Boolean  | 是否应将失败操作的详细信息写入日志文件。                     |
+| copy.existing             | 可选         | true       | Boolean  | 是否从源集合中复制现有数据。                                 |
+| copy.existing.pipeline    | 可选         | (none)     | String   | 一组 JSON 对象,描述在复制现有数据时要运行的管道操作。这可以提高复制管理器对索引的使用,并使复制更有效。例如。`[{"$match": {"closed": "false"}}]`确保仅复制已关闭字段设置为 false 的文档。 |
+| copy.existing.max.threads | 可选         | 处理器数量 | Integer  | 执行数据复制时使用的线程数。                                 |
+| copy.existing.queue.size  | 可选         | 16000      | Integer  | 执行数据复制时使用的线程数。                                 |
+| poll.max.batch.size       | 可选         | 1000       | Integer  | 轮询新数据时,单个批次中包含的最大更改流文档数。             |
+| poll.await.time.ms        | 可选         | 1500       | Integer  | 在更改流上检查新结果之前等待的时间量。                       |
+| heartbeat.interval.ms     | 可选         | 0          | Integer  | 发送心跳消息之间的时间长度(以毫秒为单位)。使用 0 禁用。    |
+
+## 可用元数据
+
+以下格式元数据可以作为表定义中的只读 (VIRTUAL) 列公开。
+
+| Key             | 数据类型                  | 描述                                                         |
+| --------------- | ------------------------- | ------------------------------------------------------------ |
+| database_name   | STRING NOT NULL           | 包含该行的数据库的名称。                                     |
+| collection_name | STRING NOT NULL           | 包含该行的集合的名称。                                       |
+| op_ts           | TIMESTAMP_LTZ(3) NOT NULL | 它指示在数据库中进行更改的时间。如果记录是从表的快照而不是更改流中读取的,则该值始终为 0。 |
+
+
+扩展的 CREATE TABLE 示例演示了公开这些元数据字段的语法:
+```sql
+CREATE TABLE `mysql_extract_node` (
+    db_name STRING METADATA FROM 'database_name' VIRTUAL,
+    table_name STRING METADATA  FROM 'table_name' VIRTUAL,
+    operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
+    _id STRING, // must be declared
+    name STRING,
+    weight DECIMAL(10,3),
+    tags ARRAY<STRING>, -- array
+    price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+    suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+    PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+      'connector' = 'mongodb-cdc', 
+      'hostname' = 'YourHostname',
+      'username' = 'YourUsername',
+      'password' = 'YourPassword',
+      'database-name' = 'YourDatabase',
+      'table-name' = 'YourTable' 
+);
+```
+
+## 数据类型映射
+
+| BSON类型                                                     | Flink SQL类型                                                |

Review Comment:
   Add a space before and after English words.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法

Review Comment:
   Rename the heading `SQL API的使用方法` to `SQL API 用法` ("SQL API Usage").



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点

Review Comment:
   Add a space before and after English words.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法
+
+这个例子展示了如何使用`Flink SQL` 创建一个MongoDB Extract节点:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a MySQL table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, // must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read snapshot and binlogs from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**注意**
+
+MongoDB 的更改事件记录在消息之前没有更新。所以,我们只能将其转换为 Flink 的 UPSERT 变更日志流。upsert 流需要唯一键,因此我们必须声明`_id`为主键。我们不能将其他列声明为主键,因为删除操作不包含除`_id`和`sharding key` 之外的键和值。

Review Comment:
   Keep the capitalization of `UPSERT` and `upsert` consistent.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法
+
+这个例子展示了如何使用`Flink SQL` 创建一个MongoDB Extract节点:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a MySQL table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, // must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read snapshot and binlogs from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**注意**
+
+MongoDB 的更改事件记录在消息之前没有更新。所以,我们只能将其转换为 Flink 的 UPSERT 变更日志流。upsert 流需要唯一键,因此我们必须声明`_id`为主键。我们不能将其他列声明为主键,因为删除操作不包含除`_id`和`sharding key` 之外的键和值。
+
+### 在 InLong Dashboard的使用方法
+
+TODO: 将会支持
+
+### 在InLong Manager客户端的使用方法
+
+TODO:将会支持
+
+## MongoDB Extract节点选项

Review Comment:
   Add a space before and after English words.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法
+
+这个例子展示了如何使用`Flink SQL` 创建一个MongoDB Extract节点:

Review Comment:
   Add a space before and after English words.



##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc.md:
##########
@@ -1,4 +1,192 @@
 ---
 title: MongoDB-CDC
 sidebar_position: 7
----
\ No newline at end of file
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```xml
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 \>= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法
+
+这个例子展示了如何使用`Flink SQL` 创建一个MongoDB Extract节点:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a MySQL table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, // must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read snapshot and binlogs from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**注意**
+
+MongoDB 的更改事件记录在消息之前没有更新。所以,我们只能将其转换为 Flink 的 UPSERT 变更日志流。upsert 流需要唯一键,因此我们必须声明`_id`为主键。我们不能将其他列声明为主键,因为删除操作不包含除`_id`和`sharding key` 之外的键和值。
+
+### 在 InLong Dashboard的使用方法

Review Comment:
   Change the heading `在 InLong Dashboard的使用方法` to `InLong Dashboard 用法` ("InLong Dashboard usage").





[GitHub] [incubator-inlong-website] gong commented on a diff in pull request #418: [INLONG-409][Sort]add md for mongodb-cdc

Posted by GitBox <gi...@apache.org>.
gong commented on code in PR #418:
URL: https://github.com/apache/incubator-inlong-website/pull/418#discussion_r899081107


##########
i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/mongodb-cdc-ch.md:
##########
@@ -0,0 +1,193 @@
+---
+title: MongoDB-CDC
+sidebar_position: 7
+---
+
+## MongoDB-CDC Extract节点
+
+MongoDB CDC 连接器允许从 MongoDB 读取快照数据和增量数据。本文档介绍如何设置 MongoDB CDC 连接器以对 MongoDB 运行 SQL 查询。
+
+## 支持的版本
+| Extract节点                     | 版本                                         |
+| ------------------------------- | -------------------------------------------- |
+| [mongodb-cdc](./mongodb-cdc.md) | [MongoDB](https://www.mongodb.com/): \>= 3.6 |
+
+## 依赖项
+
+I.为了设置 MongoDB CDC 连接器,下表提供了使用构建自动化工具(例如 Maven 或 SBT)的依赖关系信息
+
+### Maven依赖
+
+```
+<dependency>
+  <groupId>com.ververica</groupId>
+  <artifactId>flink-connector-mongodb-cdc</artifactId>
+  <!-- the dependency is available only for stable releases. -->
+  <version>2.1.1</version>
+</dependency>
+```
+
+## 设置 MongoDB
+
+### 可用性
+
+- MongoDB版本
+
+  MongoDB 版本 >= 3.6
+  我们使用[更改流](https://docs.mongodb.com/manual/changeStreams/)功能(3.6 版中的新功能)来捕获更改数据。
+
+- 集群部署
+
+  [需要副本集](https://docs.mongodb.com/manual/replication/)或[分片集群](https://docs.mongodb.com/manual/sharding/)。
+
+- 存储引擎
+
+  [需要WiredTiger](https://docs.mongodb.com/manual/core/wiredtiger/#std-label-storage-wiredtiger)存储引擎。
+
+- [副本集协议版本](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
+
+  需要副本集协议版本 1 [(pv1)](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)。
+  从版本 4.0 开始,MongoDB 仅支持 pv1。pv1 是使用 MongoDB 3.2 或更高版本创建的所有新副本集的默认值。
+
+- 特权
+
+  `changeStream`MongoDB Kafka 连接器`read`需要权限。
+
+  您可以使用以下示例进行简单授权。
+  更详细的授权请参考[MongoDB 数据库用户角色](https://docs.mongodb.com/manual/reference/built-in-roles/#database-user-roles)。
+
+  ```json
+  use admin;
+  db.createUser({
+    user: "flinkuser",
+    pwd: "flinkpw",
+    roles: [
+      { role: "read", db: "admin" }, //read role includes changeStream privilege 
+      { role: "readAnyDatabase", db: "admin" } //for snapshot reading
+    ]
+  });
+  ```
+
+## 如何创建 MongoDB Extract节点
+
+### SQL API的使用方法
+
+这个例子展示了如何使用`Flink SQL` 创建一个MongoDB Extract节点:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a MySQL table 'mongodb_extract_node' in Flink SQL
+Flink SQL> CREATE TABLE mongodb_extract_node (
+  _id STRING, // must be declared
+  name STRING,
+  weight DECIMAL(10,3),
+  tags ARRAY<STRING>, -- array
+  price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+  suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+  PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+  'connector' = 'mongodb-cdc',
+  'hosts' = 'localhost:27017,localhost:27018,localhost:27019',
+  'username' = 'flinkuser',
+  'password' = 'flinkpw',
+  'database' = 'inventory',
+  'collection' = 'mongodb_extract_node'
+);
+
+-- Read snapshot and binlogs from mongodb_extract_node
+Flink SQL> SELECT * FROM mongodb_extract_node;
+```
+
+**注意**
+
+MongoDB 的更改事件记录在消息之前没有更新。所以,我们只能将其转换为 Flink 的 UPSERT 变更日志流。upsert 流需要唯一键,因此我们必须声明`_id`为主键。我们不能将其他列声明为主键,因为删除操作不包含除`_id`和`sharding key` 之外的键和值。
+
+### 在 InLong Dashboard的使用方法
+
+TODO: 将会支持
+
+### 在InLong Manager客户端的使用方法
+
+TODO:将会支持
+
+## MongoDB Extract节点选项
+
+| **选项**                  | **是否必须** | **默认**   | **类型** | **描述**                                                     |
+| ------------------------- | ------------ | ---------- | -------- | ------------------------------------------------------------ |
+| connector                 | 必须         | (none)     | String   | 指定要使用的连接器,这里应该是`mongodb-cdc`.                 |
+| hosts                     | 必须         | (none)     | String   | MongoDB 服务器的主机名和端口对的逗号分隔列表。例如。`localhost:27017,localhost:27018` |
+| username                  | 可选         | (none)     | String   | 连接到 MongoDB 时要使用的数据库用户的名称。<br/>仅当 MongoDB 配置为使用身份验证时才需要这样做。 |
+| password                  | 可选         | (none)     | String   | 连接 MongoDB 时使用的密码。<br/>仅当 MongoDB 配置为使用身份验证时才需要这样做。 |
+| database                  | 必须         | (none)     | String   | 要监视更改的数据库的名称。                                   |
+| collection                | 必须         | (none)     | String   | 数据库中要监视更改的集合的名称。                             |
+| connection.options        | 可选         | (none)     | String   | MongoDB的 & 分隔[连接选项](https://docs.mongodb.com/manual/reference/connection-string/#std-label-connections-connection-options)。例如。<br/>`replicaSet=test&connectTimeoutMS=300000` |
+| errors.tolerance          | 可选         | none       | String   | 如果遇到错误,是否继续处理消息。接受`none`或`all`。设置为`none`时,连接器会报告错误并在遇到错误时阻止对其余记录的进一步处理。设置为`all`时,连接器会静默忽略任何错误消息。 |
+| errors.log.enable         | 可选         | true       | Boolean  | 是否应将失败操作的详细信息写入日志文件。                     |
+| copy.existing             | 可选         | true       | Boolean  | 是否从源集合中复制现有数据。                                 |
+| copy.existing.pipeline    | 可选         | (none)     | String   | 一组 JSON 对象,描述在复制现有数据时要运行的管道操作。<br/>这可以提高复制管理器对索引的使用,并使复制更有效。例如。`[{"$match": {"closed": "false"}}]`确保仅复制已关闭字段设置为 false 的文档。 |
+| copy.existing.max.threads | 可选         | 处理器数量 | Integer  | 执行数据复制时使用的线程数。                                 |
+| copy.existing.queue.size  | 可选         | 16000      | Integer  | 执行数据复制时使用的线程数。                                 |
+| poll.max.batch.size       | 可选         | 1000       | Integer  | 轮询新数据时,单个批次中包含的最大更改流文档数。             |
+| poll.await.time.ms        | 可选         | 1500       | Integer  | 在更改流上检查新结果之前等待的时间量。                       |
+| heartbeat.interval.ms     | 可选         | 0          | Integer  | 发送心跳消息之间的时间长度(以毫秒为单位)。使用 0 禁用。    |
+
+## 可用元数据
+
+以下格式元数据可以作为表定义中的只读 (VIRTUAL) 列公开。
+
+| Key             | 数据类型                  | 描述                                                         |
+| --------------- | ------------------------- | ------------------------------------------------------------ |
+| database_name   | STRING NOT NULL           | 包含该行的数据库的名称。                                     |
+| collection_name | STRING NOT NULL           | 包含该行的集合的名称。                                       |
+| op_ts           | TIMESTAMP_LTZ(3) NOT NULL | 它指示在数据库中进行更改的时间。<br/>如果记录是从表的快照而不是更改流中读取的,则该值始终为 0。 |
+
+
+扩展的 CREATE TABLE 示例演示了公开这些元数据字段的语法:
+```sql
+CREATE TABLE `mysql_extract_node` (
+    db_name STRING METADATA FROM 'database_name' VIRTUAL,
+    table_name STRING METADATA  FROM 'table_name' VIRTUAL,
+    operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
+    _id STRING, // must be declared
+    name STRING,
+    weight DECIMAL(10,3),
+    tags ARRAY<STRING>, -- array
+    price ROW<amount DECIMAL(10,2), currency STRING>, -- embedded document
+    suppliers ARRAY<ROW<name STRING, address STRING>>, -- embedded documents
+    PRIMARY KEY(_id) NOT ENFORCED
+) WITH (
+      'connector' = 'mongodb-cdc', 
+      'hostname' = 'YourHostname',
+      'username' = 'YourUsername',
+      'password' = 'YourPassword',
+      'database-name' = 'YourDatabase',
+      'table-name' = 'YourTable' 

Review Comment:
   `database-name` should be `database`.
   `table-name` should be `collection`.
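
   For illustration, the example might look roughly like this once it is aligned with the option table and the metadata table earlier in the same file. This is only a sketch under assumptions: the table name `mongodb_extract_node` is borrowed from the first example in the document, the column name `coll_name` and the placeholder values are hypothetical, and the keys `hosts` and `collection_name` are taken from those two tables rather than verified against the connector.

   ```sql
   CREATE TABLE `mongodb_extract_node` (
       db_name STRING METADATA FROM 'database_name' VIRTUAL,
       -- 'collection_name' is the key listed in the metadata table of this file
       coll_name STRING METADATA FROM 'collection_name' VIRTUAL,
       operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
       _id STRING, -- must be declared
       name STRING,
       PRIMARY KEY(_id) NOT ENFORCED
   ) WITH (
       'connector' = 'mongodb-cdc',
       -- option names below follow the option table: hosts, database, collection
       'hosts' = 'localhost:27017',
       'username' = 'YourUsername',
       'password' = 'YourPassword',
       'database' = 'YourDatabase',
       'collection' = 'YourCollection'
   );
   ```

   The inline comment on `_id` uses `--` here because `//` is not a SQL comment marker.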


