Posted to commits@inlong.apache.org by do...@apache.org on 2022/06/14 12:39:23 UTC

[incubator-inlong-website] branch master updated: [INLONG-405][Sort] Add sqlserver cdc,hdfs,hive doc (#406)

This is an automated email from the ASF dual-hosted git repository.

dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new e8f8c3740 [INLONG-405][Sort] Add sqlserver cdc,hdfs,hive doc (#406)
e8f8c3740 is described below

commit e8f8c374044d7b0e5595fb4d2679cfaef466478f
Author: ganfengtan <Ga...@users.noreply.github.com>
AuthorDate: Tue Jun 14 20:39:18 2022 +0800

    [INLONG-405][Sort] Add sqlserver cdc,hdfs,hive doc (#406)
---
 docs/data_node/extract_node/hdfs.md                |   7 +-
 docs/data_node/extract_node/sqlserver-cdc.md       | 337 ++++++++++++++++++++-
 docs/data_node/load_node/hdfs.md                   | 201 +++++++++++-
 docs/data_node/load_node/hive.md                   | 209 ++++++++++++-
 .../current/data_node/extract_node/hdfs.md         |   8 +-
 .../data_node/extract_node/sqlserver-cdc.md        | 335 +++++++++++++++++++-
 .../current/data_node/load_node/hdfs.md            | 205 ++++++++++++-
 .../current/data_node/load_node/hive.md            | 208 ++++++++++++-
 8 files changed, 1496 insertions(+), 14 deletions(-)

diff --git a/docs/data_node/extract_node/hdfs.md b/docs/data_node/extract_node/hdfs.md
index 947e2f8c2..9762d7893 100644
--- a/docs/data_node/extract_node/hdfs.md
+++ b/docs/data_node/extract_node/hdfs.md
@@ -1,4 +1,9 @@
 ---
 title: HDFS
 sidebar_position: 6
----
\ No newline at end of file
+---
+The HDFS connector can be used to read single files or entire directories into a single table.
+
+When using a directory as the source path, there is no defined order of ingestion for the files inside the directory.
+
+Note: the HDFS CDC feature is under development.
diff --git a/docs/data_node/extract_node/sqlserver-cdc.md b/docs/data_node/extract_node/sqlserver-cdc.md
index d4f2c1a19..c4ad66faa 100644
--- a/docs/data_node/extract_node/sqlserver-cdc.md
+++ b/docs/data_node/extract_node/sqlserver-cdc.md
@@ -1,4 +1,337 @@
 ---
-title: SqlServer-CDC
+title: SQLServer-CDC
 sidebar_position: 11
----
\ No newline at end of file
+---
+## SQLServer Extract Node
+
+The SQLServer Extract Node reads both snapshot data and incremental data from the SQLServer database. The following describes how to set up the SQLServer Extract Node.
+
+## Supported Version
+
+| Extract Node                        | Version                                                                                                                                                   |
+|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [SQLServer-CDC](./sqlserver-cdc.md) | [SQLServer](https://docs.microsoft.com/en-us/sql/database-engine/install-windows/install-sql-server?view=sql-server-ver16): 2014, 2016, 2017, 2019, 2022 |
+
+## Dependencies
+
+Add the SQLServer Extract Node dependency to your project through Maven,
+or use the JAR package provided by InLong ([sort-connector-sqlserver-cdc](https://inlong.apache.org/download/main/)).
+
+### Maven dependency
+
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>sort-connector-sqlserver-cdc</artifactId>
+    <!-- Choose the version that suits your application -->
+    <version>inlong_version</version>
+</dependency>
+```
+## Setup SQLServer Extract Node
+
+The SQLServer Extract Node requires CDC to be enabled for the source database and tables. The steps are as follows:
+
+1. Enable the CDC function for the database.
+```sql
+if exists(select 1 from sys.databases where name='dbName' and is_cdc_enabled=0)
+begin
+    exec sys.sp_cdc_enable_db
+end
+```
+2. Check the database CDC capability status.
+```sql
+select is_cdc_enabled from sys.databases where name='dbName'
+```
+Note: a value of 1 means CDC is enabled for the database.
+
+3. Enable CDC for the table.
+```sql
+IF EXISTS(SELECT 1 FROM sys.tables WHERE name='tableName' AND is_tracked_by_cdc = 0)
+BEGIN
+    EXEC sys.sp_cdc_enable_table
+        @source_schema = 'dbo', -- source_schema
+        @source_name = 'tableName', -- table_name
+        @capture_instance = NULL, -- capture_instance
+        @supports_net_changes = 1, -- supports_net_changes
+        @role_name = NULL, -- role_name
+        @index_name = NULL, -- index_name
+        @captured_column_list = NULL, -- captured_column_list
+        @filegroup_name = 'PRIMARY' -- filegroup_name
+END
+```
+Note: the table must have a primary key or a unique index.
+
+4. Check the table CDC capability status.
+```sql
+SELECT is_tracked_by_cdc FROM sys.tables WHERE name='tableName'
+```
+Note: a value of 1 means CDC is enabled for the table.
+
+## How to create a SQLServer Extract Node
+
+### Usage for SQL API
+
+The example below shows how to create a SQLServer Extract Node with `Flink SQL Cli`:
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a SQLServer table 'sqlserver_extract_node' in Flink SQL Cli
+Flink SQL> CREATE TABLE sqlserver_extract_node (
+     order_id INT,
+     order_date TIMESTAMP(0),
+     customer_name STRING,
+     price DECIMAL(10, 5),
+     product_id INT,
+     order_status BOOLEAN,
+     PRIMARY KEY(order_id) NOT ENFORCED
+     ) WITH (
+     'connector' = 'sqlserver-cdc',
+     'hostname' = 'YourHostname',
+     'port' = 'port', -- default: 1433
+     'username' = 'YourUsername',
+     'password' = 'YourPassword',
+     'database-name' = 'YourDatabaseName',
+     'schema-name' = 'YourSchemaName', -- default: dbo
+     'table-name' = 'YourTableName');
+  
+-- Read snapshot and change data from sqlserver_extract_node
+Flink SQL> SELECT * FROM sqlserver_extract_node;
+```
+### Usage for InLong Dashboard
+TODO
+
+### Usage for InLong Manager Client
+TODO
+
+## SQLServer Extract Node Options
+
+<div class="highlight">
+<table class="colwidths-auto docutils">
+    <thead>
+      <tr>
+        <th class="text-left" style={{width: '10%'}}>Option</th>
+        <th class="text-left" style={{width: '8%'}}>Required</th>
+        <th class="text-left" style={{width: '7%'}}>Default</th>
+        <th class="text-left" style={{width: '10%'}}>Type</th>
+        <th class="text-left" style={{width: '65%'}}>Description</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td>connector</td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>Specify what connector to use, here should be 'sqlserver-cdc'.</td>
+    </tr>
+    <tr>
+      <td>hostname</td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>IP address or hostname of the SQLServer database.</td>
+    </tr>
+    <tr>
+      <td>username</td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>Username to use when connecting to the SQLServer database.</td>
+    </tr>
+    <tr>
+      <td>password</td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>Password to use when connecting to the SQLServer database.</td>
+    </tr>
+    <tr>
+      <td>database-name</td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>Database name of the SQLServer database to monitor.</td>
+    </tr> 
+    <tr>
+      <td>schema-name</td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>dbo</td>
+      <td>String</td>
+      <td>Schema name of the SQLServer database to monitor.</td>
+    </tr>
+    <tr>
+      <td>table-name</td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>Table name of the SQLServer database to monitor.</td>
+    </tr>
+    <tr>
+      <td>port</td>
+      <td>optional</td>
+      <td style={{wordWrap: 'break-word'}}>1433</td>
+      <td>Integer</td>
+      <td>Integer port number of the SQLServer database.</td>
+    </tr>
+    <tr>
+      <td>server-time-zone</td>
+      <td>optional</td>
+      <td style={{wordWrap: 'break-word'}}>UTC</td>
+      <td>String</td>
+      <td>The session time zone in database server, e.g. "Asia/Shanghai".</td>
+    </tr>
+    </tbody>
+</table>
+</div>
+
+## Available Metadata
+The following format metadata can be exposed as read-only (VIRTUAL) columns in a table definition.
+
+<table class="colwidths-auto docutils">
+  <thead>
+     <tr>
+       <th class="text-left" style={{width: '15%'}}>Key</th>
+       <th class="text-left" style={{width: '30%'}}>DataType</th>
+       <th class="text-left" style={{width: '55%'}}>Description</th>
+     </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>meta.table_name</td>
+      <td>STRING NOT NULL</td>
+      <td>Name of the table that contains the row.</td>
+    </tr>   
+     <tr>
+      <td>meta.schema_name</td>
+      <td>STRING NOT NULL</td>
+      <td>Name of the schema that contains the row.</td>
+    </tr>
+    <tr>
+      <td>meta.database_name</td>
+      <td>STRING NOT NULL</td>
+      <td>Name of the database that contains the row.</td>
+    </tr>
+    <tr>
+      <td>meta.op_ts</td>
+      <td>TIMESTAMP_LTZ(3) NOT NULL</td>
+      <td>It indicates the time that the change was made in the database. <br/>If the record is read from the snapshot of the table instead of the change log, the value is always 0.</td>
+    </tr>
+  </tbody>
+</table>
+
+The following extended CREATE TABLE example demonstrates the syntax for exposing these metadata fields:
+```sql
+CREATE TABLE sqlserver_extract_node (
+    table_name STRING METADATA FROM 'table_name' VIRTUAL,
+    schema_name STRING METADATA FROM 'schema_name' VIRTUAL,
+    db_name STRING METADATA FROM 'database_name' VIRTUAL,
+    operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
+    id INT NOT NULL
+) WITH (
+    'connector' = 'sqlserver-cdc',
+    'hostname' = 'localhost',
+    'port' = '1433',
+    'username' = 'sa',
+    'password' = 'password',
+    'database-name' = 'test',
+    'schema-name' = 'dbo',
+    'table-name' = 'worker'
+);
+```
+
+## Data Type Mapping
+<div class="wy-table-responsive">
+<table class="colwidths-auto docutils">
+    <thead>
+      <tr>
+        <th class="text-left">SQLServer type</th>
+        <th class="text-left">Flink SQL type</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td>char(n)</td>
+      <td>CHAR(n)</td>
+    </tr>
+    <tr>
+      <td>
+        varchar(n)<br/>
+        nvarchar(n)<br/>
+        nchar(n)</td>
+      <td>VARCHAR(n)</td>
+    </tr>
+    <tr>
+      <td>
+        text<br/>
+        ntext<br/>
+        xml</td>
+      <td>STRING</td>
+    </tr>
+    <tr>
+      <td>
+        decimal(p, s)<br/>
+        money<br/>
+        smallmoney</td>
+      <td>DECIMAL(p, s)</td>
+    </tr>
+   <tr>
+      <td>numeric</td>
+      <td>NUMERIC</td>
+    </tr>
+    <tr>
+      <td>
+          real<br/>
+          float<br/>
+       </td>
+       <td>FLOAT</td>
+    </tr>
+    <tr>
+      <td>bit</td>
+      <td>BOOLEAN</td>
+    </tr>
+    <tr>
+      <td>int</td>
+      <td>INT</td>
+    </tr>
+    <tr>
+      <td>tinyint</td>
+      <td>TINYINT</td>
+    </tr>
+    <tr>
+      <td>smallint</td>
+      <td>SMALLINT</td>
+    </tr>
+    <tr>
+      <td>time (n)</td>
+      <td>TIME (n)</td>
+    </tr>
+    <tr>
+      <td>bigint</td>
+      <td>BIGINT</td>
+    </tr>
+    <tr>
+      <td>date</td>
+      <td>DATE</td>
+    </tr>
+    <tr>
+      <td>
+        datetime2<br/>
+        datetime<br/>
+        smalldatetime
+      </td>
+      <td>TIMESTAMP(n)</td>
+    </tr>
+    <tr>
+      <td>
+       datetimeoffset
+      </td>
+      <td>TIMESTAMP_LTZ(3)</td>
+    </tr>
+    </tbody>
+</table>
+</div>
+
+
+
diff --git a/docs/data_node/load_node/hdfs.md b/docs/data_node/load_node/hdfs.md
index c4a3c3864..1ae608da9 100644
--- a/docs/data_node/load_node/hdfs.md
+++ b/docs/data_node/load_node/hdfs.md
@@ -1,4 +1,203 @@
 ---
 title: HDFS
 sidebar_position: 11
----
\ No newline at end of file
+---
+## HDFS Load Node
+
+The HDFS Load Node uses Flink's general file system capabilities to support writing single files and partitioned files.
+The file system connector itself is included in Flink and does not require an additional dependency. 
+The corresponding jar can be found in the Flink distribution inside the /lib directory. 
+A corresponding format needs to be specified for reading and writing rows from and to a file system.
+
+## How to create an HDFS Load Node
+
+### Usage for SQL API
+The example below shows how to create an HDFS Load Node with `Flink SQL Cli`:
+
+```sql
+CREATE TABLE hdfs_load_node (
+  id STRING,
+  name STRING,
+  uv BIGINT,
+  pv BIGINT,
+  dt STRING,
+  `hour` STRING
+  ) PARTITIONED BY (dt, `hour`) WITH (
+    'connector'='filesystem',
+    'path'='...',
+    'format'='orc',
+    'sink.partition-commit.delay'='1 h',
+    'sink.partition-commit.policy.kind'='success-file'
+  );
+```
+
+#### File Formats
+<ul>
+<li>CSV (uncompressed)</li>
+<li>JSON (the JSON format for the file system connector is not a typical JSON file but uncompressed, newline-delimited JSON)</li>
+<li>Avro (supports compression by configuring avro.codec)</li>
+<li>Parquet (compatible with Hive)</li>
+<li>Orc (compatible with Hive)</li>
+<li>Debezium-JSON</li>
+<li>Canal-JSON</li>
+<li>Raw</li>
+</ul>
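+
+As a minimal sketch (the path below is a placeholder and must point to your own HDFS directory), the same kind of sink can be declared with a different format, for example CSV:
+
+```sql
+-- A sketch of the filesystem sink writing CSV instead of ORC.
+CREATE TABLE hdfs_csv_load_node (
+  id STRING,
+  name STRING,
+  dt STRING
+) PARTITIONED BY (dt) WITH (
+  'connector' = 'filesystem',
+  'path' = 'hdfs://namenode:9000/path/to/output',
+  'format' = 'csv'
+);
+```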
+
+#### Rolling Policy
+Data within the partition directories is split into part files.
+Each partition will contain at least one part file for each subtask of the sink that has received data for that partition.
+The in-progress part file is closed and a new part file is created according to the configurable rolling policy.
+The policy rolls part files based on size and a timeout that specifies the maximum duration for which a file can stay open. A configuration sketch follows the table below.
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left" style={{width: '25%'}}>Option</th>
+        <th class="text-center" style={{width: '7%'}}>Default</th>
+        <th class="text-center" style={{width: '10%'}}>Type</th>
+        <th class="text-center" style={{width: '50%'}}>Description</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>sink.rolling-policy.file-size</h5></td>
+        <td style={{wordWrap: 'break-word'}}>128MB</td>
+        <td>MemorySize</td>
+        <td>The maximum part file size before rolling.</td>
+    </tr>
+    <tr>
+      <td><h5>sink.rolling-policy.rollover-interval</h5></td>
+      <td style={{wordWrap: 'break-word'}}>30 min</td>
+      <td>String</td>
+      <td>The maximum time duration a part file can stay open before rolling (by default 30 min, to avoid too many small files). The frequency at which this is checked is controlled by the 'sink.rolling-policy.check-interval' option.</td>
+    </tr>
+    <tr>
+      <td><h5>sink.rolling-policy.check-interval</h5></td>
+      <td style={{wordWrap: 'break-word'}}>1 min</td>
+      <td>String</td>
+      <td>The interval for checking time based rolling policies. This controls the frequency to check whether a part file should rollover based on 'sink.rolling-policy.rollover-interval'.</td>
+    </tr>
+    </tbody>
+</table>
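+
+The sketch below shows how these rolling options could be set on a filesystem sink; the values simply restate the defaults, and the path is a placeholder:
+
+```sql
+-- Illustrative rolling-policy settings (values are the defaults, shown explicitly).
+CREATE TABLE hdfs_rolling_load_node (
+  id STRING,
+  dt STRING
+) PARTITIONED BY (dt) WITH (
+  'connector' = 'filesystem',
+  'path' = 'hdfs://namenode:9000/path/to/output',
+  'format' = 'orc',
+  'sink.rolling-policy.file-size' = '128MB',
+  'sink.rolling-policy.rollover-interval' = '30 min',
+  'sink.rolling-policy.check-interval' = '1 min'
+);
+```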
+
+#### File Compaction 
+The file sink supports file compaction, which allows applications to use smaller checkpoint intervals without generating a large number of files. An illustrative configuration follows the table below.
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left" style={{width: '25%'}}>Option</th>
+        <th class="text-center" style={{width: '7%'}}>Default</th>
+        <th class="text-center" style={{width: '10%'}}>Type</th>
+        <th class="text-center" style={{width: '50%'}}>Description</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>auto-compaction</h5></td>
+        <td style={{wordWrap: 'break-word'}}>false</td>
+        <td>Boolean</td>
+        <td>Whether to enable automatic compaction in streaming sink or not.
+         The data will be written to temporary files. After the checkpoint is completed, the temporary files generated by a checkpoint will be compacted.
+         The temporary files are invisible before compaction.</td>
+    </tr>
+    <tr>
+      <td><h5>compaction.file-size</h5></td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>The compaction target file size, the default value is the rolling file size.</td>
+    </tr>
+    </tbody>
+</table>
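+
+As an illustration (the target file size and path are assumptions), enabling compaction on the sink could look like this:
+
+```sql
+-- Illustrative compaction settings for the streaming file sink.
+CREATE TABLE hdfs_compact_load_node (
+  id STRING,
+  dt STRING
+) PARTITIONED BY (dt) WITH (
+  'connector' = 'filesystem',
+  'path' = 'hdfs://namenode:9000/path/to/output',
+  'format' = 'parquet',
+  'auto-compaction' = 'true',
+  'compaction.file-size' = '128MB'
+);
+```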
+
+#### Partition Commit 
+After writing a partition, it is often necessary to notify downstream applications,
+for example by adding the partition to a Hive metastore or writing a _SUCCESS file to the directory.
+The file system sink contains a partition commit feature that allows configuring custom policies.
+Commit actions are based on a combination of triggers and policies, as illustrated after the table below.
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left" style={{width: '25%'}}>Option</th>
+        <th class="text-center" style={{width: '7%'}}>Default</th>
+        <th class="text-center" style={{width: '10%'}}>Type</th>
+        <th class="text-center" style={{width: '50%'}}>Description</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>sink.partition-commit.trigger</h5></td>
+        <td style={{wordWrap: 'break-word'}}>process-time</td>
+        <td>String</td>
+        <td>Trigger type for partition commit: 'process-time': based on the time of the machine, it neither requires partition time extraction nor watermark generation. Commit partition once the 'current system time' passes 'partition creation system time' plus 'delay'. 'partition-time': based on the time that extracted from partition values, it requires watermark generation. Commit partition once the 'watermark' passes 'time extracted from partition values' plus 'delay'.</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.delay</h5></td>
+      <td style={{wordWrap: 'break-word'}}>0 s</td>
+      <td>Duration</td>
+      <td>The partition will not commit until this delay has passed. For a daily partition this should be '1 d'; for an hourly partition, '1 h'.</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.watermark-time-zone</h5></td>
+      <td style={{wordWrap: 'break-word'}}>UTC</td>
+      <td>String</td>
+      <td>The time zone to parse the long watermark value to TIMESTAMP value,
+       the parsed watermark timestamp is used to compare with partition time to decide the partition should commit or not. 
+       This option only takes effect when `sink.partition-commit.trigger` is set to 'partition-time'. 
+       If this option is not configured correctly, e.g.
+       source rowtime is defined on TIMESTAMP_LTZ column, but this config is not configured, 
+       then users may see the partition committed after a few hours. The default value is 'UTC', 
+       which means the watermark is defined on TIMESTAMP column or not defined. 
+       If the watermark is defined on TIMESTAMP_LTZ column, the time zone of watermark is the session time zone.
+       The option value is either a full name such as 'America/Los_Angeles', or a custom timezone id such as 'GMT-08:00'.</td>
+    </tr>
+    </tbody>
+</table>
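+
+As an illustration of the 'partition-time' trigger, the sketch below commits hourly partitions one hour after the watermark passes the partition time; the column names, timestamp pattern, path, and time zone are assumptions:
+
+```sql
+-- A sketch of partition-time commit; the upstream source must produce a watermark on its rowtime column.
+CREATE TABLE hdfs_partition_commit_node (
+  user_id STRING,
+  ts TIMESTAMP(3),
+  dt STRING,
+  `hour` STRING
+) PARTITIONED BY (dt, `hour`) WITH (
+  'connector' = 'filesystem',
+  'path' = 'hdfs://namenode:9000/path/to/output',
+  'format' = 'orc',
+  'partition.time-extractor.timestamp-pattern' = '$dt $hour:00:00',
+  'sink.partition-commit.trigger' = 'partition-time',
+  'sink.partition-commit.delay' = '1 h',
+  'sink.partition-commit.watermark-time-zone' = 'Asia/Shanghai',
+  'sink.partition-commit.policy.kind' = 'success-file'
+);
+```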
+
+#### Partition Commit Policy
+
+
+The partition commit policy defines the specific action taken when a partition is committed.
+
+- metastore: this policy is supported only for Hive tables.
+- success-file: a '_SUCCESS' file is generated after the part files are written.
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left" style={{width: '25%'}}>Option</th>
+        <th class="text-left" style={{width: '8%'}}>Required</th>
+        <th class="text-center" style={{width: '7%'}}>Default</th>
+        <th class="text-center" style={{width: '10%'}}>Type</th>
+        <th class="text-center" style={{width: '50%'}}>Description</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>sink.partition-commit.policy.kind</h5></td>
+        <td>optional</td>
+        <td style={{wordWrap: 'break-word'}}>(none)</td>
+        <td>String</td>
+        <td>The policy to commit a partition notifies the downstream application that the partition has finished writing and is ready to be read.
+        metastore: add the partition to the metastore. Only Hive tables support the metastore policy;
+        a file system manages partitions through its directory structure.
+        success-file: add a '_success' file to the directory.
+        custom: use a policy class to create a commit policy.
+        Multiple policies can be configured at the same time, e.g. 'metastore,success-file'.</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.policy.class</h5></td>
+      <td>optional</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>The partition commit policy class implementing the PartitionCommitPolicy interface.
+      Only works with the custom commit policy.</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.success-file.name</h5></td>
+      <td>optional</td>
+      <td style={{wordWrap: 'break-word'}}>_SUCCESS</td>
+      <td>String</td>
+      <td>The file name for success-file partition commit policy, default is '_SUCCESS'.</td>
+    </tr>
+    </tbody>
+</table>
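+
+A short sketch combining these options (the renamed success file and path are only examples):
+
+```sql
+-- A sketch using the success-file policy with a customized success-file name.
+CREATE TABLE hdfs_policy_load_node (
+  id STRING,
+  dt STRING
+) PARTITIONED BY (dt) WITH (
+  'connector' = 'filesystem',
+  'path' = 'hdfs://namenode:9000/path/to/output',
+  'format' = 'orc',
+  'sink.partition-commit.policy.kind' = 'success-file',
+  'sink.partition-commit.success-file.name' = '_FINISHED'
+);
+```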
+
+
diff --git a/docs/data_node/load_node/hive.md b/docs/data_node/load_node/hive.md
index c1ddfdb09..23a7db068 100644
--- a/docs/data_node/load_node/hive.md
+++ b/docs/data_node/load_node/hive.md
@@ -2,8 +2,211 @@
 title: Hive
 sidebar_position: 3
 ---
+## Hive Load Node
+The Hive Load Node can write data to Hive. Using the Flink dialect, only the insert operation is currently supported; data in upsert mode will be converted into insert.
+Manipulating Hive tables using the Hive dialect is currently not supported.
 
-## Configuration
-When creating a data flow, select `Hive` for the data stream direction, and click "Add" to configure it.
+## Supported Version
 
-![Hive Configuration](img/hive.png)
\ No newline at end of file
+| Load Node                           | Version                                            | 
+|-------------------------------------|----------------------------------------------------|
+| [Hive](./hive.md) | [Hive](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/overview/#supported-hive-versions): 1.x, 2.x, 3.x |
+
+### Dependencies
+
+Using the Hive Load Node requires additional dependencies.
+Add the dependency to your project through Maven, or use the JAR package provided by InLong ([sort-connector-hive](https://inlong.apache.org/download/main/)).
+
+### Maven dependency
+
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>sort-connector-hive</artifactId>
+    <!-- Choose the version that suits your application -->
+    <version>inlong_version</version>
+</dependency>
+```
+## How to create a Hive Load Node
+
+### Usage for SQL API
+
+The example below shows how to create a Hive Load Node with `Flink SQL Cli` :
+
+```sql
+CREATE TABLE hiveTableName (
+  id STRING,
+  name STRING,
+  uv BIGINT,
+  pv BIGINT
+) WITH (
+  'connector' = 'hive',
+  'default-database' = 'default',
+  'hive-version' = '3.1.2',
+  'hive-conf-dir' = 'hdfs://localhost:9000/user/hive/hive-site.xml'
+);
+```
+### Usage for InLong Dashboard
+
+#### Configuration
+When creating a data stream, select `Hive` for the data stream direction, and click "Add" to configure it.
+
+![Hive Configuration](img/hive.png)
+
+### Usage for InLong Manager Client
+
+TODO: It will be supported in the future.
+
+## Hive Load Node Options
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left" style={{width: '25%'}}>Option</th>
+        <th class="text-center" style={{width: '8%'}}>Required</th>
+        <th class="text-center" style={{width: '7%'}}>Default</th>
+        <th class="text-center" style={{width: '10%'}}>Type</th>
+        <th class="text-center" style={{width: '50%'}}>Description</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>connector</h5></td>
+        <td>required</td>
+        <td style={{wordWrap: 'break-word'}}>(none)</td>
+        <td>String</td>
+        <td>Specify what connector to use, here should be 'hive'.</td>
+    </tr>
+    <tr>
+      <td><h5>default-database</h5></td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>Specify the database name to use.</td>
+    </tr>
+    <tr>
+      <td><h5>hive-conf-dir</h5></td>
+      <td>required</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>If you do not want to upload hive-site.xml to HDFS,
+      you can put the configuration on the classpath of your project,
+      in which case this option only needs to be non-empty;
+      otherwise you must fill in the complete HDFS URL of hive-site.xml.</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.trigger</h5></td>
+      <td>optional</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>If the Hive table is partitioned, you can set the commit trigger mode, e.g. 'process-time'.</td>
+    </tr>
+    <tr>
+      <td><h5>partition.time-extractor.timestamp-pattern</h5></td>
+      <td>optional</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>If the Hive table is partitioned, you can set the partition timestamp pattern, e.g. 'yyyy-MM-dd...'.</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.delay</h5></td>
+      <td>optional</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>If the Hive table is partitioned, you can set the commit delay, e.g. 10s, 20s, 1m....</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.policy.kind</h5></td>
+      <td>optional</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>The policy to commit a partition notifies the downstream application that the partition has finished writing and is ready to be read.
+      metastore: add the partition to the metastore. Only Hive tables support the metastore policy;
+      a file system manages partitions through its directory structure.
+      success-file: add a '_success' file to the directory.
+      custom: use a policy class to create a commit policy.
+      Multiple policies can be configured at the same time, e.g. 'metastore,success-file'.</td>
+    </tr>
+    </tbody>
+</table>
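+
+For a partitioned Hive table, the partition-commit options above could be combined as in the following sketch; the table, columns, and option values are assumptions:
+
+```sql
+-- A sketch of writing a partitioned Hive table with partition commit enabled.
+CREATE TABLE hive_partitioned_load_node (
+  id STRING,
+  name STRING,
+  dt STRING
+) PARTITIONED BY (dt) WITH (
+  'connector' = 'hive',
+  'default-database' = 'default',
+  'hive-version' = '3.1.2',
+  'hive-conf-dir' = 'hdfs://localhost:9000/user/hive/hive-site.xml',
+  'sink.partition-commit.trigger' = 'process-time',
+  'partition.time-extractor.timestamp-pattern' = '$dt',
+  'sink.partition-commit.delay' = '10s',
+  'sink.partition-commit.policy.kind' = 'metastore,success-file'
+);
+```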
+
+## Data Type Mapping
+<div class="wy-table-responsive">
+<table class="colwidths-auto docutils">
+    <thead>
+      <tr>
+        <th class="text-left">Hive type</th>
+        <th class="text-left">Flink SQL type</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td>char(p)</td>
+      <td>CHAR(p)</td>
+    </tr>
+    <tr>
+      <td>varchar(p)</td>
+      <td>VARCHAR(p)</td>
+    </tr>
+    <tr>
+      <td>string</td>
+      <td>STRING</td>
+    </tr>
+    <tr>
+      <td>boolean</td>
+      <td>BOOLEAN</td>
+    </tr>
+    <tr>
+      <td>tinyint</td>
+      <td>TINYINT</td>
+    </tr>     
+    <tr>
+      <td>smallint</td>
+      <td>SMALLINT</td>
+    </tr>    
+   <tr>
+      <td>int</td>
+      <td>INT</td>
+    </tr>
+    <tr>
+      <td>bigint</td>
+      <td>BIGINT</td>
+    </tr>
+    <tr>
+      <td>float</td>
+      <td>FLOAT</td>
+    </tr>
+    <tr>
+      <td>double</td>
+      <td>DOUBLE</td>
+    </tr>
+    <tr>
+      <td>decimal(p, s)</td>
+      <td>DECIMAL(p, s)</td>
+    </tr>
+    <tr>
+      <td>date</td>
+      <td>DATE</td>
+    </tr>
+    <tr>
+      <td>timestamp</td>
+      <td>TIMESTAMP(9)</td>
+    </tr>
+    <tr>
+      <td>binary</td>
+      <td>BYTES</td>
+    </tr>   
+    <tr>
+      <td>array</td>
+      <td>ARRAY</td>
+    </tr>
+    <tr>
+      <td>map</td>
+      <td>MAP</td>
+    </tr>
+    <tr>
+      <td>struct</td>
+      <td>ROW</td>
+    </tr>       
+    </tbody>
+</table>
+</div>
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/hdfs.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/hdfs.md
index 947e2f8c2..3ae130e27 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/hdfs.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/hdfs.md
@@ -1,4 +1,10 @@
 ---
 title: HDFS
 sidebar_position: 6
----
\ No newline at end of file
+---
+
+HDFS 连接器可用于将单个文件或整个目录读取到单个表中。
+
+当使用目录作为源路径时,目录中的文件没有定义摄取顺序。
+
+注意:HDFS CDC 功能正在开发中。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/sqlserver-cdc.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/sqlserver-cdc.md
index d4f2c1a19..6c68e9a41 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/sqlserver-cdc.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/sqlserver-cdc.md
@@ -1,4 +1,335 @@
 ---
-title: SqlServer-CDC
+title: SQLServer-CDC
 sidebar_position: 11
----
\ No newline at end of file
+---
+## SQLServer 抽取节点
+
+SQLServer 抽取节点从 SQLServer 数据库中读取存量数据和增量数据。下面将介绍如何配置 SQLServer 抽取节点。
+
+## 支持的版本
+
+| Extract Node                        | Version                                                                                                                                                   |
+|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
+| [SQLServer-CDC](./sqlserver-cdc.md) | [SQLServer](https://docs.microsoft.com/en-us/sql/database-engine/install-windows/install-sql-server?view=sql-server-ver16): 2014、2016、2017、2019、2022 |
+
+## 依赖配置
+
+通过 Maven 引入 sort-connector-sqlserver-cdc 构建自己的项目。
+当然,你也可以直接使用 INLONG 提供的 jar 包。([sort-connector-sqlserver-cdc](https://inlong.apache.org/download/main/))
+
+### Maven依赖配置
+
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>sort-connector-sqlserver-cdc</artifactId>
+    <!-- 填写适合你应用的 inlong 版本-->
+    <version>inlong_version</version>
+</dependency>
+```
+## 配置 SQLServer 抽取节点
+
+SQLServer 抽取节点需要开启库和表的 CDC 功能,配置步骤如下:
+
+1. 开启数据库 CDC 能力。
+```sql
+if exists(select 1 from sys.databases where name='dbName' and is_cdc_enabled=0)
+begin
+    exec sys.sp_cdc_enable_db
+end
+```
+2. 检查数据库 CDC 是否开启。
+```sql
+select is_cdc_enabled from sys.databases where name='dbName'
+```
+备注: "1"表示数据库 CDC 开启
+
+3. 开启表的 CDC 能力。
+```sql
+IF EXISTS(SELECT 1 FROM sys.tables WHERE name='tableName' AND is_tracked_by_cdc = 0)
+BEGIN
+    EXEC sys.sp_cdc_enable_table
+        @source_schema = 'dbo', -- source_schema
+        @source_name = 'tableName', -- table_name
+        @capture_instance = NULL, -- capture_instance
+        @supports_net_changes = 1, -- supports_net_changes
+        @role_name = NULL, -- role_name
+        @index_name = NULL, -- index_name
+        @captured_column_list = NULL, -- captured_column_list
+        @filegroup_name = 'PRIMARY' -- filegroup_name
+END
+```
+备注: 表必须有主键或者唯一索引。
+
+4. 检查表 CDC 是否开启。
+```sql
+SELECT is_tracked_by_cdc FROM sys.tables WHERE name='tableName'
+```
+备注: "1"表示表 CDC 开启
+
+## 如何创建一个 SQLServer 抽取节点
+
+### SQL API 的使用
+
+使用 `Flink SQL Cli` :
+
+```sql
+-- Set checkpoint every 3000 milliseconds                       
+Flink SQL> SET 'execution.checkpointing.interval' = '3s';   
+
+-- Create a SqlServer table 'sqlserver_extract_node' in Flink SQL Cli
+Flink SQL> CREATE TABLE sqlserver_extract_node (
+     order_id INT,
+     order_date TIMESTAMP(0),
+     customer_name STRING,
+     price DECIMAL(10, 5),
+     product_id INT,
+     order_status BOOLEAN,
+     PRIMARY KEY(order_id) NOT ENFORCED
+     ) WITH (
+     'connector' = 'sqlserver-cdc',
+     'hostname' = 'YourHostname',
+     'port' = 'port', --default:1433
+     'username' = 'YourUsername',
+     'password' = 'YourPassword',
+     'database-name' = 'YourDatabaseName',
+     'schema-name' = 'YourSchemaName', -- default: dbo
+     'table-name' = 'YourTableName');
+  
+-- Read snapshot and binlog from sqlserver_extract_node
+Flink SQL> SELECT * FROM sqlserver_extract_node;
+```
+### InLong Dashboard 方式
+TODO
+
+### InLong Manager Client 方式
+TODO
+
+## SQLServer 抽取节点参数信息
+
+<div class="highlight">
+<table class="colwidths-auto docutils">
+    <thead>
+      <tr>
+       <th class="text-left" style={{width: '10%'}}>参数</th>
+       <th class="text-left" style={{width: '8%'}}>是否必须</th>
+       <th class="text-left" style={{width: '7%'}}>默认值</th>
+       <th class="text-left" style={{width: '10%'}}>数据类型</th>
+              <th class="text-left" style={{width: '65%'}}>描述</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td>connector</td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>指定使用什么连接器,这里应该是 'sqlserver-cdc'。</td>
+    </tr>
+    <tr>
+      <td>hostname</td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>SQLServer 数据库 IP 地址或者 hostname。</td>
+    </tr>
+    <tr>
+      <td>username</td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>SQLServer 数据库用户名。</td>
+    </tr>
+    <tr>
+      <td>password</td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>SQLServer 数据库用户密码。</td>
+    </tr>
+    <tr>
+      <td>database-name</td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>SQLServer 数据库监控的数据库名称。</td>
+    </tr> 
+    <tr>
+      <td>schema-name</td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>dbo</td>
+      <td>String</td>
+      <td>SQLServer 数据库监控的 schema 名称。</td>
+    </tr>
+    <tr>
+      <td>table-name</td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>SQLServer 数据库监控的表名称。</td>
+    </tr>
+    <tr>
+      <td>port</td>
+      <td>可选</td>
+      <td style={{wordWrap: 'break-word'}}>1433</td>
+      <td>Integer</td>
+      <td>SQLServer 数据库端口。</td>
+    </tr>
+    <tr>
+      <td>server-time-zone</td>
+      <td>可选</td>
+      <td style={{wordWrap: 'break-word'}}>UTC</td>
+      <td>String</td>
+      <td>SQLServer 数据库连接配置时区。 例如: "Asia/Shanghai"。</td>
+    </tr>
+    </tbody>
+</table>
+</div>
+
+## 可用的元数据字段
+
+以下格式元数据可以作为表定义中的只读 (VIRTUAL) 列公开。
+
+<table class="colwidths-auto docutils">
+  <thead>
+     <tr>
+        <th class="text-left" style={{width: '15%'}}>字段名称</th>
+        <th class="text-left" style={{width: '30%'}}>数据类型</th>
+        <th class="text-left" style={{width: '55%'}}>描述</th>
+     </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>meta.table_name</td>
+      <td>STRING NOT NULL</td>
+      <td>包含该行的表的名称。</td>
+    </tr>   
+     <tr>
+      <td>meta.schema_name</td>
+      <td>STRING NOT NULL</td>
+      <td>包含该行 schema 的名称。</td>
+    </tr>
+    <tr>
+      <td>meta.database_name</td>
+      <td>STRING NOT NULL</td>
+      <td>包含该行数据库的名称。</td>
+    </tr>
+    <tr>
+      <td>meta.op_ts</td>
+      <td>TIMESTAMP_LTZ(3) NOT NULL</td>
+      <td>它表示在数据库中进行更改的时间。如果记录是从表的快照而不是 binlog 中读取的,则该值始终为 0。</td>
+    </tr>
+  </tbody>
+</table>
+
+使用元数据字段的例子:
+```sql
+CREATE TABLE sqlserver_extract_node (
+    table_name STRING METADATA  FROM 'table_name' VIRTUAL,
+    schema_name STRING METADATA  FROM 'schema_name' VIRTUAL,
+    db_name STRING METADATA FROM 'database_name' VIRTUAL,
+    operation_ts TIMESTAMP_LTZ(3) METADATA FROM 'op_ts' VIRTUAL,
+    id INT NOT NULL
+) WITH (
+    'connector' = 'sqlserver-cdc',
+    'hostname' = 'localhost',
+    'port' = '1433',
+    'username' = 'sa',
+    'password' = 'password',
+    'database-name' = 'test',
+    'schema-name' = 'dbo',
+    'table-name' = 'worker'
+);
+```
+
+## 数据类型映射
+<div class="wy-table-responsive">
+<table class="colwidths-auto docutils">
+    <thead>
+      <tr>
+        <th class="text-left">SQLServer type</th>
+        <th class="text-left">Flink SQL type</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td>char(n)</td>
+      <td>CHAR(n)</td>
+    </tr>
+    <tr>
+      <td>
+        varchar(n)<br/>
+        nvarchar(n)<br/>
+        nchar(n)</td>
+      <td>VARCHAR(n)</td>
+    </tr>
+    <tr>
+      <td>
+        text<br/>
+        ntext<br/>
+        xml</td>
+      <td>STRING</td>
+    </tr>
+    <tr>
+      <td>
+        decimal(p, s)<br/>
+        money<br/>
+        smallmoney</td>
+      <td>DECIMAL(p, s)</td>
+    </tr>
+   <tr>
+      <td>numeric</td>
+      <td>NUMERIC</td>
+    </tr>
+    <tr>
+      <td>
+          real<br/>
+          float<br/>
+       </td>
+       <td>FLOAT</td>
+    </tr>
+    <tr>
+      <td>bit</td>
+      <td>BOOLEAN</td>
+    </tr>
+    <tr>
+      <td>int</td>
+      <td>INT</td>
+    </tr>
+    <tr>
+      <td>tinyint</td>
+      <td>TINYINT</td>
+    </tr>
+    <tr>
+      <td>smallint</td>
+      <td>SMALLINT</td>
+    </tr>
+    <tr>
+      <td>time (n)</td>
+      <td>TIME (n)</td>
+    </tr>
+    <tr>
+      <td>bigint</td>
+      <td>BIGINT</td>
+    </tr>
+    <tr>
+      <td>date</td>
+      <td>DATE</td>
+    </tr>
+    <tr>
+      <td>
+        datetime2<br/>
+        datetime<br/>
+        smalldatetime
+      </td>
+      <td>TIMESTAMP(n)</td>
+    </tr>
+    <tr>
+      <td>
+       datetimeoffset
+      </td>
+      <td>TIMESTAMP_LTZ(3)</td>
+    </tr>
+    </tbody>
+</table>
+</div>
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hdfs.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hdfs.md
index c4a3c3864..92c33ba84 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hdfs.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hdfs.md
@@ -1,4 +1,207 @@
 ---
 title: HDFS
 sidebar_position: 11
----
\ No newline at end of file
+---
+## HDFS 加载节点
+HDFS 加载节点使用 Flink 文件系统连接器的通用能力,支持单个文件以及分区文件。
+在 Flink 中包含了该文件系统连接器,不需要添加额外的依赖。
+相应的 jar 包可以在 Flink 工程项目的 /lib 目录下找到。
+从文件系统中读取或者向文件系统中写入行时,需要指定相应的 format。
+
+## 如何创建 HDFS 加载节点
+
+### SQL API 的使用
+使用 `Flink SQL Cli` :
+
+```sql
+CREATE TABLE hdfs_load_node (
+  id STRING,
+  name STRING,
+  uv BIGINT,
+  pv BIGINT,
+  dt STRING,
+ `hour` STRING
+  ) PARTITIONED BY (dt, `hour`) WITH (
+    'connector'='filesystem',
+    'path'='...',
+    'format'='orc',
+    'sink.partition-commit.delay'='1 h',
+    'sink.partition-commit.policy.kind'='success-file'
+  );
+```
+
+#### File Formats
+<ul>
+<li>CSV(非压缩格式)</li>
+<li>JSON(文件系统连接器的 JSON format 与传统的标准的 JSON file 的不同,而是非压缩的。换行符分割的 JSON)</li>
+<li>Avro(通过配置 avro.codec 属性支持压缩)</li>
+<li>Parquet(与 hive 兼容)</li>
+<li>Orc(与 hive 兼容)</li>
+<li>Debezium-JSON</li>
+<li>Canal-JSON</li>
+<li>Raw</li>
+</ul>
+
+备注:文件格式明细可以查看[Flink Formats](https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/formats/overview/)
+
+#### 滚动策略
+
+数据会被写入分区目录下的 part 文件中,每个分区接收到来自 sink subtask 的数据后,至少会为该分区生成一个 part 文件。同时可以配置滚动策略,
+滚动时会关闭 in-progress part 文件并创建新的 part 文件。该策略基于文件大小和文件可以保持打开的最长时间来滚动 part 文件。
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+          <th class="text-left" style={{width: '10%'}}>参数</th>
+          <th class="text-left" style={{width: '7%'}}>默认值</th>
+          <th class="text-left" style={{width: '10%'}}>数据类型</th>
+          <th class="text-left" style={{width: '65%'}}>描述</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>sink.rolling-policy.file-size</h5></td>
+        <td style={{wordWrap: 'break-word'}}>128MB</td>
+        <td>MemorySize</td>
+        <td>滚动前 part 文件的最大值。</td>
+    </tr>
+    <tr>
+      <td><h5>sink.rolling-policy.rollover-interval</h5></td>
+      <td style={{wordWrap: 'break-word'}}>30 min</td>
+      <td>String</td>
+      <td>滚动前,part 文件处于打开状态的最大时长(默认值30分钟,以避免产生大量小文件)。
+       检查频率是由 'sink.rolling-policy.check-interval' 属性控制的。</td>
+    </tr>
+    <tr>
+      <td><h5>sink.rolling-policy.check-interval</h5></td>
+      <td style={{wordWrap: 'break-word'}}>1 min</td>
+      <td>String</td>
+      <td>基于时间的滚动策略的检查间隔。
+      该属性控制了基于 'sink.rolling-policy.rollover-interval' 属性检查文件是否该被滚动的检查频率。</td>
+    </tr>
+    </tbody>
+</table>
+
+#### 文件合并 
+文件 sink 支持文件合并能力,允许在较小的 checkpoint 间隔下不产生大量的小文件。
+<table class="table table-bordered">
+    <thead>
+      <tr>
+          <th class="text-left" style={{width: '10%'}}>参数</th>
+          <th class="text-left" style={{width: '7%'}}>默认值</th>
+          <th class="text-left" style={{width: '10%'}}>数据类型</th>
+          <th class="text-left" style={{width: '65%'}}>描述</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>auto-compaction</h5></td>
+        <td style={{wordWrap: 'break-word'}}>false</td>
+        <td>Boolean</td>
+        <td>在流式 sink 中是否开启自动合并功能,数据首先会被写入临时文件。
+        当 checkpoint 完成后,该检查点产生的临时文件会被合并,这些临时文件在合并前不可见。</td>
+    </tr>
+    <tr>
+      <td><h5>compaction.file-size</h5></td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>合并目标文件大小,默认值为滚动文件大小。</td>
+    </tr>
+    </tbody>
+</table>
+
+#### 分区提交 
+
+分区数据写入完成后,一般需要通知下游应用。如:更新 hive 的元数据信息,或者在 hdfs 目录下生成 _SUCCESS 文件。
+分区提交策略是可配置的,分区提交行为基于 triggers 和 policies 的组合。
+
+- Trigger :分区提交时机可以基于分区的 watermark 或者基于处理时间(process-time)。
+- Policy :分区提交策略,内置策略包括提交 hive 元数据和生成 _SUCCESS 文件,同时支持自定策略,如生成 hive 的统计信息、合并小文件等。
+
+备注:分区提交仅支持动态分区插入。
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+          <th class="text-left" style={{width: '10%'}}>参数</th>
+          <th class="text-left" style={{width: '7%'}}>默认值</th>
+          <th class="text-left" style={{width: '10%'}}>数据类型</th>
+          <th class="text-left" style={{width: '65%'}}>描述</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>sink.partition-commit.trigger</h5></td>
+        <td style={{wordWrap: 'break-word'}}>process-time</td>
+        <td>String</td>
+        <td>分区提交触发器类型: 'process-time':基于机器时间既不需要分区时间提取器也不需要 watermark 生成器。
+        一旦 "当前系统时间" 超过了 "分区创建系统时间" 和 'sink.partition-commit.delay' 之和立即提交分区。<br/>
+         'partition-time':基于提取的分区时间,需要 watermark 生成。一旦 watermark 超过了 "分区创建系统时间" 和 'sink.partition-commit.delay' 之和立即提交分区。</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.delay</h5></td>
+      <td style={{wordWrap: 'break-word'}}>0 s</td>
+      <td>Duration</td>
+      <td>如果设置分区延迟提交,这个延迟时间之前不会提交。天:'d';小时:'h';秒:'s'等</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.watermark-time-zone</h5></td>
+      <td style={{wordWrap: 'break-word'}}>UTC</td>
+      <td>String</td>
+      <td> 解析 Long 类型的 watermark 到 TIMESTAMP 类型时所采用的时区,
+      解析得到的 watermark 的 TIMESTAMP 会被用来跟分区时间进行比较以判断是否该被提交。
+      这个属性仅当 `sink.partition-commit.trigger` 被设置为 'partition-time' 时有效。
+      如果这个属性设置的不正确,例如在 TIMESTAMP_LTZ 类型的列上定义了 source rowtime,
+      如果没有设置该属性,那么用户可能会在若干个小时后才看到分区的提交。
+      默认值为 'UTC' 意味着 watermark 是定义在 TIMESTAMP 类型的列上或者没有定义 watermark。
+      如果 watermark 定义在 TIMESTAMP_LTZ 类型的列上,watermark 时区必须是会话时区(session time zone)。
+      该属性的可选值要么是完整的时区名比如 'America/Los_Angeles',要么是自定义时区,例如 'GMT-08:00'。</td>
+    </tr>
+    </tbody>
+</table>
+
+#### 分区提交策略
+
+分区提交策略定义了分区提交使用的具体策略。
+
+- metastore:仅 hive 表支持该策略。
+- success-file:part 文件写入完成后会生成 '_SUCCESS' 文件。
+
+<table class="table table-bordered">
+    <thead>
+      <tr>
+        <th class="text-left" style={{width: '25%'}}>参数</th>
+        <th class="text-left" style={{width: '8%'}}>是否必须</th>
+        <th class="text-center" style={{width: '7%'}}>默认值</th>
+        <th class="text-center" style={{width: '10%'}}>数据类型</th>
+        <th class="text-center" style={{width: '50%'}}>描述</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>sink.partition-commit.policy.kind</h5></td>
+        <td>可选</td>
+        <td style={{wordWrap: 'break-word'}}>(none)</td>
+        <td>String</td>
+        <td>分区提交策略用于通知下游应用该分区已完成写入、可以被读取。metastore:向 metastore 添加分区,仅 hive 表支持该策略,文件系统通过目录结构管理分区;success-file:在目录中生成 '_success' 文件表示写入完成。
+        两种策略可以同时指定:'metastore,success-file';也可以通过 custom 指定的类创建提交策略。</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.policy.class</h5></td>
+      <td>可选</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>实现 PartitionCommitPolicy 接口的分区提交策略类,只有在 custom 提交策略下才使用该类。</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.success-file.name</h5></td>
+      <td>可选</td>
+      <td style={{wordWrap: 'break-word'}}>_SUCCESS</td>
+      <td>String</td>
+      <td>使用 success-file 分区提交策略时的文件名,默认值是 '_SUCCESS'。</td>
+    </tr>
+    </tbody>
+</table>
+
+
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hive.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hive.md
index 874c5030e..d8468a225 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hive.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/load_node/hive.md
@@ -2,8 +2,210 @@
 title: Hive
 sidebar_position: 2
 ---
+## Hive 加载节点
 
-## 配置
-创建数据流时,数据流向选择 `Hive`,并点击 ”添加“ 进行配置。
+Hive 加载节点可以将数据写入 Hive。使用 Flink 方言,目前仅支持 Insert 操作,Upsert 模式下的数据会转换成 Insert 方式。
+目前暂时不支持使用 Hive 方言操作 Hive 表。
 
-![Hive Configuration](img/hive.png)
\ No newline at end of file
+## 支持的版本
+
+| Load Node                           | Version                                            | 
+|-------------------------------------|----------------------------------------------------|
+| [Hive](./hive.md) | [Hive](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/overview/#supported-hive-versions): 1.x, 2.x, 3.x |
+
+### 依赖
+
+通过 Maven 引入 sort-connector-hive 构建自己的项目。
+当然,你也可以直接使用 INLONG 提供的 jar 包。([sort-connector-hive](https://inlong.apache.org/download/main/))
+
+### Maven 依赖
+
+```xml
+<dependency>
+    <groupId>org.apache.inlong</groupId>
+    <artifactId>sort-connector-hive</artifactId>
+    <!-- 填写适合你应用的 inlong 版本-->
+    <version>inlong_version</version>
+</dependency>
+```
+## 如何配置 Hive 数据加载节点
+
+### SQL API 的使用
+
+使用 `Flink SQL Cli` :
+
+```sql
+CREATE TABLE hiveTableName (
+  id STRING,
+  name STRING,
+  uv BIGINT,
+  pv BIGINT
+) WITH (
+  'connector' = 'hive',
+  'default-database' = 'default',
+  'hive-version' = '3.1.2',
+  'hive-conf-dir' = 'hdfs://localhost:9000/user/hive/hive-site.xml'
+);
+```
+### InLong Dashboard 方式
+
+#### 配置
+在创建数据流时,选择数据落地为 'Hive' 然后点击 'Add' 来配置 Hive 的相关信息。
+
+![Hive Configuration](img/hive.png)
+
+### InLong Manager Client 方式
+
+TODO: 未来版本支持
+
+## Hive 加载节点参数信息
+<table class="table table-bordered">
+    <thead>
+      <tr>
+              <th class="text-left" style={{width: '10%'}}>参数</th>
+              <th class="text-left" style={{width: '8%'}}>是否必须</th>
+              <th class="text-left" style={{width: '7%'}}>默认值</th>
+              <th class="text-left" style={{width: '10%'}}>数据类型</th>
+              <th class="text-left" style={{width: '65%'}}>描述</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+        <td><h5>connector</h5></td>
+        <td>必须</td>
+        <td style={{wordWrap: 'break-word'}}>(none)</td>
+        <td>String</td>
+        <td>指定使用什么连接器,这里应该是  'hive'。</td>
+    </tr>
+    <tr>
+      <td><h5>default-database</h5></td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>指定数据库名称。</td>
+    </tr>
+    <tr>
+      <td><h5>hive-conf-dir</h5></td>
+      <td>必须</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>本地构建项目可以将 hive-site.xml 放到 classpath 中,未来 Dashboard 将支持本地上传能力。
+      目前通用方式只支持配置已经上传到 HDFS 的文件路径。</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.trigger</h5></td>
+      <td>可选</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>如果表是分区表,可以配置触发模式。如:(process-time)</td>
+    </tr>
+    <tr>
+      <td><h5>partition.time-extractor.timestamp-pattern</h5></td>
+      <td>可选</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>如果表是分区表,可以配置时间戳。如:(yyyy-MM-dd)</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.delay</h5></td>
+      <td>可选</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>如果表是分区表,可以配置延迟时间。如:(10s,20s,1m...)</td>
+    </tr>
+    <tr>
+      <td><h5>sink.partition-commit.policy.kind</h5></td>
+      <td>可选</td>
+      <td style={{wordWrap: 'break-word'}}>(none)</td>
+      <td>String</td>
+      <td>分区提交策略通知下游某个分区已经写完毕可以被读取了。 
+      metastore:向 metadata 增加分区。仅 hive 支持 metastore 策略,文件系统通过目录结构管理分区; 
+      success-file:在目录中增加 '_success' 文件; 
+      上述两个策略可以同时指定:'metastore,success-file'。 
+      custom:通过指定的类来创建提交策略, 
+      支持同时指定多个提交策略:'metastore,success-file'。</td>
+    </tr>
+    </tbody>
+</table>
+
+## 数据类型映射
+<div class="wy-table-responsive">
+<table class="colwidths-auto docutils">
+    <thead>
+      <tr>
+        <th class="text-left">Hive type</th>
+        <th class="text-left">Flink SQL type</th>
+      </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td>char(p)</td>
+      <td>CHAR(p)</td>
+    </tr>
+    <tr>
+      <td>varchar(p)</td>
+      <td>VARCHAR(p)</td>
+    </tr>
+    <tr>
+      <td>string</td>
+      <td>STRING</td>
+    </tr>
+    <tr>
+      <td>boolean</td>
+      <td>BOOLEAN</td>
+    </tr>
+    <tr>
+      <td>tinyint</td>
+      <td>TINYINT</td>
+    </tr>     
+    <tr>
+      <td>smallint</td>
+      <td>SMALLINT</td>
+    </tr>    
+   <tr>
+      <td>int</td>
+      <td>INT</td>
+    </tr>
+    <tr>
+      <td>bigint</td>
+      <td>BIGINT</td>
+    </tr>
+    <tr>
+      <td>float</td>
+      <td>FLOAT</td>
+    </tr>
+    <tr>
+      <td>double</td>
+      <td>DOUBLE</td>
+    </tr>
+    <tr>
+      <td>decimal(p, s)</td>
+      <td>DECIMAL(p, s)</td>
+    </tr>
+    <tr>
+      <td>date</td>
+      <td>DATE</td>
+    </tr>
+    <tr>
+      <td>timestamp</td>
+      <td>TIMESTAMP(9)</td>
+    </tr>
+    <tr>
+      <td>binary</td>
+      <td>BYTES</td>
+    </tr>   
+    <tr>
+      <td>array</td>
+      <td>ARRAY</td>
+    </tr>
+    <tr>
+      <td>map</td>
+      <td>MAP</td>
+    </tr>
+    <tr>
+      <td>struct</td>
+      <td>ROW</td>
+    </tr>       
+    </tbody>
+</table>
+</div>