Posted to commits@inlong.apache.org by do...@apache.org on 2021/12/02 10:39:28 UTC

[incubator-inlong-website] 01/01: [INLONG-1877] improve the document format

This is an automated email from the ASF dual-hosted git repository.

dockerzhang pushed a commit to branch fix-1877
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git

commit 4174673dea87966f7bc3bd384f878c847bf63eab
Author: dockerzhang <do...@tencent.com>
AuthorDate: Thu Dec 2 18:39:13 2021 +0800

    [INLONG-1877] improve the document format
---
 docs/deployment/bare_metal.md                      |  16 +--
 docs/deployment/k8s.md                             |   1 +
 docs/modules/agent/file_collect.md                 | 109 ++++++++++++++++
 docs/modules/agent/overview.md                     |  46 +++----
 docs/modules/agent/quick_start.md                  | 145 +++------------------
 docs/modules/dataproxy/overview.md                 | 113 ++++++++--------
 docs/sdk/dataproxy-sdk/overview.md                 |  30 ++---
 download/main.md                                   |   8 +-
 .../current/main.md                                |   8 +-
 .../current/deployment/bare_metal.md               |   4 +-
 .../current/deployment/k8s.md                      |   1 +
 .../current/modules/agent/file_collect.md          | 109 ++++++++++++++++
 .../current/modules/agent/overview.md              |  51 +++-----
 .../current/modules/agent/quick_start.md           | 145 +++------------------
 .../current/modules/dataproxy/overview.md          | 108 ++++++++-------
 .../current/sdk/dataproxy-sdk/overview.md          |  30 ++---
 16 files changed, 442 insertions(+), 482 deletions(-)

diff --git a/docs/deployment/bare_metal.md b/docs/deployment/bare_metal.md
index a9024fc..9a5c78c 100644
--- a/docs/deployment/bare_metal.md
+++ b/docs/deployment/bare_metal.md
@@ -8,29 +8,29 @@ sidebar_position: 4
 - MySQL 5.7+
 - Flink 1.9.x
 
-## deploy InLong TubeMQ Server
+## Deploy InLong TubeMQ Server
 [deploy InLong TubeMQ Server](modules/tubemq/quick_start.md)
 
-## deploy InLong TubeMQ Manager
+## Deploy InLong TubeMQ Manager
 [deploy InLong TubeMQ Manager](modules/tubemq/tubemq-manager/quick_start.md)
 
-## deploy InLong Manager
+## Deploy InLong Manager
 [deploy InLong Manager](modules/manager/quick_start.md)
 
-## deploy InLong WebSite
+## Deploy InLong WebSite
 [deploy InLong WebSite](modules/website/quick_start.md)
 
-## deploy InLong Sort
+## Deploy InLong Sort
 [deploy InLong Sort](modules/sort/quick_start.md)
 
-## deploy InLong DataProxy
+## Deploy InLong DataProxy
 [deploy InLong DataProxy](modules/dataproxy/quick_start.md)
 
-## deploy InLong Agent
+## Deploy InLong Agent
 [deploy InLong Agent](modules/agent/quick_start.md)
 
 ## Business configuration
-[How to configure a new business](docs/user_guide/user_manual)
+[How to configure a new business](user_guide/user_manual.md)
 
 ## Data report verification
 At this stage, you can collect data through the file agent and verify whether the received data is consistent with the sent data in the specified Hive table.
\ No newline at end of file
diff --git a/docs/deployment/k8s.md b/docs/deployment/k8s.md
index f74e0c6..4d0cda5 100644
--- a/docs/deployment/k8s.md
+++ b/docs/deployment/k8s.md
@@ -7,6 +7,7 @@ sidebar_position: 3
 
 - Kubernetes 1.10+
 - Helm 3.0+
+- [InLong Helm Chart](https://github.com/apache/incubator-inlong/tree/master/docker/kubernetes)
 - A dynamic provisioner for the PersistentVolumes (`production environment`)
 
 ## Install
diff --git a/docs/modules/agent/file_collect.md b/docs/modules/agent/file_collect.md
new file mode 100644
index 0000000..36f5591
--- /dev/null
+++ b/docs/modules/agent/file_collect.md
@@ -0,0 +1,109 @@
+---
+title: File Collect
+sidebar_position: 3
+---
+
+## File Collect Configuration
+```
+/data/inlong-agent/test.log // read the new file test.log in the inlong-agent folder
+/data/inlong-agent/test[0-9]{1} // read new files in the inlong-agent folder whose names are test followed by a single digit
+/data/inlong-agent/test // if test is a directory, read all new files under test
+/data/inlong-agent/^\\d+(\\.\\d+)? // match names starting with one or more digits, optionally followed by a dot and one or more digits (the ? makes the decimal part optional); matches "5", "1.5" and "2.21"
+```
+
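+The last pattern can be checked with a quick standalone snippet (illustration only, not Agent code; the doubled backslashes above are config-file escaping):
+```java
+import java.util.regex.Pattern;
+
+public class PatternCheck {
+    public static void main(String[] args) {
+        Pattern p = Pattern.compile("^\\d+(\\.\\d+)?");
+        for (String name : new String[]{"5", "1.5", "2.21", "abc"}) {
+            // matches() tests the whole file name against the pattern
+            System.out.println(name + " -> " + p.matcher(name).matches());
+        }
+    }
+}
+```
+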
+## Get data time from file name
+
+Agent supports obtaining the time from the file name as the production time of the data. The configuration instructions are as follows:
+```
+/data/inlong-agent/***YYYYMMDDHH***
+```
+where YYYYMMDDHH represents the data time: YYYY is the year, MM the month, DD the day, and HH the hour;
+*** stands for any characters.
+
+You also need to add the data cycle to the job configuration; day and hour cycles are currently supported.
+When adding a task, set the property job.cycleUnit.
+
+job.cycleUnit supports the following two values:
+- D: the data time is at day granularity
+- H: the data time is at hour granularity
+
+For example, the configured data source is
+```
+/data/inlong-agent/2021020211.log
+```
+and data is written to 2021020211.log.
+If job.cycleUnit is configured as D, the agent reads the 2021020211.log file around the time 2021020211, and writes all the data to the backend proxy with the data time 20210202.
+If job.cycleUnit is configured as H, all data collected from the 2021020211.log file is written to the backend proxy with the data time 2021020211.
+
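+The truncation rule described above can be sketched in a few lines (illustration only, not Agent code):
+```java
+public class DataTimeRule {
+    // fileTime is the YYYYMMDDHH part extracted from the file name
+    static String dataTime(String fileTime, String cycleUnit) {
+        // D keeps only the day part; H keeps the full hour timestamp
+        return "D".equals(cycleUnit) ? fileTime.substring(0, 8) : fileTime;
+    }
+
+    public static void main(String[] args) {
+        System.out.println(dataTime("2021020211", "D")); // 20210202
+        System.out.println(dataTime("2021020211", "H")); // 2021020211
+    }
+}
+```
+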
+Examples of job submission:
+```bash
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+  "job": {
+    "dir": {
+      "path": "",
+      "pattern": "/data/inlong-agent/2021020211.log"
+    },
+    "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+    "id": 1,
+    "thread": {
+      "running": {
+        "core": "4"
+      }
+    },
+    "name": "fileAgentTest",
+    "cycleUnit": "D",
+    "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+    "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+    "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+  },
+  "proxy": {
+    "groupId": "group10",
+    "streamId": "stream10"
+  },
+  "op": "add"
+}'
+```
+
+## Time offset reading
+After configuring time-based reading, if you want to read data of a time other than the current one, you can configure a time offset.
+Set the job attribute job.timeOffset to a number plus a time dimension, where the dimension is d (day) or h (hour).
+For example, the following settings are supported (see the sketch after this list):
+- 1d: read the data of one day after the current time
+- -1h: read the data of one hour before the current time
+
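+A minimal sketch of how such an offset is applied (illustration only, not Agent code):
+```java
+import java.time.LocalDateTime;
+import java.time.format.DateTimeFormatter;
+
+public class TimeOffsetRule {
+    // offset is the job.timeOffset value, e.g. "1d" or "-1h"
+    static LocalDateTime apply(LocalDateTime now, String offset) {
+        long n = Long.parseLong(offset.substring(0, offset.length() - 1));
+        return offset.endsWith("d") ? now.plusDays(n) : now.plusHours(n);
+    }
+
+    public static void main(String[] args) {
+        LocalDateTime now = LocalDateTime.of(2021, 2, 2, 11, 0);
+        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyyMMddHH");
+        System.out.println(f.format(apply(now, "-1d"))); // 2021020111: data of the previous day
+    }
+}
+```
+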
+Examples of job submission
+```bash
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+  "job": {
+    "dir": {
+      "path": "",
+      "pattern": "/data/inlong-agent/test.log"
+    },
+    "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+    "id": 1,
+    "thread": {
+      "running": {
+        "core": "4"
+      }
+    },
+    "name": "fileAgentTest",
+    "cycleUnit": "D",
+    "timeOffset": "-1d",
+    "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+    "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+    "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+  },
+  "proxy": {
+    "groupId": "groupId10",
+    "streamId": "streamId10"
+  },
+  "op": "add"
+}'
+```
\ No newline at end of file
diff --git a/docs/modules/agent/overview.md b/docs/modules/agent/overview.md
index 0001917..1f071e2 100644
--- a/docs/modules/agent/overview.md
+++ b/docs/modules/agent/overview.md
@@ -1,47 +1,42 @@
 ---
-title: Overview 
+title: Overview
+sidebar_position: 1
 ---
 
-## 1 Overview of InLong-Agent
 InLong-Agent is a collection tool that supports multiple types of data sources; it is committed to stable and efficient data collection across heterogeneous data sources such as file, SQL, binlog and metrics.
 
-### 1.1 The brief architecture diagram is as follows:
-![](img/architecture.png)
-
-### 1.2 design concept
+## Design Concept
 In order to solve the problem of data source diversity, InLong-agent abstracts multiple data sources into a unified source concept, and abstracts sinks to write data. When you need to access a new data source, you only need to configure the format and reading parameters of the data source to achieve efficient reading.
 
-### 1.3 Current status of use
-InLong-Agent is widely used within the Tencent Group, undertaking most of the data collection business, and the amount of online data reaches tens of billions.
+## InLong-Agent Architecture
+![](img/architecture.png)
 
-## 2 InLong-Agent architecture
 The InLong Agent is a data collection framework built on a channel + plugin architecture. Reading from and writing to a data source are abstracted into Reader/Writer plugins, which are then integrated into the framework.
 
-+ Reader: Reader is the data collection module, responsible for collecting data from the data source and sending the data to the channel.
-+ Writer: Writer is a data writing module, which reuses data continuously to the channel and writes the data to the destination.
-+ Channel: The channel used to connect the reader and writer, and as the data transmission channel of the connection, which realizes the function of data reading and monitoring
-
+- Reader: the data collection module, responsible for collecting data from the data source and sending it to the channel.
+- Writer: the data writing module, which continuously pulls data from the channel and writes it to the destination.
+- Channel: connects the reader and the writer, serves as their data transmission channel, and provides monitoring of data reads and writes
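+
+A minimal sketch of this plugin shape (hypothetical interfaces for illustration; the real interfaces are defined in the agent plugin framework):
+```java
+// stand-in message type for the sketch
+class Message {
+    byte[] body;
+}
+
+interface Reader {
+    Message read();          // pull one message from the data source
+}
+
+interface Writer {
+    void write(Message msg); // push one message to the destination
+}
+
+interface Channel {
+    void push(Message msg);  // called on the reader side
+    Message pull();          // called on the writer side
+}
+```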
 
-## 3 Different kinds of agent
-### 3.1 file agent
+## Different kinds of agent
+### File
 File collection includes the following functions:
 
 - User-configured path monitoring: detects newly created files under the configured path
 - Directory regex filtering: supports YYYYMMDD + regular-expression path configuration
 - Resume from breakpoint: when InLong-Agent restarts, it automatically resumes reading from the last position, guaranteeing no rereads and no missed reads
 
-### 3.2 sql agent
+### SQL
 This type of collection obtains data by executing SQL:
 - SQL regex decomposition: the SQL is decomposed into multiple SQL statements
 - each statement is executed separately to pull a data set; the pull process needs to consider its impact on MySQL itself
 - execution cycle: this type is generally executed periodically
 
-### 3.3 binlog agent
+### Binlog
 This type of collection reads the binlog and restores the data by acting as a MySQL slave:
 - binlog reading is parsed with multiple threads, and the multi-threaded parsing results need to be labeled to preserve order
 - the code is based on the old version of dbsync; the main modification replaces tdbus-sender sending with pushing to the agent channel
 
-## 4 Monitoring indicator configuration instructions
+## Monitor Metrics configuration
 
 The Agent exposes monitoring metrics via JMX; the metrics are registered to MBeanServer.
 Users can add JMX options (adjusting port and authentication as needed) to the Agent startup parameters to collect the monitoring metrics remotely.
@@ -56,25 +51,22 @@ Users can add similar JMX (port and authentication are adjusted according to the
 
 The Agent metrics are grouped into the following items:
 
-AgentTaskMetric
-
-|  property   | info  |
+### AgentTaskMetric
+|  property   | description  |
 |  ----  | ----  |
 | runningTasks  | tasks currently being executed |
 | retryingTasks  | Tasks that are currently being retried |
 | fatalTasks  | The total number of currently failed tasks |
 
 
-JobMetrics
-
-|  property   | info  |
+### JobMetrics
+|  property   | description  |
 |  ----  | ----  |
 | runningJobs  | the total number of currently running jobs |
 | fatalJobs  | the total number of currently failed jobs |
 
-PluginMetric
-
-|  property   | info  |
+### PluginMetric
+|  property   | description  |
 |  ----  | ----  |
 | readNum  | the number of reads |
 | sendNum  | the number of sent items |
diff --git a/docs/modules/agent/quick_start.md b/docs/modules/agent/quick_start.md
index bd3185d..bb0ec8f 100644
--- a/docs/modules/agent/quick_start.md
+++ b/docs/modules/agent/quick_start.md
@@ -1,26 +1,27 @@
 ---
 title: Deployment
+sidebar_position: 2
 ---
 
-## 1 Configuration
+Go to the agent directory:
 ```
 cd inlong-agent
 ```
 
-The agent supports two modes of operation: local operation and online operation
-
-
-### 1.1 Agent configuration
+## Configuration
 
 When running online, the agent pulls its tasks from inlong-manager; configure conf/agent.properties as follows:
 ```ini
+# whether enable http service
+agent.http.enable=true
+# http default port
+agent.http.port=Available ports
 agent.fetcher.classname=org.apache.inlong.agent.plugin.fetcher.ManagerFetcher (the class name for fetch tasks, default ManagerFetcher)
 agent.local.ip=Write local ip
 agent.manager.vip.http.host=manager web host
 agent.manager.vip.http.port=manager web port
 ```
 
-## 2 run
+## Start
 After decompression, run the following command
 
 ```bash
@@ -28,17 +29,8 @@ sh agent.sh start
 ```
 
 
-## 3 Add job configuration in real time
+## Add job configuration in real time
 
-### 3.1 agent.properties Modify the following two places
-```ini
-# whether enable http service
-agent.http.enable=true
-# http default port
-agent.http.port=Available ports
-```
-
-### 3.2 Execute the following command
 ```bash
     curl --location --request POST 'http://localhost:8008/config/job' \
     --header 'Content-Type: application/json' \
@@ -69,117 +61,10 @@ agent.http.port=Available ports
     }'
 ```
 
-    The meaning of each parameter is :
-    - job.dir.pattern: Configure the read file path, which can include regular expressions
-    - job.trigger: Trigger name, the default is DirectoryTrigger, the function is to monitor the files under the folder to generate events
-    - job.source: The type of data source used, the default is TextFileSource, which reads text files
-    - job.sink:The type of writer used, the default is ProxySink, which sends messages to the proxy
-    - proxy.groupId: The groupId type used when writing proxy, groupId is group id showed on data access in inlong-manager, not the topic name.
-    - proxy.streamId: The streamId type used when writing proxy, streamId is the data flow id showed on data flow window in inlong-manager
-
-
-## 4 eg for directory config
-
-    E.g:
-    /data/inlong-agent/test.log //Represents reading the new file test.log in the inlong-agent folder
-    /data/inlong-agent/test[0-9]{1} // means to read the new file test in the inlong-agent folder followed by a number at the end
-    /data/inlong-agent/test //If test is a directory, it means to read all new files under test
-    /data/inlong-agent/^\\d+(\\.\\d+)? // Start with one or more digits, followed by. or end with one. or more digits (? stands for optional, can match Examples: "5", "1.5" and "2.21"
-
-
-## 5 Support to get data time from file name
-
-    Agent supports obtaining the time from the file name as the production time of the data. The configuration instructions are as follows:
-    /data/inlong-agent/***YYYYMMDDHH***
-    Where YYYYDDMMHH represents the data time, YYYY represents the year, MM represents the month, DD represents the day, and HH represents the hour
-    Where *** is any character
-
-    At the same time, you need to add the current data cycle to the job conf, the current support day cycle and hour cycle,
-    When adding a task, add the property job.cycleUnit
-    
-    job.cycleUnit contains the following two types:
-    1. D: Represents the data time and day dimension
-    2. H: Represents the data time and hour dimension
-
-    E.g:
-    The configuration data source is
-    /data/inlong-agent/YYYYMMDDHH.log
-    Write data to 2021020211.log
-    Configure job.cycleUnit as D
-    Then the agent will try the 202020211.log file at the time of 202020211. When reading the data in the file, it will write all the data to the backend proxy at the time of 20210202.
-    If job.cycleUnit is configured as H
-    When collecting data in the 2021020211.log file, all data will be written to the backend proxy at the time of 2021020211
-
-    
-    Examples of job submission
-
-```bash
-curl --location --request POST'http://localhost:8008/config/job' \
---header'Content-Type: application/json' \
---data'{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"cycleUnit": "D",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"group": "group10",
-"group": "group10"
-},
-"op": "add"
-}'
-```
-
-## 6 Support time offset reading
-
-    After the configuration is read by time, if you want to read data at other times than the current time, you can configure the time offset to complete
-    Configure the job attribute name as job.timeOffset, the value is number + time dimension, time dimension includes day and hour
-    For example, the following settings are supported
-    1. 1d Read the data one day after the current time
-    2. -1h read the data one hour before the current time
-
-
-    Examples of job submission
-```bash
-curl --location --request POST'http://localhost:8008/config/job' \
---header'Content-Type: application/json' \
---data'{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"cycleUnit": "D",
-"timeOffset": "-1d",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"groupId": "groupId10",
-"streamId": "streamId10"
-},
-"op": "add"
-}'
-```
\ No newline at end of file
+The meaning of each parameter is:
+- job.dir.pattern: the file path to read, which can include regular expressions
+- job.trigger: the trigger name, default DirectoryTrigger; it monitors the files under the folder and generates events
+- job.source: the data source type, default TextFileSource, which reads text files
+- job.sink: the writer type, default ProxySink, which sends messages to the proxy
+- proxy.groupId: the groupId used when writing to the proxy; it is the group id shown on the data access page in inlong-manager, not the topic name
+- proxy.streamId: the streamId used when writing to the proxy; it is the data flow id shown in the data flow window in inlong-manager
\ No newline at end of file
diff --git a/docs/modules/dataproxy/overview.md b/docs/modules/dataproxy/overview.md
index c78cedf..7ab56c7 100644
--- a/docs/modules/dataproxy/overview.md
+++ b/docs/modules/dataproxy/overview.md
@@ -1,27 +1,24 @@
 ---
 title: Overview
 ---
-## 1 intro
 
-    Inlong-dataProxy belongs to the inlong proxy layer and is used for data collection, reception and forwarding. Through format conversion, the data is converted into TDMsg1 format that can be cached and processed by the cache layer
-    InLong-dataProxy acts as a bridge from the InLong collection end to the InLong buffer end. Dataproxy pulls the relationship between the business group id and the corresponding topic name from the manager module, and internally manages the producers of multiple topics
-    The overall architecture of inlong-dataproxy is based on Apache Flume. On the basis of this project, inlong-bus expands the source layer and sink layer, and optimizes disaster tolerance forwarding, which improves the stability of the system.
+InLong-DataProxy belongs to the InLong proxy layer and is used for data collection, reception and forwarding. Through format conversion, data is converted into the TDMsg1 format that the cache layer can buffer and process.
+InLong-DataProxy acts as a bridge from the InLong collection end to the InLong buffer end. DataProxy pulls the mapping between business group ids and the corresponding topic names from the manager module, and internally manages producers for multiple topics.
+The overall architecture of InLong-DataProxy is based on Apache Flume. On top of that project, InLong-DataProxy extends the source and sink layers, and optimizes disaster-tolerant forwarding, which improves the stability of the system.
 
-
-## 2 architecture
+## Architecture
 
 ![](img/architecture.png)
 
- 	1. The source layer opens port monitoring, which is realized through netty server. The decoded data is sent to the channel layer
- 	2. The channel layer has a selector, which is used to choose which type of channel to go. If the memory is eventually full, the data will be processed.
- 	3. The data of the channel layer will be forwarded through the sink layer. The main purpose here is to convert the data to the TDMsg1 format and push it to the cache layer (tube is more commonly used here)
-
-## 3 DataProxy support configuration instructions
+- The source layer listens on ports, implemented with a Netty server; decoded data is sent to the channel layer
+- The channel layer has a selector that chooses which type of channel to use; if the memory channel eventually fills up, the data is persisted to disk
+- The data in the channel layer is forwarded through the sink layer, which mainly converts the data into the TDMsg1 format and pushes it to the cache layer (TubeMQ is the most common choice here)
 
-DataProxy supports configurable source-channel-sink, and the configuration method is the same as the configuration file structure of flume:
+## DataProxy Configuration
 
-Source configuration example and corresponding notes:
+DataProxy supports a configurable source-channel-sink pipeline; the configuration method is the same as Flume's configuration file structure.
 
+- Source configuration example:
 ```shell
 agent1.sources.tcp-source.channels = ch-msg1 ch-msg2 ch-msg3 ch-more1 ch-more2 ch-more3 ch-msg5 ch-msg6 ch-msg7 ch-msg8 ch-msg9 ch-msg10 ch-transfer ch-back
 Defines the channels used by this source; note that any channel referenced below in this source's configuration must be listed here
@@ -42,7 +39,7 @@ agent1.sources.tcp-source.highWaterMark=2621440
 The concept of netty, set the netty high water level value
 
 agent1.sources.tcp-source.enableExceptionReturn=true
-The new function of v1.7 version, optional, the default is false, used to open the exception channel, when an exception occurs, the data is written to the exception channel to prevent other normal data transmission (the open source version does not add this function), Details: Increase the local disk of abnormal data landing
+New in v1.7 and optional, default false: enables the exception channel. When an exception occurs, data is written to the exception channel so that it does not block normal data transmission (this function is not included in the open source version). Details: abnormal data is landed to the local disk
 
 agent1.sources.tcp-source.max-msg-length = 524288
 Limit the size of a single package, here if the compressed package is transmitted, it is the compressed package size, the limit is 512KB
@@ -82,9 +79,7 @@ agent1.sources.tcp-source.selector.fileMetric = ch-back
 Specify the fileMetric channel to receive the metric data reported by the agent
 ```
 
-Channel configuration examples and corresponding annotations
-
-memory channel
+- Channel configuration examples, memory channel:
 
 ```shell
 agent1.channels.ch-more1.type = memory
@@ -99,7 +94,7 @@ agent1.channels.ch-more1.transactionCapacity = 20
 The maximum number of batches are processed in atomic operations, and the memory channel needs to be locked when used, so there will be a batch process to increase efficiency
 ```
 
-file channel
+- Channel configuration examples, file channel:
 
 ```shell
 agent1.channels.ch-msg5.type = file
@@ -127,7 +122,7 @@ agent1.channels.ch-msg5.fsyncInterval = 5
 The time interval between data flush from memory to disk, in seconds
 ```
 
-Sink configuration example and corresponding notes
+- Sink configuration example:
 
 ```shell
 agent1.sinks.meta-sink-more1.channel = ch-msg1
@@ -158,7 +153,7 @@ agent1.sinks.meta-sink-more1.max-survived-size = 3000000
 Maximum number of caches
 ```
 
-## 4 Monitor metrics configuration instructions
+## Monitor Metrics configuration
 
   DataProxy provides JMX-based monitoring metrics; users can implement code that reads the metrics and reports them to a user-defined monitoring system.
 The Source and Sink modules can add monitor metric classes (subclasses of org.apache.inlong.commons.config.metrics.MetricItemSet) and register them to MBeanServer. A user-defined plugin can then read the module metrics with JMX and report the metric data to different monitoring systems.
@@ -171,51 +166,49 @@ metricDomains.DataProxy.domainListeners=org.apache.inlong.dataproxy.metrics.prom
 metricDomains.DataProxy.snapshotInterval=60000
 ```
 
-  * The JMX domain name of DataProxy is "DataProxy". 
-  * It is defined by the parameter "metricDomains".
-  * The listeners of JMX domain is defined by the parameter "metricDomains.$domainName.domainListeners".
-  * The class names of the listeners is separated by the space char.
-  * The listener class need to implement the interface "org.apache.inlong.dataproxy.metrics.MetricListener".
-  * The snapshot interval of the listeners is defined by the parameter "metricDomains.$domainName.snapshotInterval", the parameter unit is "millisecond".
-
-  The method proto of org.apache.inlong.dataproxy.metrics.MetricListener is:
+- The JMX domain name of DataProxy is "DataProxy".
+- It is defined by the parameter "metricDomains".
+- The listeners of a JMX domain are defined by the parameter "metricDomains.$domainName.domainListeners".
+- The class names of the listeners are separated by spaces.
+- A listener class needs to implement the interface "org.apache.inlong.dataproxy.metrics.MetricListener".
+- The snapshot interval of the listeners is defined by the parameter "metricDomains.$domainName.snapshotInterval", in milliseconds.
 
+The method prototype of org.apache.inlong.dataproxy.metrics.MetricListener is:
 ```java
 public void snapshot(String domain, List<MetricItemValue> itemValues);
 ```
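+
+A minimal listener sketch (the import path of MetricItemValue is assumed to match that of MetricListener; items are printed via toString only):
+```java
+import java.util.List;
+
+import org.apache.inlong.dataproxy.metrics.MetricItemValue;
+import org.apache.inlong.dataproxy.metrics.MetricListener;
+
+public class LogMetricListener implements MetricListener {
+    @Override
+    public void snapshot(String domain, List<MetricItemValue> itemValues) {
+        // a real listener would report these values to an external monitoring system
+        for (MetricItemValue item : itemValues) {
+            System.out.println(domain + ": " + item);
+        }
+    }
+}
+```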
 
-  The field of MetricItemValue.dimensions has these dimensions(The fields of DataProxyMetricItem defined by the Annotation "@Dimension"):
-
-```shell
-clusterId: DataProxy cluster ID.
-sourceId: DataProxy source component name.
-sourceDataId: DataProxy source component data id, when source is a TCP source, it will be port number.
-inlongGroupId: Inlong data group ID.
-inlongStreamId: Inlong data stream ID.
-sinkId: DataProxy sink component name.
-sinkDataId: DataProxy sink component data id, when sink is a pulsar sink, it will be topic name.
-```
-
-  The field of MetricItemValue.metrics has these metrics(The fields of DataProxyMetricItem defined by the Annotation "@CountMetric"):
-
-```shell
-readSuccessCount: Successful event count reading from source component.
-readSuccessSize: Successful event body size reading from source component.
-readFailCount: Failure event count reading from source component.
-readFailSize: Failure event body size reading from source component.
-sendCount: Event count sending to sink destination.
-sendSize: Event body size sending to sink destination.
-sendSuccessCount: Successful event count sending to sink destination.
-sendSuccessSize: Successful event body size sending to sink destination.	
-sendFailCount: Failure event count sending to sink destination.
-sendFailSize: Failure event body size sending to sink destination.
-sinkDuration: The unit is millisecond, the duration is between current timepoint and the timepoint in sending to sink destination.
-nodeDuration: The unit is millisecond, the duration is between current timepoint and the timepoint in getting event from source.
-wholeDuration: The unit is millisecond, the duration is between current timepoint and the timepoint in generating event.
-```
-
-  Monitor indicators have registered to MBeanServer, user can append JMX parameters when running DataProxy, remote server can get monitor metrics with RMI.
-
+The MetricItemValue.dimensions field has these dimensions (the fields of DataProxyMetricItem defined by the annotation "@Dimension"):
+
+|  property   | description  |
+|  ----  | ----  |
+|  clusterId  |  DataProxy cluster ID. |
+|  sourceId  |  DataProxy source component name. |
+|  sourceDataId  |  DataProxy source component data id, when source is a TCP source, it will be port number. |
+|  inlongGroupId  |  Inlong data group ID. |
+|  inlongStreamId  |  Inlong data stream ID. |
+|  sinkId  |  DataProxy sink component name. |
+|  sinkDataId  |  DataProxy sink component data id, when sink is a pulsar sink, it will be topic name. |
+
+The MetricItemValue.metrics field has these metrics (the fields of DataProxyMetricItem defined by the annotation "@CountMetric"):
+
+|  property   | description  |
+|  ----  | ----  |
+|  readSuccessCount  |  Successful event count reading from source component. |
+|  readSuccessSize  |  Successful event body size reading from source component. |
+|  readFailCount  |  Failure event count reading from source component. |
+|  readFailSize  |  Failure event body size reading from source component. |
+|  sendCount  |  Event count sending to sink destination. |
+|  sendSize  |  Event body size sending to sink destination. |
+|  sendSuccessCount  |  Successful event count sending to sink destination. |
+|  sendSuccessSize  |  Successful event body size sending to sink destination. |
+|  sendFailCount  |  Failure event count sending to sink destination. |
+|  sendFailSize  |  Failure event body size sending to sink destination. |
+|  sinkDuration  |  The unit is millisecond, the duration is between current timepoint and the timepoint in sending to sink destination. |
+|  nodeDuration  |  The unit is millisecond, the duration is between current timepoint and the timepoint in getting event from source. |
+|  wholeDuration  |  The unit is millisecond, the duration is between current timepoint and the timepoint in generating event. |
+
+Monitoring metrics are registered to MBeanServer; users can append JMX parameters when running DataProxy so that a remote server can fetch the metrics via RMI.
 ```shell
 -Dcom.sun.management.jmxremote
 -Djava.rmi.server.hostname=127.0.0.1
diff --git a/docs/sdk/dataproxy-sdk/overview.md b/docs/sdk/dataproxy-sdk/overview.md
index da6d8d1..5938ba0 100644
--- a/docs/sdk/dataproxy-sdk/overview.md
+++ b/docs/sdk/dataproxy-sdk/overview.md
@@ -1,16 +1,13 @@
 ---
 title: Overview
 ---
-## 1 intro
 When a business uses the message access method, it generally only needs to format the data in a DataProxy-recognizable format (such as the six-segment protocol, digital protocol, etc.),
 pack it and send it, and the data is connected to InLong. However, to guarantee data reliability, load balancing, dynamic proxy-list updates and other features,
 the user program would need to handle much more logic, which ultimately makes it cumbersome and bloated.
 
 The API is designed to simplify user access and take over part of the reliability-related logic. After integrating the API into the delivery program, users can send data to the proxy without worrying about packing formats, load balancing or other logic.
 
-## 2 functions
-
-### 2.1 overall functions
+## Functions
 
 |  function   | description  |
 |  ----  | ----  |
@@ -22,39 +19,40 @@ The original intention of API design is to simplify user access and assume some
 | proxy list persistence (new) | Persists the proxy list by business group id, so that data can still be sent at program startup even if the configuration center fails |
 
 
-### 2.2 Data transmission function description
-
-#### Synchronous batch function
+## Data transmission
 
+### Synchronous batch function
+```
     public SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
 
     Parameter Description
 
     bodyList is the collection of multiple data records the user needs to send; the total length is recommended to be less than 512k. groupId represents the business id, and streamId represents the interface id. dt represents the timestamp of the data, accurate to the millisecond; it can also be set to 0, in which case the API uses the current time as the timestamp. timeout and timeUnit set the timeout for sending the data; 20s is generally recommended.
+```
 
-#### Synchronize a single function
-
+### Synchronize a single function
+```
     public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
 
     Parameter Description
 
     body is the content of a single piece of data that the user wants to send, and the meaning of the remaining parameters is basically the same as the batch sending interface.
+```
 
-
-#### Asynchronous batch function
-
+### Asynchronous batch function
+```
     public void asyncSendMessage(SendMessageCallback callback, List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout,TimeUnit timeUnit)
 
     Parameter Description
 
     SendMessageCallback is the callback for processing the send result. bodyList is the collection of multiple data records the user needs to send; the total length is recommended to be less than 512k. groupId is the business id, and streamId is the interface id. dt represents the timestamp of the data, accurate to the millisecond; it can also be set to 0, in which case the API uses the current time as the timestamp. timeout and timeUnit set the timeout for sending the data; 20s is generally recommended.
+```
 
-
-#### Asynchronous single function
-
-
+### Asynchronous single function
+```
     public void asyncSendMessage(SendMessageCallback callback, byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
 
     Parameter Description
 
     The body is the content of a single message; the meaning of the remaining parameters is basically the same as for the batch sending interface.
+```
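+
+A minimal usage sketch of the synchronous batch call (MessageSender and SendResult below are illustrative stand-ins that mirror the documented signature; obtain the real sender object as described in the SDK setup docs):
+```java
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+public class SendExample {
+    // stand-in types mirroring the documented signature
+    enum SendResult { OK, TIMEOUT, INVALID_DATA }
+
+    interface MessageSender {
+        SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId,
+                               long dt, long timeout, TimeUnit timeUnit);
+    }
+
+    static void send(MessageSender sender) {
+        List<byte[]> bodyList = Arrays.asList("record-1".getBytes(), "record-2".getBytes());
+        // dt = 0 lets the API stamp the current time; 20s is the recommended timeout
+        SendResult result = sender.sendMessage(bodyList, "groupId", "streamId", 0L, 20, TimeUnit.SECONDS);
+        System.out.println("send result: " + result);
+    }
+}
+```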
\ No newline at end of file
diff --git a/download/main.md b/download/main.md
index 4cd6cc0..073b118 100644
--- a/download/main.md
+++ b/download/main.md
@@ -1,5 +1,9 @@
-## Download links
-  Use the links below to download the Apache InLong Releases, the latest release is 0.11.0.
+---
+title: Download InLong
+sidebar_position: 1
+---
+
+Use the links below to download the Apache InLong releases; the latest release is 0.11.0.
 
 ## 0.11.0 release
 - Released: Nov 5th, 2021
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs-download/current/main.md b/i18n/zh-CN/docusaurus-plugin-content-docs-download/current/main.md
index 5649bc4..f82737e 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs-download/current/main.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs-download/current/main.md
@@ -1,5 +1,9 @@
-## 下载链接
-  使用以下链接,下载InLong,最新版本为0.11.0. 
+---
+title: 下载 InLong
+sidebar_position: 1
+---
+
+使用以下链接,下载InLong,最新版本为0.11.0. 
 
 ## 0.11.0 release
 - 发布时间: 2021-11-05
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/bare_metal.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/bare_metal.md
index 1396846..288e2e2 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/bare_metal.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/bare_metal.md
@@ -1,5 +1,5 @@
 ---
-title: bare metal 部署
+title: Bare Metal 部署
 sidebar_position: 4
 ---
 
@@ -30,7 +30,7 @@ sidebar_position: 4
 [部署InLong Agent](modules/agent/quick_start.md)
 
 ## 业务配置
-[配置新业务](docs/user_guide/user_manual)
+[配置新业务](user_guide/user_manual.md)
 
 ## 数据上报验证
 到这里,您就可以通过文件Agent采集数据并在指定的Hive表中验证接收到的数据是否与发送的数据一致。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/k8s.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/k8s.md
index 340da82..f7c22de 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/k8s.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/k8s.md
@@ -7,6 +7,7 @@ sidebar_position: 3
 
 - Kubernetes 1.10+
 - Helm 3.0+
+- [InLong Helm Chart](https://github.com/apache/incubator-inlong/tree/master/docker/kubernetes)
 - A dynamic provisioner for the PersistentVolumes (`production environment`)
 
 ## 安装
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file_collect.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file_collect.md
new file mode 100644
index 0000000..49d8e7a
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file_collect.md
@@ -0,0 +1,109 @@
+---
+title: 文件采集
+sidebar_position: 3
+---
+
+## 文件采集配置
+```
+/data/inlong-agent/test.log  //代表读取inlong-agent文件夹下的新增文件test.log
+/data/inlong-agent/test[0-9]{1} //代表读取inlong-agent文件夹下的新增文件test后接一个数字结尾
+/data/inlong-agent/test //如果test为目录,则代表读取test下的所有新增文件
+/data/inlong-agent/^\\d+(\\.\\d+)? // 以一个或多个数字开头,之后可以是.或者一个.或多个数字结尾,?代表可选,可以匹配的实例:"5", "1.5" 和 "2.21"
+```
+
+## 从文件名称中获取数据时间
+Agent支持从文件名称中获取时间当作数据的生产时间,配置说明如下:
+```
+/data/inlong-agent/***YYYYMMDDHH***
+```
+其中YYYYMMDDHH代表数据时间,YYYY表示年,MM表示月份,DD表示天,HH表示小时
+其中***为任意字符
+
+同时需要在job conf中加入当前数据的周期,当前支持天周期以及小时周期,
+在添加任务时,加入属性job.cycleUnit
+
+job.cycleUnit 包含如下两种类型:
+- D : 代表数据时间天维度
+- H : 代表数据时间小时维度
+
+例如:
+配置数据源为
+```
+/data/inlong-agent/2021020211.log
+```
+写入数据到 2021020211.log
+配置 job.cycleUnit 为 D
+则agent会在2021020211时间尝试2021020211.log文件,读取文件中的数据时,会将所有数据以20210202这个时间写入到后端proxy
+如果配置 job.cycleUnit 为 H
+则采集2021020211.log文件中的数据时,会将所有数据以2021020211这个时间写入到后端proxy。
+
+提交job举例:
+```bash
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+  "job": {
+    "dir": {
+      "path": "",
+      "pattern": "/data/inlong-agent/2021020211.log"
+    },
+    "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+    "id": 1,
+    "thread": {
+      "running": {
+        "core": "4"
+      }
+    },
+    "name": "fileAgentTest",
+    "cycleUnit": "D",
+    "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+    "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+    "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+  },
+  "proxy": {
+    "groupId": "groupId",
+    "streamId": "streamId"
+  },
+  "op": "add"
+}'
+```
+
+
+## 时间偏移量offset读取
+在配置按照时间读取之后,如果想要读取当前时间之外的其他时间的数据,可以通过配置时间偏移量完成
+配置job属性名称为job.timeOffset,值为数字 + 时间维度,时间维度包括天和小时
+例如支持如下设置:
+- 1d 读取当前时间后一天的数据 
+- -1h 读取当前时间前一个小时的数据
+
+提交job举例
+```bash
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+  "job": {
+    "dir": {
+      "path": "",
+      "pattern": "/data/inlong-agent/test.log"
+    },
+    "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+    "id": 1,
+    "thread": {
+      "running": {
+        "core": "4"
+      }
+    },
+    "name": "fileAgentTest",
+    "cycleUnit": "D",
+    "timeOffset": "-1d",
+    "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+    "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+    "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+  },
+  "proxy": {
+    "groupId": "groupId",
+    "streamId": "streamId"
+  },
+  "op": "add"
+}'
+```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md
index 040f595..c9a1136 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md
@@ -1,50 +1,42 @@
 ---
 title: 总览
+sidebar_position: 1
 ---
-## 1 InLong-Agent 概览
-InLong-Agent是一个支持多种数据源类型的收集工具,致力于实现包括file、sql、Binlog、metrics等多种异构数据源之间稳定高效的数据采集功能。
-
-### 简要的架构图如下:
-![](img/architecture.png)
-
 
+InLong-Agent是一个支持多种数据源类型的收集工具,致力于实现包括file、sql、Binlog、metrics等多种异构数据源之间稳定高效的数据采集功能。
 
-### 设计理念
+## 设计理念
 为了解决数据源多样性问题,InLong-agent 将多种数据源抽象成统一的source概念,并抽象出sink来对数据进行写入。当需要接入一个新的数据源的时候,只需要配置好数据源的格式与读取参数便能跟做到高效读取。
 
-### 当前使用现状
-InLong-Agent在腾讯集团内被广泛使用,承担了大部分的数据采集业务,线上数据量达百亿级别。
+## InLong-Agent 架构介绍
+![](img/architecture.png)
 
-## 2 InLong-Agent 架构介绍
 InLong Agent本身作为数据采集框架,采用channel + plugin架构构建。将数据源读取和写入抽象成为Reader/Writer插件,纳入到整个框架中。
 
-+ Reader:Reader为数据采集模块,负责采集数据源的数据,将数据发送给channel。
-+ Writer: Writer为数据写入模块,负责不断向channel取数据,并将数据写入到目的端。
-+ Channel:Channel用于连接reader和writer,作为两者的数据传输通道,并起到了数据的写入读取监控作用
+- Reader:Reader为数据采集模块,负责采集数据源的数据,将数据发送给channel。
+- Writer: Writer为数据写入模块,负责不断向channel取数据,并将数据写入到目的端。
+- Channel:Channel用于连接reader和writer,作为两者的数据传输通道,并起到了数据的写入读取监控作用
 
 
-## 3 InLong-Agent 采集分类说明
-### 3.1 文件采集
+## InLong-Agent 采集分类
+### 文件
 文件采集包含如下功能:
-
 用户配置的路径监听,能够监听出创建的文件信息
 目录正则过滤,支持YYYYMMDD+正则表达式的路径配置
-断点重传,InLong-Agent重启时,能够支持自动从上次读取位置重新读取,保证不重读不漏读。
-### 3.2 sql采集
+断点重传,InLong-Agent重启时,能够支持自动从上次读取位置重新读取,保证不重读不漏读。
+
+### Sql
 这类数据是指通过SQL执行的方式
 SQL正则分解,转化成多条SQL语句
 分别执行SQL,拉取数据集,拉取过程需要注意对mysql本身的影响
 执行周期,这种一般是定时执行
-### 3.3 binlog 采集
+
+### Binlog
 这类采集通过配置mysql slave的方式,读取binlog,并还原数据
 需要注意binlog读取的时候多线程解析,多线程解析的数据需要打上顺序标签
 代码基于老版本的dbsync,主要的修改是将tdbus-sender的发送改为推送到agent-channel的方式做融合
-### 3.4 Metrics采集类
-这种方式采集属于文件采集,只不过metric采集的时候,单行的数据有格式规范
-
-
-## 4 监控指标配置说明
 
+## 监控指标配置
 Agent提供了JMX方式的监控指标能力,监控指标已经注册到MBeanServer
 用户可以在Agent的启动参数中增加如下类似JMX定义(端口和鉴权根据情况进行调整),实现监控指标从远端采集。
 
@@ -58,25 +50,20 @@ Agent提供了JMX方式的监控指标能力,监控指标已经注册到MBeanS
 
 Agent指标分为以下几项, 各项的属性分别为:
 
-
-AgentTaskMetric
-
+### AgentTaskMetric
 |  属性名称   | 说明  |
 |  ----  | ----  |
 | runningTasks  | 当前正在执行的任务 |
 | retryingTasks  | 当前正在重试的任务 |
 | fatalTasks  | 当前失败的任务总数 |
 
-
-JobMetrics
-
+### JobMetrics
 |  属性名称   | 说明  |
 |  ----  | ----  |
 | runningJobs  | 当前正在运行的job总数 |
 | fatalJobs  | 当前失败的job总数 |
 
-PluginMetric
-
+### PluginMetric
 |  属性名称   | 说明  |
 |  ----  | ----  |
 | readNum  | 读取数据的条数 |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
index 61e3cff..a3754a4 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
@@ -1,43 +1,34 @@
 ---
 title: 安装部署
+sidebar_position: 2
 ---
 
-## 1 配置
+进入 agent 目录:
 ```
 cd inlong-agent
 ```
 
-agent 支持本地运行以及线上运行,其中线上运行从inlong manager拉取任务,本地运行可使用http请求提交任务
-
-### 1.1 Agent 线上运行相关设置
+## 配置
 
 线上运行需要从inlong-manager拉取配置,配置conf/agent.properties如下:
 ```ini
+# whether enable http service
+agent.http.enable=true
+# http default port
+agent.http.port=可用端口
 agent.fetcher.classname=org.apache.inlong.agent.plugin.fetcher.ManagerFetcher (设置任务获取的类名,默认为ManagerFetcher)
 agent.local.ip=写入本机ip
 agent.manager.vip.http.host=manager web host
 agent.manager.vip.http.port=manager web port
 ```
 
-## 2 运行
+## 启动
 
 解压后如下命令运行
 ```bash
 sh agent.sh start
 ```
 
-## 3 实时添加job配置
-
-### 3.1 agent.properties 修改下面两处
-
-```ini
-# whether enable http service
-agent.http.enable=true
-# http default port
-agent.http.port=可用端口
-```
-
-### 3.2 执行如下命令:
+## 实时添加job配置
 
 ```bash
 curl --location --request POST 'http://localhost:8008/config/job' \
@@ -68,116 +59,10 @@ curl --location --request POST 'http://localhost:8008/config/job' \
 }'
 ```
 
-    其中各个参数含义为:
-    - job.dir.pattern: 配置读取的文件路径,可包含正则表达式
-    - job.trigger: 触发器名称,默认为DirectoryTrigger,功能为监听文件夹下的文件产生事件,任务运行时已有的文件不会读取
-    - job.source: 使用的数据源类型,默认为TextFileSource,读取文本文件
-    - job.sink:使用的写入器类型,默认为ProxySink,发送消息到dataproxy中
-    - proxy.groupId: 写入proxy时使用的groupId,groupId是指manager界面中,数据接入中业务信息的业务ID,此处不是创建的tube topic名称
-    - proxy.streamId: 写入proxy时使用的streamId,streamId是指manager界面中,数据接入中数据流的数据流ID
-
-## 4 可支持的路径配置方案
-
-    例如:
-    /data/inlong-agent/test.log  //代表读取inlong-agent文件夹下的的新增文件test.log
-    /data/inlong-agent/test[0-9]{1} //代表读取inlong-agent文件夹下的新增文件test后接一个数字结尾
-    /data/inlong-agent/test //如果test为目录,则代表读取test下的所有新增文件
-    /data/inlong-agent/^\\d+(\\.\\d+)? // 以一个或多个数字开头,之后可以是.或者一个.或多个数字结尾,?代表可选,可以匹配的实例:"5", "1.5" 和 "2.21"
-
-
-## 5 支持从文件名称中获取数据时间
-
-    Agent支持从文件名称中获取时间当作数据的生产时间,配置说明如下:
-    /data/inlong-agent/***YYYYMMDDHH***
-    其中YYYYDDMMHH代表数据时间,YYYY表示年,MM表示月份,DD表示天,HH表示小时
-    其中***为任意字符
-
-    同时需要在job conf中加入当前数据的周期,当前支持天周期以及小时周期,
-    在添加任务时,加入属性job.cycleUnit
-    
-    job.cycleUnit 包含如下两种类型:
-    1、D : 代表数据时间天维度
-    2、H : 代表数据时间小时维度
-
-    例如:
-    配置数据源为
-    /data/inlong-agent/YYYYMMDDHH.log
-    写入数据到 2021020211.log
-    配置 job.cycleUnit 为 D
-    则agent会在2021020211时间尝试2021020211.log文件,读取文件中的数据时,会将所有数据以20210202这个时间写入到后端proxy
-    如果配置 job.cycleUnit 为 H
-    则采集2021020211.log文件中的数据时,会将所有数据以2021020211这个时间写入到后端proxy
-
-    
-    提交job举例
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"cycleUnit": "D",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"groupId": "groupId",
-"streamId": "streamId"
-},
-"op": "add"
-}'
-```
-
-
-## 6 支持时间偏移量offset读取
-
-    在配置按照时间读取之后,如果想要读取当前时间之外的其他时间的数据,可以通过配置时间偏移量完成
-    配置job属性名称为job.timeOffset,值为数字 + 时间维度,时间维度包括天和小时
-    例如支持如下设置
-    1、 1d 读取当前时间后一天的数据 
-    2、 -1h 读取当前时间前一个小时的数据
-
-
-    提交job举例
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"cycleUnit": "D",
-"timeOffset": "-1d",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"groupId": "groupId",
-"streamId": "streamId"
-},
-"op": "add"
-}'
-```
\ No newline at end of file
+其中各个参数含义为:
+- job.dir.pattern: 配置读取的文件路径,可包含正则表达式
+- job.trigger: 触发器名称,默认为DirectoryTrigger,功能为监听文件夹下的文件产生事件,任务运行时已有的文件不会读取
+- job.source: 使用的数据源类型,默认为TextFileSource,读取文本文件
+- job.sink:使用的写入器类型,默认为ProxySink,发送消息到dataproxy中
+- proxy.groupId: 写入proxy时使用的groupId,groupId是指manager界面中,数据接入中业务信息的业务ID,此处不是创建的tube topic名称
+- proxy.streamId: 写入proxy时使用的streamId,streamId是指manager界面中,数据接入中数据流的数据流ID
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/overview.md
index 159793e..2a75cd4 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/dataproxy/overview.md
@@ -1,28 +1,26 @@
 ---
 title: 总览
 ---
-## 1 说明
 
-    InLong-dataProxy属于inlong proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
-    InLong-dataProxy充当了InLong采集端到InLong缓冲端的桥梁,dataproxy从manager模块拉取业务id与对应topic名称的关系,内部管理多个topic的生产者
-    当dataproxy收到消息时,会首先缓存到本地的Channel中,并使用本地的producer往后端即cache层发送数据
-    InLong-dataProxy整体架构基于Apache Flume。inlong-dataproxy在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
+InLong-dataProxy属于inlong proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
+InLong-dataProxy充当了InLong采集端到InLong缓冲端的桥梁,dataproxy从manager模块拉取业务id与对应topic名称的关系,内部管理多个topic的生产者
+当dataproxy收到消息时,会首先缓存到本地的Channel中,并使用本地的producer往后端即cache层发送数据
+InLong-dataProxy整体架构基于Apache Flume。inlong-dataproxy在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
     
-    
-## 2 架构
+## 架构
 
 ![](img/architecture.png)
 
-    1.Source层开启端口监听,通过netty server实现。解码之后的数据发到channel层
-    2.channel层有一个selector,用于选择走哪种类型的channel,如果memory最终满了,会对数据做落地处理
-    3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)
+- Source层开启端口监听,通过netty server实现。解码之后的数据发到channel层
+- channel层有一个selector,用于选择走哪种类型的channel,如果memory最终满了,会对数据做落地处理
+- channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)
 
 
-## 3 DataProxy功能配置说明
+## DataProxy功能配置说明
 
-DataProxy支持配置化的source-channel-sink,配置方式与flume的配置文件结构相同:
+DataProxy支持配置化的source-channel-sink,配置方式与flume的配置文件结构相同。
 
-Source配置示例以及对应的注解:
+- Source配置示例:
 
 ```shell
 agent1.sources.tcp-source.channels = ch-msg1 ch-msg2 ch-msg3 ch-more1 ch-more2 ch-more3 ch-msg5 ch-msg6 ch-msg7 ch-msg8 ch-msg9 ch-msg10 ch-transfer ch-back
@@ -81,9 +79,7 @@ agent1.sources.tcp-source.selector.fileMetric = ch-back
 指定fileMetric channel,用于接收agent上报的指标数据
 ```
 
-Channel配置示例以及对应的注解
-
-memory channel
+- Channel配置示例,memory channel:
 
 ```shell
 agent1.channels.ch-more1.type = memory
@@ -98,7 +94,7 @@ agent1.channels.ch-more1.transactionCapacity = 20
 原子操作时批量处理最大条数,memory channel使用时需要用到加锁,因此会有批处理流程增加效率
 ```
 
-file channel
+- Channel配置示例,file channel:
 
 ```shell
 agent1.channels.ch-msg5.type = file
@@ -126,7 +122,7 @@ agent1.channels.ch-msg5.fsyncInterval = 5
 数据从内存flush到磁盘的时间间隔,单位秒
 ```
 
-Sink配置示例以及对应的注解
+- Sink配置示例:
 
 ```shell
 agent1.sinks.meta-sink-more1.channel = ch-msg1
@@ -157,7 +153,7 @@ agent1.sinks.meta-sink-more1.max-survived-size = 3000000
 缓存最大个数
 ```
     
-## 4 监控指标配置说明
+## 监控指标配置
 
   DataProxy提供了JMX方式的监控指标Listener能力,用户可以实现MetricListener接口,注册后可以定期接收监控指标,用户选择将指标上报自定义的监控系统。Source和Sink模块可以通过将指标数据统计到org.apache.inlong.commons.config.metrics.MetricItemSet的子类中,并注册到MBeanServer。用户自定义的MetricListener通过JMX方式收集指标数据并上报到外部监控系统
 
@@ -169,49 +165,47 @@ metricDomains.DataProxy.domainListeners=org.apache.inlong.dataproxy.metrics.prom
 metricDomains.DataProxy.snapshotInterval=60000
 ```
 
-  * 统一的JMX域名:DataProxy,并定义在参数metricDomains下;自定义的Source、Sink等组件也可以上报到不同的JMX域名。
-  * 对一个JMX域名的监控指标MetricListener可以配置在metricDomains.$domainName.domainListeners参数里,可以配置多个,用空格分隔类名。
-  * 这些监控指标MetricListener需要实现接口:org.apache.inlong.dataproxy.metrics.MetricListener。
-  * 快照参数:metricDomains.$domainName.snapshotInterval,定义拉取一次监控指标数据的间隔时间,参数单位是毫秒。
+- 统一的JMX域名:DataProxy,并定义在参数metricDomains下;自定义的Source、Sink等组件也可以上报到不同的JMX域名。
+- 对一个JMX域名的监控指标MetricListener可以配置在metricDomains.$domainName.domainListeners参数里,可以配置多个,用空格分隔类名。
+- 这些监控指标MetricListener需要实现接口:org.apache.inlong.dataproxy.metrics.MetricListener。
+- 快照参数:metricDomains.$domainName.snapshotInterval,定义拉取一次监控指标数据的间隔时间,参数单位是毫秒。
 
-  org.apache.inlong.dataproxy.metrics.MetricListener接口的方法原型
-  
+org.apache.inlong.dataproxy.metrics.MetricListener接口的方法原型:
 ```java  
 public void snapshot(String domain, List<MetricItemValue> itemValues);
 ```
 
-  监控指标项的MetricItemValue.dimensions有这些维度(DataProxyMetricItem的这些字段通过注解Annotation "@Dimension"定义):
-
-```shell
-clusterId: DataProxy集群ID
-sourceId: DataProxy的Source组件名
-sourceDataId: DataProxy的Source组件数据流ID,如果Source是一个TCPSource,那么这个ID会是一个端口号
-inlongGroupId: Inlong数据ID
-inlongStreamId: Inlong数据流ID
-sinkId: DataProxy的Sink组件名
-sinkDataId: DataProxy的Sink组件数据流ID,如果Sink是一个Pulsar发送组件,这个ID会是一个Topic名。
-```
-
-  监控指标项的MetricItemValue.metrics有这些指标(DataProxyMetricItem的这些字段通过注解Annotation "@CountMetric"定义):
-
-```shell
-readSuccessCount: 接收成功条数
-readSuccessSize: 接收成功大小,单位:byte
-readFailCount: 接收失败条数
-readFailSize: 接收失败大小,单位:byte
-sendCount: 发送条数
-sendSize: 发送大小,单位:byte
-sendSuccessCount: 发送成功条数
-sendSuccessSize: 发送成功大小,单位:byte
-sendFailCount: 发送失败条数
-sendFailSize: 发送失败大小,单位:byte
-sinkDuration: 发送成功回调时间和发送开始时间的时间差,用于评估目标集群的处理时延和健康状况,单位:毫秒
-nodeDuration: 发送成功回调时间和接收成功时间的时间差,用于评估DataProxy内部处理耗时和健康状况,单位:毫秒
-wholeDuration: 发送成功回调时间和事件生成时间的时间差,单位:毫秒
-```
-
-  监控指标已经注册到MBeanServer,用户可以在DataProxy的启动参数中增加如下类似JMX定义(端口和鉴权根据情况进行调整),实现监控指标从远端采集。
-  
+监控指标项的MetricItemValue.dimensions有这些维度(DataProxyMetricItem的这些字段通过注解Annotation "@Dimension"定义):
+
+|  property   | description  |
+|  ----  | ----  |
+|  clusterId |  DataProxy集群ID |  
+|  sourceId|  DataProxy的Source组件名 |  
+|  sourceDataId|  DataProxy的Source组件数据流ID,如果Source是一个TCPSource,那么这个ID会是一个端口号 |  
+|  inlongGroupId|  Inlong数据ID |  
+|  inlongStreamId|  Inlong数据流ID |  
+|  sinkId|  DataProxy的Sink组件名 |  
+|  sinkDataId|  DataProxy的Sink组件数据流ID,如果Sink是一个Pulsar发送组件,这个ID会是一个Topic名。 |
+
+监控指标项的MetricItemValue.metrics有这些指标(DataProxyMetricItem的这些字段通过注解Annotation "@CountMetric"定义):
+
+|  property   | description  |
+|  ----  | ----  |
+|  readSuccessCount |  接收成功条数 |  
+|  readSuccessSize |  接收成功大小,单位:byte |  
+|  readFailCount |  接收失败条数 |  
+|  readFailSize |  接收失败大小,单位:byte |  
+|  sendCount |  发送条数 |  
+|  sendSize |  发送大小,单位:byte |  
+|  sendSuccessCount |  发送成功条数 |  
+|  sendSuccessSize |  发送成功大小,单位:byte |  
+|  sendFailCount |  发送失败条数 |  
+|  sendFailSize |  发送失败大小,单位:byte |  
+|  sinkDuration |  发送成功回调时间和发送开始时间的时间差,用于评估目标集群的处理时延和健康状况,单位:毫秒 |  
+|  nodeDuration |  发送成功回调时间和接收成功时间的时间差,用于评估DataProxy内部处理耗时和健康状况,单位:毫秒 |  
+|  wholeDuration |  发送成功回调时间和事件生成时间的时间差,单位:毫秒 |
+
+监控指标已经注册到MBeanServer,用户可以在DataProxy的启动参数中增加如下类似JMX定义(端口和鉴权根据情况进行调整),实现监控指标从远端采集。
 ```shell
 	-Dcom.sun.management.jmxremote
 	-Djava.rmi.server.hostname=127.0.0.1
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/overview.md
index ede0758..22eeaac 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sdk/dataproxy-sdk/overview.md
@@ -1,7 +1,6 @@
 ---
 title: 总览
 ---
-# 一、说明
 
 在业务使用消息接入方式时,业务一般仅需将数据按照DataProxy可识别的格式(如六段协议、数字化协议等)
 进行组包发送,就可以将数据接入到inlong。但为了保证数据可靠性、负载均衡、动态更新proxy列表等安全特性
@@ -9,9 +8,7 @@ title: 总览
 
 API的设计初衷就是为了简化用户接入,承担部分可靠性相关的逻辑。用户通过在服务送程序中集成API后,即可将数据发送到DataProxy,而不用关心组包格式、负载均衡等逻辑。
 
-# 二、功能说明
-
-## 2.1 整体功能说明
+## 功能说明
 
 |  功能   | 详细描述  |
 |  ----  | ----  |
@@ -23,41 +20,42 @@ API的设计初衷就是为了简化用户接入,承担部分可靠性相关
 | DataProxy列表持久化(新)  | 根据业务id对DataProxy列表持久化,防止程序启动时配置中心发生故障无法发送数据
 
 
-## 2.2 数据发送功能说明
+## 数据发送
 
 ### 同步批量函数
-
+```
     public SendResult sendMessage(List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
 
     参数说明
 
     bodyList是用户需要发送的多条数据的集合,总长度建议小于512k。groupId代表业务id,streamId代表接口id。dt表示该数据的时间戳,精确到毫秒级别。也可直接设置为0,此时api会后台获取当前时间作为其时间戳。timeout & timeUnit:这两个参数是设置发送数据的超时时间,一般建议设置成20s。
+```
 
 
-
-###同步单条函数
-
+### 同步单条函数
+```
     public SendResult sendMessage(byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
 
     参数说明
 
     body是用户要发送的单条数据内容,其余各参数涵义基本与批量发送接口一致。
+```
 
 
-
-###异步批量函数
-
+### 异步批量函数
+```
     public void asyncSendMessage(SendMessageCallback callback, List<byte[]> bodyList, String groupId, String streamId, long dt, long timeout,TimeUnit timeUnit)
 
     参数说明
 
     SendMessageCallback 是处理消息的callback。bodyList为用户需要发送的多条数据的集合,多条数据的总长度建议小于512k。groupId是业务id,streamId是接口id。dt表示该数据的时间戳,精确到毫秒级别。也可直接设置为0,此时api会后台获取当前时间作为其时间戳。timeout和timeUnit是发送数据的超时时间,一般建议设置成20s。
+```
 
-
-###异步单条函数
-
+### 异步单条函数
+```
     public void asyncSendMessage(SendMessageCallback callback, byte[] body, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)
 
     参数说明
 
-    body为单条消息内容,其余各参数涵义基本与批量发送接口一致
\ No newline at end of file
+    body为单条消息内容,其余各参数涵义基本与批量发送接口一致
+```
\ No newline at end of file