Posted to commits@inlong.apache.org by do...@apache.org on 2022/04/11 11:25:52 UTC

[incubator-inlong-website] branch master updated: [INLONG-3620] Update the file agent guide document (#342)

This is an automated email from the ASF dual-hosted git repository.

dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new e9b13a82d [INLONG-3620] Update the file agent guide document (#342)
e9b13a82d is described below

commit e9b13a82d604a9fb29d7740c3b10ac1f63158e43
Author: dockerzhang <do...@apache.org>
AuthorDate: Mon Apr 11 19:25:47 2022 +0800

    [INLONG-3620] Update the file agent guide document (#342)
---
 docs/modules/agent/file.md                         |  72 +-------------------
 docs/modules/agent/quick_start.md                  |  43 +-----------
 docs/quick_start/hive_example.md                   |  51 +++-----------
 docs/quick_start/img/file-source.png               | Bin 0 -> 16484 bytes
 docs/quick_start/img/pulsar-topic.png              | Bin 33396 -> 0 bytes
 docs/quick_start/pulsar_example.md                 |  65 +++---------------
 .../current/modules/agent/file.md                  |  73 +--------------------
 .../current/modules/agent/quick_start.md           |  41 +-----------
 .../current/quick_start/hive_example.md            |  52 +++------------
 .../current/quick_start/img/file-source.png        | Bin 0 -> 13190 bytes
 .../current/quick_start/img/pulsar-topic.png       | Bin 33396 -> 0 bytes
 .../current/quick_start/pulsar_example.md          |  64 +++---------------
 12 files changed, 39 insertions(+), 422 deletions(-)

diff --git a/docs/modules/agent/file.md b/docs/modules/agent/file.md
index c0abcfc2f..a85201c76 100644
--- a/docs/modules/agent/file.md
+++ b/docs/modules/agent/file.md
@@ -36,74 +36,4 @@ Write data to 2021020211.log
 Configure job.cycleUnit as D
 Then the agent will try to read the 2021020211.log file at the time 2021020211. When reading the data in the file, it will write all the data to the backend proxy with the timestamp 20210202.
 If job.cycleUnit is configured as H
-When collecting data in the 2021020211.log file, all data will be written to the backend proxy at the time of 2021020211.
-
-Examples of job submission:
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/2021020211.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"cycleUnit": "D",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "group10",
-"inlongStreamId": "group10"
-},
-"op": "add"
-}'
-```
-
-## Time offset reading
-After the configuration is read by time, if you want to read data at other times than the current time, you can configure the time offset to complete
-Configure the job attribute name as job.timeOffset, the value is number + time dimension, time dimension includes day and hour
-For example, the following settings are supported:
-- 1d Read the data one day after the current time
-- -1h read the data one hour before the current time
-
-Examples of job submission
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"cycleUnit": "D",
-"timeOffset": "-1d",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "groupId10",
-"inlongStreamId": "streamId10"
-},
-"op": "add"
-}'
-```
\ No newline at end of file
+When collecting data in the 2021020211.log file, all data will be written to the backend proxy at the time of 2021020211.
\ No newline at end of file
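The job-submission examples removed from file.md above POST a JSON job definition to the agent on port 8008. As a rough sketch, here is how such a payload can be assembled, together with a hypothetical interpretation of the `timeOffset` value (number plus `d`/`h` dimension, as the doc describes) — the field names come from the removed curl examples, while the offset-parsing function is an illustrative assumption, not the agent's actual implementation:

```python
import json
from datetime import timedelta

def parse_time_offset(value: str) -> timedelta:
    """Interpret a job.timeOffset value such as '1d' or '-1h'.

    The doc describes the format as number + time dimension, where the
    dimension is 'd' (day) or 'h' (hour). How the agent parses this
    internally is an assumption here.
    """
    unit = value[-1]
    amount = int(value[:-1])
    if unit == "d":
        return timedelta(days=amount)
    if unit == "h":
        return timedelta(hours=amount)
    raise ValueError(f"unsupported time dimension: {unit!r}")

# Job payload mirroring the removed curl example (field names from the diff).
job_payload = {
    "job": {
        "dir": {"path": "", "pattern": "/data/inlong-agent/test.log"},
        "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
        "id": 1,
        "thread": {"running": {"core": "4"}},
        "name": "fileAgentTest",
        "cycleUnit": "D",
        "timeOffset": "-1d",
        "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
        "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
        "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel",
    },
    "proxy": {"inlongGroupId": "groupId10", "inlongStreamId": "streamId10"},
    "op": "add",
}

body = json.dumps(job_payload)
```

With `body` in hand, the equivalent of the removed curl command would POST it to `http://localhost:8008/config/job` with a `Content-Type: application/json` header.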
diff --git a/docs/modules/agent/quick_start.md b/docs/modules/agent/quick_start.md
index 8514683bd..cf0ddfc86 100644
--- a/docs/modules/agent/quick_start.md
+++ b/docs/modules/agent/quick_start.md
@@ -28,45 +28,4 @@ audit.proxys=127.0.0.1:10081
 ## Start
 ```bash
 bash +x bin/agent.sh start
-```
-
-
-## Example: Add job configuration in real time
-
-```bash
-    curl --location --request POST 'http://localhost:8008/config/job' \
-    --header 'Content-Type: application/json' \
-    --data '{
-    "job": {
-    "dir": {
-    "path": "",
-    "pattern": "/data/inlong-agent/test.log"
-    },
-    "trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-    "id": 1,
-    "thread": {
-    "running": {
-    "core": "4"
-    },
-    "onejob": true
-    },
-    "name": "fileAgentTest",
-    "source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-    "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-    "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-    },
-    "proxy": {
-  "inlongGroupId": "groupId10",
-  "inlongStreamId": "groupId10"
-    },
-    "op": "add"
-    }'
-```
-
-The meaning of each parameter is :
-- job.dir.pattern: Configure the read file path, which can include regular expressions
-- job.trigger: Trigger name, the default is DirectoryTrigger, the function is to monitor the files under the folder to generate events
-- job.source: The type of data source used, the default is TextFileSource, which reads text files
-- job.sink: The type of writer used, the default is ProxySink, which sends messages to the proxy
-- proxy.groupId: The groupId used when writing to the proxy; groupId is the group ID shown on the data access page in inlong-manager, not the topic name.
-- proxy.streamId: The streamId used when writing to the proxy; streamId is the data stream ID shown in the data flow window in inlong-manager
\ No newline at end of file
+```
\ No newline at end of file
diff --git a/docs/quick_start/hive_example.md b/docs/quick_start/hive_example.md
index c5de370e5..b177c3372 100644
--- a/docs/quick_start/hive_example.md
+++ b/docs/quick_start/hive_example.md
@@ -3,7 +3,7 @@ title: Hive Example
 sidebar_position: 2
 ---
 
-Here we use a simple example to help you experience InLong by Docker.
+Here we use a simple example to help you experience InLong.
 
 ## Install Hive
 Hive is a necessary component. If you don't have Hive on your machine, we recommend using Docker to install it. Details can be found [here](https://github.com/big-data-europe/docker-hive).
@@ -24,7 +24,9 @@ Then we click the next button, and fill in the stream information as shown in th
 
 ![Create Stream](img/create-stream.png)
 
-Note that the message source is "File", and we don't need to create a message source manually.
+Note that the message source is "File", and you need to create a data source manually and configure `Agent Address` and `File Path`.
+
+![File Source](img/file-source.png)
 
 Then we fill in the following information in the "data information" column below.
 
@@ -43,53 +45,16 @@ Then we enter the "Approval Management" interface and click "My Approval" to app
 
 At this point, the data access has been created successfully. We can see that the corresponding table has been created in Hive, and we can see that the corresponding topic has been created successfully in the management GUI of TubeMQ.
 
-## Configure the agent
-Create a collect job by using `curl` to make a request.
-```
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/collect-data/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "b_test_group",
-"inlongStreamId": "test_stream"
-},
-"op": "add"
-}'
-```
-
-At this point, the agent is configured successfully.
-Then we need to create a new file `./collect-data/test.log` and add content to it to trigger the agent to send data to the dataproxy.
+## Configure the File Agent
+Then we need to create a new file `/data/collect-data/test.log` and add content to it to trigger the agent to send data to the dataproxy.
 
 ``` shell
 mkdir -p /data/collect-data
 END=100000
 for ((i=1;i<=END;i++)); do
     sleep 3
-    echo "name_$i | $i" >> ./collect-data/test.log
+    echo "name_$i | $i" >> /data/collect-data/test.log
 done
 ```
 
-Then we can observe the logs of agent and dataproxy, and we can see that the relevant data has been sent successfully.
-
-```
-$ docker logs agent
-$ docker logs dataproxy
-```
-
+Then you can check the Audit Data pages to see that the data has been collected and sent successfully.
\ No newline at end of file
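The generator loop above appends pipe-delimited records like `name_1 | 1` to the collected file. As a small illustrative sketch, one such line splits into two fields; the `name`/`value` field labels here are assumptions for illustration, not names the agent itself uses:

```python
def parse_record(line: str) -> dict:
    """Split one 'name_$i | $i' record produced by the generator loop."""
    name, value = (part.strip() for part in line.split("|", 1))
    return {"name": name, "value": int(value)}

record = parse_record("name_42 | 42\n")
```

Records of this shape are what land in the Hive table created for the stream.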
diff --git a/docs/quick_start/img/file-source.png b/docs/quick_start/img/file-source.png
new file mode 100644
index 000000000..6e1d84f37
Binary files /dev/null and b/docs/quick_start/img/file-source.png differ
diff --git a/docs/quick_start/img/pulsar-topic.png b/docs/quick_start/img/pulsar-topic.png
deleted file mode 100644
index b892f651a..000000000
Binary files a/docs/quick_start/img/pulsar-topic.png and /dev/null differ
diff --git a/docs/quick_start/pulsar_example.md b/docs/quick_start/pulsar_example.md
index a12ea8ba8..3bdf08cd8 100644
--- a/docs/quick_start/pulsar_example.md
+++ b/docs/quick_start/pulsar_example.md
@@ -21,17 +21,6 @@ Before we begin, we need to install InLong. Here we provide two ways:
 1. Install InLong with Docker according to the [instructions here](deployment/docker.md). (Recommended)
 2. Install InLong binary according to the [instructions here](deployment/bare_metal.md).
 
-Unlike InLong TubeMQ, if you use Apache Pulsar, you need to configure Pulsar cluster information 
-in the Manager component installation. The format is as follows:
-```
-# Pulsar admin URL
-pulsar.adminUrl=http://127.0.0.1:8080,127.0.0.2:8080,127.0.0.3:8080
-# Pulsar broker address
-pulsar.serviceUrl=pulsar://127.0.0.1:6650,127.0.0.1:6650,127.0.0.1:6650
-# Default tenant of Pulsar
-pulsar.defaultTenant=public
-```
-
 ## Create a data ingestion
 ### Configure data streams group information
 ![](img/pulsar-group.png)
@@ -46,7 +35,9 @@ and other configuration items related to Pulsar include:
 
 ### Configure data stream
 ![](img/pulsar-stream.png)
-When configuring the message source, the file path in the file data source can be referred to [file-agent-configuration](https://inlong.apache.org/docs/next/modules/agent/file#file-agent-configuration).
+
+### Configure File Agent
+![](img/file-source.png)
 
 ### Configure data information
 ![](img/pulsar-data.png)
@@ -58,52 +49,22 @@ Save Hive cluster information, click "Ok" to submit.
 ## Data ingestion Approval
 Enter the **Approval** page, click **My Approval**, and approve the data ingestion application. After the approval is complete,
 the topics and subscriptions required for the data stream will be created in the Pulsar cluster synchronously.
-We can use the command-line tool in the Pulsar cluster to check whether the topic is created successfully:
-![](img/pulsar-topic.png)
-
-## Configure the agent
-Create a collect job by using `curl` to make a request.
-```
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/collect-data/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "b_test_group",
-"inlongStreamId": "test_stream"
-},
-"op": "add"
-}'
-```
+We can use the command-line tool in the Pulsar cluster to check whether the topic is created successfully.
 
-At this point, the agent is configured successfully.
-Then we need to create a new file `./collect-data/test.log` and add content to it to trigger the agent to send data to the dataproxy.
+## Configure File Agent
+Then we need to create a new file `/data/collect-data/test.log` and add content to it to trigger the agent to send data to the dataproxy.
 
 ``` shell
 mkdir -p /data/collect-data
 END=100000
 for ((i=1;i<=END;i++)); do
     sleep 3
-    echo "name_$i | $i" >> ./collect-data/test.log
+    echo "name_$i | $i" >> /data/collect-data/test.log
 done
 ```
 
+Then you can check the Audit Data pages to see that the data has been collected and sent successfully.
+
 ## Data Check
 Finally, we log in to the Hive cluster and use Hive SQL commands to check 
 whether data is successfully inserted in the `test_stream` table.
@@ -113,10 +74,4 @@ If data is not correctly written to the Hive cluster, you can check whether the
 - Check whether the topic information corresponding to the data stream is correctly written in the `conf/topics.properties` file of `InLong DataProxy`:
 ```
 b_test_group/test_stream=persistent://public/b_test_group/test_stream
-```
-
-- Check whether the configuration information of the data stream is successfully pushed to the ZooKeeper monitored by `InLong Sort`:
-```
-get /inlong_hive/dataflows/{{sink_id}}
-```
+```
\ No newline at end of file
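The troubleshooting step above checks a mapping line in `conf/topics.properties` of the form `<groupId>/<streamId>=persistent://<tenant>/<namespace>/<topic>`. A minimal sketch of pulling the pieces apart, with the key/URL structure inferred from the single example shown in the doc (an assumption, not a documented format guarantee):

```python
def parse_topic_mapping(line: str) -> tuple:
    """Parse one conf/topics.properties line into (groupId, streamId, topic URL).

    Expected shape (inferred from the example in the doc):
    '<groupId>/<streamId>=persistent://<tenant>/<namespace>/<topic>'
    """
    key, _, topic_url = line.strip().partition("=")
    group_id, stream_id = key.split("/", 1)
    return group_id, stream_id, topic_url

mapping = parse_topic_mapping(
    "b_test_group/test_stream=persistent://public/b_test_group/test_stream"
)
```

If the parsed topic URL is missing or points at the wrong tenant/namespace, the DataProxy cannot route the stream to Pulsar, which matches the failure mode this check is meant to catch.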
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file.md
index c2294be9e..0744cce32 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file.md
@@ -35,75 +35,4 @@ job.cycleUnit 包含如下两种类型:
 配置 job.cycleUnit 为 D
 则agent会在2021020211时间尝试2021020211.log文件,读取文件中的数据时,会将所有数据以20210202这个时间写入到后端proxy
 如果配置 job.cycleUnit 为 H
-则采集2021020211.log文件中的数据时,会将所有数据以2021020211这个时间写入到后端proxy。
-
-提交job举例:
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/2021020211.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"cycleUnit": "D",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "groupId",
-"inlongStreamId": "streamId"
-},
-"op": "add"
-}'
-```
-
-
-## 时间偏移量offset读取
-在配置按照时间读取之后,如果想要读取当前时间之外的其他时间的数据,可以通过配置时间偏移量完成
-配置job属性名称为job.timeOffset,值为数字 + 时间维度,时间维度包括天和小时
-例如支持如下设置:
-- 1d 读取当前时间后一天的数据 
-- -1h 读取当前时间前一个小时的数据
-
-提交job举例
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"cycleUnit": "D",
-"timeOffset": "-1d",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "groupId",
-"inlongStreamId": "streamId"
-},
-"op": "add"
-}'
-```
\ No newline at end of file
+则采集2021020211.log文件中的数据时,会将所有数据以2021020211这个时间写入到后端proxy。
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
index e0285d27b..72c0a0227 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/quick_start.md
@@ -25,43 +25,4 @@ audit.proxys=127.0.0.1:10081
 ## 启动
 ```bash
 bash +x bin/agent.sh start
-```
-
-## 示例:实时添加job配置
-
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/inlong-agent/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "groupId10",
-"inlongStreamId": "streamId10"
-},
-"op": "add"
-}'
-```
-
-其中各个参数含义为:
-- job.dir.pattern: 配置读取的文件路径,可包含正则表达式
-- job.trigger: 触发器名称,默认为DirectoryTrigger,功能为监听文件夹下的文件产生事件,任务运行时已有的文件不会读取
-- job.source: 使用的数据源类型,默认为TextFileSource,读取文本文件
-- job.sink:使用的写入器类型,默认为ProxySink,发送消息到dataproxy中
-- proxy.groupId: 写入proxy时使用的groupId,groupId是指manager界面中,数据接入中业务信息的业务ID,此处不是创建的tube topic名称
-- proxy.streamId: 写入proxy时使用的streamId,streamId是指manager界面中,数据接入中数据流的数据流ID
\ No newline at end of file
+```
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
index 34009a9fd..701b43a2b 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
@@ -3,7 +3,7 @@ title: 入库 Hive 示例
 sidebar_position: 2
 ---
 
-本节用一个简单的示例,帮助您使用 Docker 快速体验 InLong 的完整流程。
+本节用一个简单的示例,帮助您快速体验 InLong 的完整流程。
 
 
 ## 安装 Hive
@@ -26,7 +26,9 @@ Hive 是运行的必备组件。如果您的机器上没有 Hive,这里推荐
 
 ![Create Stream](img/create-stream.png)
 
-注意其中消息来源选择“文件”,暂时不用新建数据源。
+注意其中消息来源选择“文件”,并“新建数据源”,配置 `Agent 地址`及采集`文件路径`:
+
+![File Source](img/file-source.png)
 
 然后我们在下面的“数据信息”一栏中填入以下信息
 
@@ -45,54 +47,16 @@ Hive 是运行的必备组件。如果您的机器上没有 Hive,这里推荐
 
 到此接入就已经创建完毕了,我们可以在 Hive 中看到相应的表已经被创建,并且在 TubeMQ 的管理界面中可以看到相应的 topic 已经创建成功。
 
-## 配置 agent
-使用 curl 向 agent 容器发送请求创建采集任务。
-```
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/collect-data/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "b_test_group",
-"inlongStreamId": "test_stream"
-},
-"op": "add"
-}'
-```
-
-至此,agent 就配置完毕了。接下来我们可以新建 `./collect-data/test.log` ,并往里面添加内容,来触发 agent 向 dataproxy 发送数据了。
+## 配置 Agent 采集文件
+接下来我们可以新建 `/data/collect-data/test.log` ,并往里面添加内容,来触发 agent 向 dataproxy 发送数据了。
 
 ``` shell
 mkdir -p /data/collect-data
 END=100000
 for ((i=1;i<=END;i++)); do
     sleep 3
-    echo "name_$i | $i" >> ./collect-data/test.log
+    echo "name_$i | $i" >> /data/collect-data/test.log
 done
 ```
 
-然后观察 agent 和 dataproxy 的日志,可以看到相关数据已经成功发送。
-
-```
-$ docker logs agent
-$ docker logs dataproxy
-```
-
-
-
+可以观察审计数据页面,看到数据已经成功采集和发送。
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/file-source.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/file-source.png
new file mode 100644
index 000000000..571f9f008
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/file-source.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-topic.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-topic.png
deleted file mode 100644
index b892f651a..000000000
Binary files a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/pulsar-topic.png and /dev/null differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
index 740106c27..9eb30054e 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
@@ -21,16 +21,6 @@ Hive 是运行的必备组件。如果您的机器上没有 Hive,这里推荐
 1. 按照 [这里的说明](deployment/docker.md),使用 Docker 进行快速部署。(推荐)
 2. 按照 [这里的说明](deployment/bare_metal.md),使用二进制包依次安装各组件。
 
-区别于 InLong TubeMQ,如果使用 Apache Pulsar,需要在 Manager 组件安装中配置 Pulsar 集群信息,格式如下:
-```
-# Pulsar admin URL
-pulsar.adminUrl=http://127.0.0.1:8080,127.0.0.2:8080,127.0.0.3:8080
-# Pulsar broker address
-pulsar.serviceUrl=pulsar://127.0.0.1:6650,127.0.0.1:6650,127.0.0.1:6650
-# Default tenant of Pulsar
-pulsar.defaultTenant=public
-```
-
 ## 创建数据接入
 ### 配置数据流 Group 信息
 ![](img/pulsar-group.png)
@@ -44,7 +34,9 @@ pulsar.defaultTenant=public
 
 ### 配置数据流
 ![](img/pulsar-stream.png)
-配置消息来源时,文件数据源中的文件路径,可参照 inlong-agent 中[File Agent的详细指引](https://inlong.apache.org/docs/next/modules/agent/file#file-agent-configuration)。
+
+### 配置文件 Agent
+![](img/file-source.png)
 
 ### 配置数据格式
 ![](img/pulsar-data.png)
@@ -55,52 +47,21 @@ pulsar.defaultTenant=public
 
 ## 数据接入审批
 进入**审批管理**页面,点击**我的审批**,审批上面提交的接入申请,审批结束后会在 Pulsar 集群同步创建数据流需要的 Topic 和订阅。
-我们可以在 Pulsar 集群使用命令行工具检查 Topic 是否创建成功:
-![](img/pulsar-topic.png)
-
-## 配置文件 Agent
-使用 curl agent 发送请求创建采集任务。
-```
-curl --location --request POST 'http://localhost:8008/config/job' \
---header 'Content-Type: application/json' \
---data '{
-"job": {
-"dir": {
-"path": "",
-"pattern": "/data/collect-data/test.log"
-},
-"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
-"id": 1,
-"thread": {
-"running": {
-"core": "4"
-}
-},
-"name": "fileAgentTest",
-"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
-"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-},
-"proxy": {
-"inlongGroupId": "b_test_group",
-"inlongStreamId": "test_stream"
-},
-"op": "add"
-}'
-```
+我们可以在 Pulsar 集群使用命令行工具检查 Topic 是否创建成功。
 
-至此,agent 就配置完毕了。接下来我们可以新建 `./collect-data/test.log` ,并往里面添加内容,来触发 agent 向 dataproxy 发送数据了。
+## 配置 Agent 采集文件
+接下来我们可以新建 `/data/collect-data/test.log` ,并往里面添加内容,来触发 agent 向 dataproxy 发送数据了。
 
 ``` shell
 mkdir -p /data/collect-data
 END=100000
 for ((i=1;i<=END;i++)); do
     sleep 3
-    echo "name_$i | $i" >> ./collect-data/test.log
+    echo "name_$i | $i" >> /data/collect-data/test.log
 done
 ```
 
-然后观察 agent 和 dataproxy 的日志,可以看到相关数据已经成功发送。
+可以观察审计数据页面,看到数据已经成功采集和发送。
 
 ## 数据落地检查
 
@@ -111,11 +72,4 @@ done
 - 检查 `InLong DataProxy` 的 `conf/topics.properties` 文件夹中是否正确写入该数据流对应的Topic 信息:
 ```
 b_test_group/test_stream=persistent://public/b_test_group/test_stream
-```
-
-- 检查 InLong Sort 监听的 ZooKeeper 中是否成功推送了数据流的配置信息:
-```
-get /inlong_hive/dataflows/{{sink_id}}
-```
-
-
+```
\ No newline at end of file