Posted to commits@inlong.apache.org by he...@apache.org on 2022/01/23 11:45:31 UTC

[incubator-inlong-website] branch master updated: [INLONG-2290] simplify the agent collect step for pulsar/hive example guide (#264)

This is an automated email from the ASF dual-hosted git repository.

healchow pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new e52944d  [INLONG-2290] simplify the agent collect step for pulsar/hive example guide (#264)
e52944d is described below

commit e52944da3be6583d065d8aad371dd82d96cb8aa0
Author: dockerzhang <do...@apache.org>
AuthorDate: Sun Jan 23 19:45:26 2022 +0800

    [INLONG-2290] simplify the agent collect step for pulsar/hive example guide (#264)
---
 docs/quick_start/hive_example.md                   |  27 +++++--------
 docs/quick_start/pulsar_example.md                 |  44 +++++++++++++++++---
 .../current/quick_start/hive_example.md            |  27 +++++--------
 .../current/quick_start/img/data-information.png   | Bin 13968 -> 23356 bytes
 .../current/quick_start/pulsar_example.md          |  45 ++++++++++++++++++---
 5 files changed, 95 insertions(+), 48 deletions(-)

diff --git a/docs/quick_start/hive_example.md b/docs/quick_start/hive_example.md
index 347f0d6..c5de370 100644
--- a/docs/quick_start/hive_example.md
+++ b/docs/quick_start/hive_example.md
@@ -44,20 +44,7 @@ Then we enter the "Approval Management" interface and click "My Approval" to app
 At this point, the data access has been created successfully. We can see that the corresponding table has been created in Hive, and we can see that the corresponding topic has been created successfully in the management GUI of TubeMQ.
 
 ## Configure the agent
-Here we use `docker exec` to enter the container of the agent and configure it.
-```
-$ docker exec -it agent sh
-```
-
-Then we create a directory of `.inlong`, and new a file named `groupid.local` (Here groupId is group id showed on data access in inlong-manager) and fill in the configuration of Dataproxy as follows.
-```
-$ mkdir .inlong
-$ cd .inlong
-$ touch b_test.local
-$ echo '{"cluster_id":1,"isInterVisit":1,"size":1,"address": [{"port":46801,"host":"dataproxy"}], "switch":0}' >> b_test.local
-```
-
-Then we exit the container, and use `curl` to make a request.
+Create a collect job by using `curl` to make a request.
 ```
 curl --location --request POST 'http://localhost:8008/config/job' \
 --header 'Content-Type: application/json' \
@@ -80,7 +67,7 @@ curl --location --request POST 'http://localhost:8008/config/job' \
 "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
 },
 "proxy": {
-"inlongGroupId": "b_test",
+"inlongGroupId": "b_test_group",
 "inlongStreamId": "test_stream"
 },
 "op": "add"
@@ -90,9 +77,13 @@ curl --location --request POST 'http://localhost:8008/config/job' \
 At this point, the agent is configured successfully.
 Then we need to create a new file `./collect-data/test.log` and add content to it to trigger the agent to send data to the dataproxy.
 
-```
-$ touch collect-data/test.log
-$ echo 'test,24' >> collect-data/test.log
+``` shell
+mkdir collect-data
+END=100000
+for ((i=1;i<=END;i++)); do
+    sleep 3
+    echo "name_$i | $i" >> ./collect-data/test.log
+done
 ```
 
 Then we can observe the logs of agent and dataproxy, and we can see that the relevant data has been sent successfully.
diff --git a/docs/quick_start/pulsar_example.md b/docs/quick_start/pulsar_example.md
index 2469096..a12ea8b 100644
--- a/docs/quick_start/pulsar_example.md
+++ b/docs/quick_start/pulsar_example.md
@@ -61,15 +61,47 @@ the topics and subscriptions required for the data stream will be created in the
 We can use the command-line tool in the Pulsar cluster to check whether the topic is created successfully:
 ![](img/pulsar-topic.png)
 
-## Configure File Agent
-When configuring the file agent, you must create the file in the directory specified when creating the data ingestion:
+## Configure the agent
+Create a collect job by using `curl` to make a request.
 ```
-touch /data/test_file.txt;
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+"job": {
+"dir": {
+"path": "",
+"pattern": "/data/collect-data/test.log"
+},
+"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+"id": 1,
+"thread": {
+"running": {
+"core": "4"
+}
+},
+"name": "fileAgentTest",
+"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+},
+"proxy": {
+"inlongGroupId": "b_test_group",
+"inlongStreamId": "test_stream"
+},
+"op": "add"
+}'
 ```
 
-Write data to the file according to the data source format when creating the data stream:
-```
-echo -e "1|test\n2|test\n" >> /data/test_file.txt
+At this point, the agent is configured successfully.
+Then we need to create a new file `./collect-data/test.log` and add content to it to trigger the agent to send data to the dataproxy.
+
+``` shell
+mkdir collect-data
+END=100000
+for ((i=1;i<=END;i++)); do
+    sleep 3
+    echo "name_$i | $i" >> ./collect-data/test.log
+done
 ```
 
 ## Data Check
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
index 2ada905..34009a9 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/hive_example.md
@@ -46,20 +46,7 @@ Hive 是运行的必备组件。如果您的机器上没有 Hive,这里推荐
 到此接入就已经创建完毕了,我们可以在 Hive 中看到相应的表已经被创建,并且在 TubeMQ 的管理界面中可以看到相应的 topic 已经创建成功。
 
 ## 配置 agent
-然后我们使用 docker 进入 agent 容器内,创建相应的 agent 配置。
-```
-$ docker exec -it agent sh
-```
-
-然后我们新建 `.inlong` 文件夹,并创建以 `groupId.local` 命名的文件,在其中填入 Dataproxy 有关配置。
-```
-$ mkdir .inlong
-$ cd .inlong
-$ touch b_test.local
-$ echo '{"cluster_id":1,"isInterVisit":1,"size":1,"address": [{"port":46801,"host":"dataproxy"}], "switch":0}' >> b_test.local
-```
-
-然后退出容器,使用 curl 向 agent 容器发送请求。
+使用 curl 向 agent 容器发送请求创建采集任务。
 ```
 curl --location --request POST 'http://localhost:8008/config/job' \
 --header 'Content-Type: application/json' \
@@ -82,7 +69,7 @@ curl --location --request POST 'http://localhost:8008/config/job' \
 "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
 },
 "proxy": {
-"inlongGroupId": "b_test",
+"inlongGroupId": "b_test_group",
 "inlongStreamId": "test_stream"
 },
 "op": "add"
@@ -91,9 +78,13 @@ curl --location --request POST 'http://localhost:8008/config/job' \
 
 至此,agent 就配置完毕了。接下来我们可以新建 `./collect-data/test.log` ,并往里面添加内容,来触发 agent 向 dataproxy 发送数据了。
 
-```
-$ touch collect-data/test.log
-$ echo 'test,24' >> collect-data/test.log
+``` shell
+mkdir collect-data
+END=100000
+for ((i=1;i<=END;i++)); do
+    sleep 3
+    echo "name_$i | $i" >> ./collect-data/test.log
+done
 ```
 
 然后观察 agent 和 dataproxy 的日志,可以看到相关数据已经成功发送。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/data-information.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/data-information.png
index 14eafe6..8c0742b 100644
Binary files a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/data-information.png and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/img/data-information.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
index 6bc3b24..740106c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/quick_start/pulsar_example.md
@@ -32,7 +32,7 @@ pulsar.defaultTenant=public
 ```
 
 ## 创建数据接入
-### 配置数据流Group 信息
+### 配置数据流 Group 信息
 ![](img/pulsar-group.png)
 在创建数据接入时,数据流 Group 可选用的消息中间件选择 Pulsar,其它跟 Pulsar 相关的配置项还包括:
 - Queue module:队列模型,并行或者顺序,选择并行时可设置 Topic 的分区数,顺序则为一个分区;
@@ -59,16 +59,49 @@ pulsar.defaultTenant=public
 ![](img/pulsar-topic.png)
 
 ## 配置文件 Agent
-在配置文件 Agent 时,需要根据数据接入创建时指定的目录下创建文件:
+使用 curl agent 发送请求创建采集任务。
 ```
-touch /data/test_file.txt;
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+"job": {
+"dir": {
+"path": "",
+"pattern": "/data/collect-data/test.log"
+},
+"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+"id": 1,
+"thread": {
+"running": {
+"core": "4"
+}
+},
+"name": "fileAgentTest",
+"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+},
+"proxy": {
+"inlongGroupId": "b_test_group",
+"inlongStreamId": "test_stream"
+},
+"op": "add"
+}'
 ```
 
-按照创建数据流时的数据源格式,向文件中写入数据(可以按格式写入更多数据):
-```
-echo -e "1|test\n2|test\n" >> /data/test_file.txt
+至此,agent 就配置完毕了。接下来我们可以新建 `./collect-data/test.log` ,并往里面添加内容,来触发 agent 向 dataproxy 发送数据了。
+
+``` shell
+mkdir collect-data
+END=100000
+for ((i=1;i<=END;i++)); do
+    sleep 3
+    echo "name_$i | $i" >> ./collect-data/test.log
+done
 ```
 
+然后观察 agent 和 dataproxy 的日志,可以看到相关数据已经成功发送。
+
 ## 数据落地检查
 
 最后,我们登入 Hive 集群,通过 Hive 的 SQL 命令查看 `test_stream` 表中是否成功插入了数据。
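
Note on the data-generation loop added by this patch: it uses the bash-only `for ((...))` form, a plain `mkdir` that fails if `collect-data` already exists, and runs for 100000 iterations with a 3-second sleep. For a quick local check, a bounded POSIX `sh` variant could look like the sketch below. The helper name `write_sample_lines`, its arguments, and the temporary output directory are hypothetical and are not part of InLong or of this commit. The line format `name_$i | $i` and the file name `test.log` are taken from the patch. The sketch only writes the file. It assumes, as the guide appears to, that the directory the agent actually watches (`/data/collect-data` in the job config) is mapped to wherever the file is written.

```shell
#!/bin/sh
# Sketch only: a bounded variant of the generator loop from the patch.
# write_sample_lines is a hypothetical helper, not an InLong tool.
set -eu

# write_sample_lines DIR COUNT INTERVAL
# Appends COUNT lines of the form "name_N | N" to DIR/test.log,
# sleeping INTERVAL seconds after each line.
write_sample_lines() {
    dir="$1"; count="$2"; interval="$3"
    mkdir -p "$dir"    # -p: do not fail when the directory already exists
    i=1
    while [ "$i" -le "$count" ]; do
        echo "name_$i | $i" >> "$dir/test.log"    # same line format as the patch
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example run: three lines, no delay, into a throwaway directory.
out_dir="$(mktemp -d)/collect-data"
write_sample_lines "$out_dir" 3 0
cat "$out_dir/test.log"
```

To mimic the patch's behaviour more closely, the call could instead be `write_sample_lines ./collect-data 100000 3`.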