Posted to commits@inlong.apache.org by do...@apache.org on 2022/04/18 12:00:23 UTC

[incubator-inlong-website] branch master updated: [INLONG-355][Doc] Refactor the File collector guide (#356)

This is an automated email from the ASF dual-hosted git repository.

dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new f0f79923f [INLONG-355][Doc] Refactor the File collector guide (#356)
f0f79923f is described below

commit f0f79923fbdfb6e2cb29d1d3ab57244decf0d190
Author: dockerzhang <do...@apache.org>
AuthorDate: Mon Apr 18 20:00:18 2022 +0800

    [INLONG-355][Doc] Refactor the File collector guide (#356)
---
 docs/administration/_category_.json                |   2 +-
 docs/contact.md                                    |   2 +-
 docs/data_node/_category_.json                     |   4 ++
 docs/data_node/extract_node/_category_.json        |   4 ++
 .../agent => data_node/extract_node}/file.md       |  21 +++++---
 docs/data_node/extract_node/img/file_param.png     | Bin 0 -> 17445 bytes
 docs/data_node/load_node/_category_.json           |   4 ++
 docs/development/_category_.json                   |   2 +-
 docs/modules/agent/sql.md                          |  60 ---------------------
 docs/sdk/_category_.json                           |   2 +-
 docs/user_guide/_category_.json                    |   2 +-
 .../current/how-to-release.md                      |   2 +-
 .../docusaurus-plugin-content-docs/current.json    |   4 ++
 .../agent => data_node/extract_node}/file.md       |  18 +++++--
 .../data_node/extract_node/img/file_param.png      | Bin 0 -> 12487 bytes
 .../current/modules/agent/sql.md                   |  60 ---------------------
 16 files changed, 49 insertions(+), 138 deletions(-)

diff --git a/docs/administration/_category_.json b/docs/administration/_category_.json
index cc9b71eac..56431da33 100644
--- a/docs/administration/_category_.json
+++ b/docs/administration/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "Administration",
-  "position": 9
+  "position": 10
 }
\ No newline at end of file
diff --git a/docs/contact.md b/docs/contact.md
index 867e82d26..918702dae 100644
--- a/docs/contact.md
+++ b/docs/contact.md
@@ -1,6 +1,6 @@
 ---
 title: Contact Us
-sidebar_position: 10
+sidebar_position: 11
 ---
 
 Contact information
diff --git a/docs/data_node/_category_.json b/docs/data_node/_category_.json
new file mode 100644
index 000000000..c44226d96
--- /dev/null
+++ b/docs/data_node/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "Data Nodes",
+  "position": 6
+}
\ No newline at end of file
diff --git a/docs/data_node/extract_node/_category_.json b/docs/data_node/extract_node/_category_.json
new file mode 100644
index 000000000..c61e62a3b
--- /dev/null
+++ b/docs/data_node/extract_node/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "Extract Nodes",
+  "position": 1
+}
\ No newline at end of file
diff --git a/docs/modules/agent/file.md b/docs/data_node/extract_node/file.md
similarity index 67%
rename from docs/modules/agent/file.md
rename to docs/data_node/extract_node/file.md
index a85201c76..d8bc22a92 100644
--- a/docs/modules/agent/file.md
+++ b/docs/data_node/extract_node/file.md
@@ -1,9 +1,18 @@
 ---
 title: File
-sidebar_position: 3
+sidebar_position: 1
 ---
 
-## File Agent Configuration
+## Parameters
+![File Params](img/file_param.png)
+- Data source name
+- Data source IP: The IP address of the Agent on the collection node.
+- File path: Must be an absolute path; regular expressions are supported.
+- Time offset: Collection of the file starts from a given time: '1m' means 1 minute later and '-1m' means 1 minute earlier; the units m (minute), h (hour), and d (day) are supported. If left empty, collection starts from the current time.
+- Source data file delimiter: Vertical bar (|), comma (,), semicolon (;), and so on.
+- Source data field: The fields produced by splitting on the delimiter.
+
+## Path Configuration
 ```
 /data/inlong-agent/test.log // Reads the new file test.log in the inlong-agent folder
 /data/inlong-agent/test[0-9]{1} // Reads new files in the inlong-agent folder named test followed by a single digit
@@ -11,19 +20,17 @@ sidebar_position: 3
 /data/inlong-agent/^\\d+(\\.\\d+)? // Starts with one or more digits, optionally followed by a "." and one or more digits (? marks the group as optional); matches, for example, "5", "1.5" and "2.21"
 ```
 
-## Get data time from file name
-
+## Data Time
 Agent supports obtaining the time from the file name as the production time of the data. The configuration instructions are as follows:
 ```
 /data/inlong-agent/***YYYYMMDDHH***
 ```
+
 Where YYYYMMDDHH represents the data time: YYYY is the year, MM the month, DD the day, and HH the hour.
 *** stands for any characters.
 
 You also need to add the data cycle to the job conf; day and hour cycles are currently supported.
-When adding a task, add the property job.cycleUnit
-
-job.cycleUnit contains the following two types:
+When adding a task, set the property job.cycleUnit, which takes one of the following two values:
 - D: The data time has day granularity
 - H: The data time has hour granularity
 
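The "Data Time" rule in the File guide above (a YYYYMMDDHH stamp embedded in the file name, interpreted according to job.cycleUnit) can be illustrated with a short Python sketch. This is not the Agent's implementation: the helper name `extract_data_time` is hypothetical, and it assumes that the first valid run of digits in the name is the stamp and that the D cycle only needs the YYYYMMDD part.

```python
import re
from datetime import datetime

# Hypothetical mapping from job.cycleUnit to (strptime format, digit count).
# Assumption: the day cycle (D) uses only the YYYYMMDD part of the stamp.
_CYCLE_FORMATS = {
    "D": ("%Y%m%d", 8),
    "H": ("%Y%m%d%H", 10),
}


def extract_data_time(file_name, cycle_unit):
    """Return the data time embedded in file_name, or None if there is none.

    Illustrative only; the real Agent may locate the stamp differently.
    """
    fmt, width = _CYCLE_FORMATS[cycle_unit]
    # The lookahead lets us try every run of `width` digits, including
    # overlapping ones, and keep the first that parses as a valid date.
    for match in re.finditer(r"(?=(\d{%d}))" % width, file_name):
        try:
            return datetime.strptime(match.group(1), fmt)
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    print(extract_data_time("/data/inlong-agent/app_2022041812.log", "H"))
    print(extract_data_time("/data/inlong-agent/app_2022041812.log", "D"))
```

Trying each candidate run with `strptime` rejects digit runs that are not real dates (for example a month of 13) instead of silently accepting them.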
diff --git a/docs/data_node/extract_node/img/file_param.png b/docs/data_node/extract_node/img/file_param.png
new file mode 100644
index 000000000..c3cbcdaea
Binary files /dev/null and b/docs/data_node/extract_node/img/file_param.png differ
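The Time offset parameter documented in file.md above accepts values such as '1m' and '-1m' with the units m, h, and d. The following Python sketch shows one way such a value could be parsed. It is an illustration under assumptions, not the Agent's code: the function names `parse_time_offset` and `collection_start` are hypothetical, and the offset is assumed to be applied relative to the current time.

```python
import re
from datetime import datetime, timedelta

# Units named in the guide: m (minute), h (hour), d (day).
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_time_offset(offset):
    """Convert an offset such as '1m' or '-1m' into a timedelta.

    An empty or missing offset means "start from the current time",
    which is represented here as a zero timedelta.
    """
    text = (offset or "").strip()
    if not text:
        return timedelta(0)
    match = re.fullmatch(r"(-?\d+)([mhd])", text)
    if match is None:
        raise ValueError("unsupported time offset: %r" % offset)
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def collection_start(now, offset):
    """Assumed semantics: the start time is `now` shifted by the offset."""
    return now + parse_time_offset(offset)


if __name__ == "__main__":
    now = datetime(2022, 4, 18, 12, 0)
    print(collection_start(now, "-1h"))
    print(collection_start(now, ""))
```

Rejecting unknown units with a `ValueError` keeps a mistyped offset from being treated as "no offset".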
diff --git a/docs/data_node/load_node/_category_.json b/docs/data_node/load_node/_category_.json
new file mode 100644
index 000000000..429218e09
--- /dev/null
+++ b/docs/data_node/load_node/_category_.json
@@ -0,0 +1,4 @@
+{
+  "label": "Load Nodes",
+  "position": 2
+}
\ No newline at end of file
diff --git a/docs/development/_category_.json b/docs/development/_category_.json
index 6244591cb..dc8cb43dc 100644
--- a/docs/development/_category_.json
+++ b/docs/development/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "Development",
-  "position": 8
+  "position": 9
 }
\ No newline at end of file
diff --git a/docs/modules/agent/sql.md b/docs/modules/agent/sql.md
deleted file mode 100644
index c4256c05f..000000000
--- a/docs/modules/agent/sql.md
+++ /dev/null
@@ -1,60 +0,0 @@
----
-title: MySQL SQL
-sidebar_position: 3
----
-
-## Overview
-Currently, Agent supports MYSQL version 5.1.x , 5.5.x , 5.6.x , 5.7.x , 8.0.x
-Currently, the Agent only supports the curl request to create a Job to submit collection tasks, and temporarily does not support the manager front-end to create SQL collection
-
-## Create A MySQL Job
-
-1. Apply for access on the manager, when filling in the data information, select the message source as "Independent Push"
-2. Select the source data field separator
-3. Fill in the source data fields, and the field order should be consistent with the field order in the sql query result
-4. Create a SQL read task using curl request
-
-## Parameter Description
-
-````
-Each parameter used in SQL Agent Job is described as
-1. job.sql.command: The actual executed sql statement, for example: select * from apache_inlong_manager.user
-2. job.sql.user: the user used when connecting to the database, for example: abc
-3. job.sql.password: The password used when connecting to the database, for example: 123456
-4. job.sql.hostname: The IP address of the connected database, for example: 127.0.0.1
-5. job.sql.port: the connected database port, for example: 3306
-6. job.sql.separator: The separator used to separate multiple fields needs to be used with the manager front-end
-````
-
-## Example
-
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \--header 'Content-Type: application/json' \--data '{
-  "job": {
-    "sql": {
-      "command": "select * from apache_inlong_manager.user",
-      "user":  "root",
-      "password": "inlong",
-      "hostname": "10.0.0.6",
-      "port": "3306",
-      "separator": "|"
-    },
-    "id": 1,
-    "thread": {
-      "running": {
-        "core": "4"
-      }
-    },
-    "name": "test",
-    "source": "org.apache.inlong.agent.plugin.sources.DataBaseSource",
-    "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-    "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-  },
-  "proxy": {
-    "inlongGroupId": "b_test_tube_hive_20211221_01",
-    "inlongStreamId": "test_data_stream_20211221_01_01"
-  },
-  "op": "add"
-}
-'
-```
\ No newline at end of file
diff --git a/docs/sdk/_category_.json b/docs/sdk/_category_.json
index 6b355ca6b..9c997dee9 100644
--- a/docs/sdk/_category_.json
+++ b/docs/sdk/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "SDK",
-  "position": 6
+  "position": 7
 }
\ No newline at end of file
diff --git a/docs/user_guide/_category_.json b/docs/user_guide/_category_.json
index 891650af2..514ec933b 100644
--- a/docs/user_guide/_category_.json
+++ b/docs/user_guide/_category_.json
@@ -1,4 +1,4 @@
 {
   "label": "User Guide",
-  "position": 7
+  "position": 8
 }
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-release.md b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-release.md
index 63a1cd790..cc3b5d3bb 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-release.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs-community/current/how-to-release.md
@@ -232,7 +232,7 @@ cd /tmp/apache-inlong-${release_version}-${rc_version} # 进入源码包目录
 tar xzvf apache-inlong-${release_version}-src.tar.gz #解压源码包
 cd apache-inlong-${release_version} # 进入源码目录
 mvn compile clean install package -DskipTests # 编译
-cp ./inlong-distribution/target/apache-inlong-${release_version}-bin.tar.gz /tmp/apache-inlong-${release_version}-${rc_version}/  # 拷贝二进制包拷到源码包目录下,方面下一步对包进行签名
+cp ./inlong-distribution/target/apache-inlong-${release_version}-bin.tar.gz /tmp/apache-inlong-${release_version}-${rc_version}/  # 拷贝二进制包拷到源码包目录下,方便下一步对包进行签名
 ```
 
 ### 对源码包/二进制包进行签名/sha512
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
index c5ca17905..f0e315eb8 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current.json
@@ -58,5 +58,9 @@
   "sidebar.tutorialSidebar.category.User Guide": {
     "message": "用户指引",
     "description": "The label for category User Guide in sidebar tutorialSidebar"
+  },
+  "sidebar.tutorialSidebar.category.Data Nodes": {
+    "message": "数据节点",
+    "description": "The label for category Data Nodes in sidebar tutorialSidebar"
   }
 }
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/file.md
similarity index 66%
rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file.md
rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/file.md
index 0744cce32..313fdcdd2 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/file.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/file.md
@@ -3,7 +3,16 @@ title: File
 sidebar_position: 3
 ---
 
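-## File Agent Configuration
+## Parameters
+![File Params](img/file_param.png)
+- Data source name
+- Data source IP: the IP of the Agent on the collection node
+- File path: must be an absolute path; regular expressions are supported; separate multiple paths with commas
+- Time offset: collection of the file starts from a given time; '1m' means 1 minute later and '-1m' means 1 minute earlier; m (minute), h (hour), and d (day) are supported; if empty, collection starts from the current time
+- File delimiter: vertical bar (|), comma (,), and semicolon (;) are supported
+- Source data field: the fields produced by splitting on the delimiter
+
+## Path Configuration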
 ```
 /data/inlong-agent/test.log  // Reads the new file test.log in the inlong-agent folder
 /data/inlong-agent/test[0-9]{1} // Reads new files in the inlong-agent folder named test followed by a single digit
@@ -11,18 +20,17 @@ sidebar_position: 3
 /data/inlong-agent/^\\d+(\\.\\d+)? // Starts with one or more digits, optionally followed by a "." and one or more digits (? marks the group as optional); matches, for example, "5", "1.5" and "2.21"
 ```
 
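-## Get data time from file name
+## Data Time
 Agent supports obtaining the time from the file name as the production time of the data. The configuration instructions are as follows: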
 ```
 /data/inlong-agent/***YYYYMMDDHH***
 ```
+
 Where YYYYMMDDHH represents the data time: YYYY is the year, MM the month, DD the day, and HH the hour.
 *** stands for any characters.
 
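 You also need to add the data cycle to the job conf; day and hour cycles are currently supported.
-When adding a task, add the property job.cycleUnit
-
-job.cycleUnit contains the following two types:
+When adding a task, add the property job.cycleUnit. job.cycleUnit contains the following two types:
 - D: the data time has day granularity
 - H: the data time has hour granularity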
 
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/img/file_param.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/img/file_param.png
new file mode 100644
index 000000000..d4226b267
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data_node/extract_node/img/file_param.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/sql.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/sql.md
deleted file mode 100644
index 8299313f3..000000000
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/sql.md
+++ /dev/null
@@ -1,60 +0,0 @@
----
-title: MySQL SQL
-sidebar_position: 3
----
-
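-## Overview
-The Agent currently supports MySQL versions 5.1.x, 5.5.x, 5.6.x, 5.7.x, and 8.0.x
-The Agent currently only supports submitting collection tasks by creating a Job through a curl request; creating SQL collection from the manager front-end is not yet supported
-
-## Steps to Create a MySQL Job
-
-1. Apply for access on the manager; when filling in the data information, select "Independent Push" as the message source
-2. Select the source data field separator
-3. Fill in the source data fields, in the same order as the fields in the sql query result
-4. Create a SQL read task with a curl request
-
-## Parameter Description
-
-```
-The parameters used in a SQL Agent Job are described as follows
-1. job.sql.command: the sql statement actually executed, for example: select * from apache_inlong_manager.user
-2. job.sql.user: the user used to connect to the database, for example: abc
-3. job.sql.password: the password used to connect to the database, for example: 123456
-4. job.sql.hostname: the IP address of the database to connect to, for example: 127.0.0.1
-5. job.sql.port: the port of the database to connect to, for example: 3306
-6. job.sql.separator: the separator used to split multiple fields; it needs to be used with the manager front-end
-```
-
-## Example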
-
-```bash
-curl --location --request POST 'http://localhost:8008/config/job' \--header 'Content-Type: application/json' \--data '{
-  "job": {
-    "sql": {
-      "command": "select * from apache_inlong_manager.user",
-      "user":  "root",
-      "password": "inlong",
-      "hostname": "10.0.0.6",
-      "port": "3306",
-      "separator": "|"
-    },
-    "id": 1,
-    "thread": {
-      "running": {
-        "core": "4"
-      }
-    },
-    "name": "test",
-    "source": "org.apache.inlong.agent.plugin.sources.DataBaseSource",
-    "sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
-    "channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
-  },
-  "proxy": {
-    "inlongGroupId": "b_test_tube_hive_20211221_01",
-    "inlongStreamId": "test_data_stream_20211221_01_01"
-  },
-  "op": "add"
-}
-'
-```
\ No newline at end of file