Posted to commits@inlong.apache.org by do...@apache.org on 2021/08/27 11:25:23 UTC

[incubator-inlong-website] branch master updated: [INLONG-1486][agent] update the document about configuring the dataprxy address (#131)

This is an automated email from the ASF dual-hosted git repository.

dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 639a47b  [INLONG-1486][agent] update the document about configuring the dataprxy address (#131)
639a47b is described below

commit 639a47b58beb95d91a4187aff61d439b373bbd04
Author: ziruipeng <zp...@connect.ust.hk>
AuthorDate: Fri Aug 27 19:25:18 2021 +0800

    [INLONG-1486][agent] update the document about configuring the dataprxy address (#131)
    
    Co-authored-by: stingpeng <st...@tencent.com>
---
 docs/en-us/modules/agent/quick_start.md | 111 ++++++++++++++++++++++++++++----
 docs/zh-cn/modules/agent/quick_start.md | 107 +++++++++++++++++++++++++++---
 2 files changed, 197 insertions(+), 21 deletions(-)

diff --git a/docs/en-us/modules/agent/quick_start.md b/docs/en-us/modules/agent/quick_start.md
index e015889..b6283c6 100644
--- a/docs/en-us/modules/agent/quick_start.md
+++ b/docs/en-us/modules/agent/quick_start.md
@@ -6,7 +6,7 @@ cd inlong-agent
 The agent supports two modes of operation: local operation and online operation
 
 
-### 1.1 Agent configuration
+### Agent configuration
 
 Online operation pulls its configuration from inlong-manager. Configure conf/agent.properties as follows:
 ```ini
@@ -16,15 +16,6 @@ agent.manager.vip.http.host=manager web host
 agent.manager.vip.http.port=manager web port
 ```
 
-### 1.2 Proxy configuration
-Create a new folder named .inlong in the agent directory, and create a new bid+.local file inside. For example, if the sending bid is set to a, then create a new file a.local
-
-write:
-```ini
-{"cluster_id":1,"isInterVisit":1,"size":1,"address": [{"port":write proxy port,"host":"write proxy ip"}], "switch":0}
-Among them, cluster_id, isInterVisit, and switch are reserved fields, please fill in the default values
-```
-
 ## 2. Run
 After decompression, run the following command
 
@@ -89,4 +80,102 @@ agent.http.port=Available ports
     /data/inlong-agent/test.log //Represents reading the new file test.log in the inlong-agent folder
     /data/inlong-agent/test[0-9]{1} // means to read the new file test in the inlong-agent folder followed by a number at the end
     /data/inlong-agent/test //If test is a directory, it means to read all new files under test
-    /data/inlong-agent/^\\d+(\\.\\d+)? // Start with one or more digits, followed by. or end with one. or more digits (? stands for optional, can match Examples: "5", "1.5" and "2.21"
\ No newline at end of file
+    /data/inlong-agent/^\\d+(\\.\\d+)? // Starts with one or more digits, optionally followed by a "." and one or more digits (? means optional). Matching examples: "5", "1.5" and "2.21"
+
+
+## 5. Support getting the data time from the file name
+
+    The agent supports using the time in the file name as the production time of the data. Configure it as follows:
+    /data/inlong-agent/***YYYYMMDDHH***
+    Here YYYYMMDDHH represents the data time: YYYY is the year, MM the month, DD the day, and HH the hour
+    Here *** stands for any characters
+
+    You also need to add the data cycle to the job conf. Day and hour cycles are currently supported.
+    When adding a task, add the property job.cycleUnit
+    
+    job.cycleUnit accepts the following two values:
+    1. D: the data time has day granularity
+    2. H: the data time has hour granularity
+
+    For example:
+    The configured data source is
+    /data/inlong-agent/YYYYMMDDHH.log
+    Data is written to 2021020211.log
+    job.cycleUnit is configured as D
+    Then the agent will try to read the 2021020211.log file at time 2021020211. When reading the data in the file, it writes all the data to the backend proxy with the data time 20210202.
+    If job.cycleUnit is configured as H
+    then when collecting the data in the 2021020211.log file, all data is written to the backend proxy with the data time 2021020211
+
+    
+    Example of job submission
+
+```bash
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+"job": {
+"dir": {
+"path": "",
+"pattern": "/data/inlong-agent/test.log"
+},
+"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+"id": 1,
+"thread": {
+"running": {
+"core": "4"
+}
+},
+"name": "fileAgentTest",
+"cycleUnit": "D",
+"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+},
+"proxy": {
+"bid": "bid10",
+"tid": "bid10"
+},
+"op": "add"
+}'
+```
+
+## 6. Support reading with a time offset
+
+    After configuring time-based reading, if you want to read data for a time other than the current time, configure a time offset
+    The job property is named job.timeOffset; its value is a number followed by a time unit, where the unit is day or hour
+    For example, the following settings are supported
+    1. 1d: read the data one day after the current time
+    2. -1h: read the data one hour before the current time
+
+
+    Example of job submission
+```bash
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+"job": {
+"dir": {
+"path": "",
+"pattern": "/data/inlong-agent/test.log"
+},
+"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+"id": 1,
+"thread": {
+"running": {
+"core": "4"
+}
+},
+"name": "fileAgentTest",
+"cycleUnit": "D",
+"timeOffset": "-1d",
+"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+},
+"proxy": {
+"bid": "bid10",
+"tid": "bid10"
+},
+"op": "add"
+}'
+```
\ No newline at end of file
diff --git a/docs/zh-cn/modules/agent/quick_start.md b/docs/zh-cn/modules/agent/quick_start.md
index 8fb71fe..21314d6 100644
--- a/docs/zh-cn/modules/agent/quick_start.md
+++ b/docs/zh-cn/modules/agent/quick_start.md
@@ -5,7 +5,7 @@ cd inlong-agent
 
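 The agent supports local operation and online operation: online operation pulls tasks from inlong manager, while local operation accepts tasks submitted through HTTP requests
 
-### 1.1 Agent settings for online operation
+### Agent settings for online operation
 
 Online operation pulls its configuration from inlong-manager. Configure conf/agent.properties as follows: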
 ```ini
@@ -15,15 +15,6 @@ agent.manager.vip.http.host=manager web host
 agent.manager.vip.http.port=manager web port
 ```
 
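-### 1.2 DataProxy settings
-Create a new folder named .inlong in the agent directory, and create a new bid+.local file inside it. For example, if the sending bid is set to a, create a new file a.local
-The bid is the business ID shown in the business information of the data access page in the manager UI; it is not the name of the created tube topic
-Write the following into the file:
-```ini
-{"cluster_id":1,"isInterVisit":1,"size":1,"address": [{"port":write proxy port,"host":"write proxy ip"}], "switch":0}
-Among them, cluster_id, isInterVisit, and switch are reserved fields; please fill in the default values
-```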
-
 ## 2. Run
 After decompression, run the following command
 ```bash
@@ -87,3 +78,99 @@ curl --location --request POST 'http://localhost:8008/config/job' \
     /data/inlong-agent/^\\d+(\\.\\d+)? // Starts with one or more digits, optionally followed by a "." and one or more digits (? means optional). Matching examples: "5", "1.5" and "2.21"
 
 
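+## 5. Support getting the data time from the file name
+
+    The agent supports using the time in the file name as the production time of the data. Configure it as follows:
+    /data/inlong-agent/***YYYYMMDDHH***
+    Here YYYYMMDDHH represents the data time: YYYY is the year, MM the month, DD the day, and HH the hour
+    Here *** stands for any characters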
+
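+    You also need to add the data cycle to the job conf. Day and hour cycles are currently supported.
+    When adding a task, add the property job.cycleUnit
+    
+    job.cycleUnit accepts the following two values:
+    1. D: the data time has day granularity
+    2. H: the data time has hour granularity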
+
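+    For example:
+    The configured data source is
+    /data/inlong-agent/YYYYMMDDHH.log
+    Data is written to 2021020211.log
+    job.cycleUnit is configured as D
+    Then the agent will try to read the 2021020211.log file at time 2021020211. When reading the data in the file, it writes all the data to the backend proxy with the data time 20210202
+    If job.cycleUnit is configured as H
+    then when collecting the data in the 2021020211.log file, all data is written to the backend proxy with the data time 2021020211
+
+    
+    Example of job submission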
+```bash
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+"job": {
+"dir": {
+"path": "",
+"pattern": "/data/inlong-agent/test.log"
+},
+"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+"id": 1,
+"thread": {
+"running": {
+"core": "4"
+}
+},
+"name": "fileAgentTest",
+"cycleUnit": "D",
+"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+},
+"proxy": {
+"bid": "bid10",
+"tid": "bid10"
+},
+"op": "add"
+}'
+```
+
+
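+## 6. Support reading with a time offset
+
+    After configuring time-based reading, if you want to read data for a time other than the current time, configure a time offset
+    The job property is named job.timeOffset; its value is a number followed by a time unit, where the unit is day or hour
+    For example, the following settings are supported
+    1. 1d: read the data one day after the current time
+    2. -1h: read the data one hour before the current time
+
+
+    Example of job submission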
+```bash
+curl --location --request POST 'http://localhost:8008/config/job' \
+--header 'Content-Type: application/json' \
+--data '{
+"job": {
+"dir": {
+"path": "",
+"pattern": "/data/inlong-agent/test.log"
+},
+"trigger": "org.apache.inlong.agent.plugin.trigger.DirectoryTrigger",
+"id": 1,
+"thread": {
+"running": {
+"core": "4"
+}
+},
+"name": "fileAgentTest",
+"cycleUnit": "D",
+"timeOffset": "-1d",
+"source": "org.apache.inlong.agent.plugin.sources.TextFileSource",
+"sink": "org.apache.inlong.agent.plugin.sinks.ProxySink",
+"channel": "org.apache.inlong.agent.plugin.channel.MemoryChannel"
+},
+"proxy": {
+"bid": "bid10",
+"tid": "bid10"
+},
+"op": "add"
+}'
+```
\ No newline at end of file
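
Note on the documented behaviour: the job.cycleUnit and job.timeOffset rules described in the diff above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the InLong agent. The function names `data_time_from_name` and `apply_time_offset` are hypothetical, and both the search for a 10-digit YYYYMMDDHH run and the `d`/`h` suffix parsing are inferred from the documented examples (`2021020211.log`, `1d`, `-1h`).

```python
import re
from datetime import datetime, timedelta


def data_time_from_name(file_name: str, cycle_unit: str) -> str:
    """Return the data time implied by a file name, truncated to the cycle unit.

    Assumption: the file name contains exactly one 10-digit YYYYMMDDHH run.
    """
    match = re.search(r"\d{10}", file_name)
    if match is None:
        raise ValueError(f"no YYYYMMDDHH timestamp in {file_name!r}")
    stamp = datetime.strptime(match.group(), "%Y%m%d%H")
    if cycle_unit == "D":
        # Day cycle: every record in the file carries the day-level time.
        return stamp.strftime("%Y%m%d")
    if cycle_unit == "H":
        # Hour cycle: every record carries the hour-level time.
        return stamp.strftime("%Y%m%d%H")
    raise ValueError(f"unsupported cycle unit: {cycle_unit!r}")


def apply_time_offset(now: datetime, time_offset: str) -> datetime:
    """Shift a reference time by an offset such as '1d' or '-1h'.

    Assumption: the value is a signed integer followed by 'd' (day) or 'h' (hour).
    """
    match = re.fullmatch(r"(-?\d+)([dh])", time_offset)
    if match is None:
        raise ValueError(f"unsupported time offset: {time_offset!r}")
    amount = int(match.group(1))
    if match.group(2) == "d":
        return now + timedelta(days=amount)
    return now + timedelta(hours=amount)


if __name__ == "__main__":
    print(data_time_from_name("2021020211.log", "D"))  # 20210202
    print(data_time_from_name("2021020211.log", "H"))  # 2021020211
    print(apply_time_offset(datetime(2021, 2, 2, 11), "-1h"))  # 2021-02-02 10:00:00
```

With cycleUnit D, the file 2021020211.log maps to the data time 20210202, which matches the worked example in the documentation. The offset helper only shows how a value such as -1h could be interpreted relative to the current time.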