Posted to commits@inlong.apache.org by do...@apache.org on 2022/03/29 08:09:34 UTC
[incubator-inlong-website] branch master updated: [INLONG-3404][Website] Add Hive example document and ElasticSearch example document of Sort-standalone (#323)
This is an automated email from the ASF dual-hosted git repository.
dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git
The following commit(s) were added to refs/heads/master by this push:
new b758dc4 [INLONG-3404][Website] Add Hive example document and ElasticSearch example document of Sort-standalone (#323)
b758dc4 is described below
commit b758dc4faa258905b2ad9165b2082a4b667a9cc7
Author: 卢春亮 <94...@qq.com>
AuthorDate: Tue Mar 29 15:58:07 2022 +0800
[INLONG-3404][Website] Add Hive example document and ElasticSearch example document of Sort-standalone (#323)
---
.../sort-standalone/elasticsearch_example.md | 195 ++++++++++++++++++++
docs/modules/sort-standalone/hive_example.md | 199 ++++++++++++++++++++
docs/modules/sort-standalone/quick_start.md | 1 +
.../sort-standalone/elasticsearch_example.md | 196 ++++++++++++++++++++
.../modules/sort-standalone/hive_example.md | 201 +++++++++++++++++++++
5 files changed, 792 insertions(+)
diff --git a/docs/modules/sort-standalone/elasticsearch_example.md b/docs/modules/sort-standalone/elasticsearch_example.md
new file mode 100644
index 0000000..d48b78a
--- /dev/null
+++ b/docs/modules/sort-standalone/elasticsearch_example.md
@@ -0,0 +1,195 @@
+---
+title: Elasticsearch Example
+sidebar_position: 4
+---
+## Prepare the module archive
+The module archive is in the directory "inlong-sort-standalone/sort-standalone-dist/target/"; the archive file is apache-inlong-sort-standalone-${project.version}-bin.tar.gz.
+
+## Prepare the configuration files
+First, decompress the archive file, then copy the three files in the directory "conf/es/" to the directory "conf/".
+
+- conf/common.properties, common configuration of all components.
+- conf/SortClusterConfig.conf, sink configuration of all sort tasks.
+- conf/sid_es_v3.conf, example data source configuration of a sort task; the file name must match the sort task name in SortClusterConfig.conf.
+
+### Example: conf/common.properties
+
+```
+clusterId=esv3-gz-gz1
+nodeId=nodeId
+metricDomains=Sort
+metricDomains.Sort.domainListeners=org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener
+metricDomains.Sort.snapshotInterval=60000
+sortChannel.type=org.apache.inlong.sort.standalone.channel.BufferQueueChannel
+sortSink.type=org.apache.inlong.sort.standalone.sink.elasticsearch.EsSink
+sortSource.type=org.apache.inlong.sort.standalone.source.sortsdk.SortSdkSource
+
+sortClusterConfig.type=file
+sortClusterConfig.file=SortClusterConfig.conf
+sortSourceConfig.QueryConsumeConfigType=file
+#sortTaskId.conf
+
+#sortClusterConfig.type=manager
+#sortSourceConfig.QueryConsumeConfigType=manager
+#managerUrlLoaderType=org.apache.inlong.sort.standalone.config.loader.CommonPropertiesManagerUrlLoader
+#sortClusterConfig.managerUrl=http://${manager_ip:port}/api/inlong/manager/openapi/sort/getClusterConfig
+#sortSourceConfig.managerUrl=http://${manager_ip:port}/api/inlong/manager/openapi/sort/getSortSource
+```
+
+### Example: conf/SortClusterConfig.conf
+
+```
+{
+ "clusterName": "esv3-gz-gz1",
+ "sortTasks": [{
+ "name": "sid_es_v3",
+ "type": "ES",
+ "idParams": [{
+ "indexNamePattern": "inlong0fc00000046_{yyyyMMdd}",
+ "contentOffset": "0",
+ "inlongGroupId": "atta",
+ "fieldOffset": "2",
+ "fieldNames": "ftime extinfo t1 t2 t3 t4",
+ "inlongStreamId": "0fc00000046",
+ "separator": "|"
+ }],
+ "sinkParams": {
+ "httpHosts": "11.187.135.221:9200",
+ "password": "yingyan@ES",
+ "auditSetName": "es-rmrv7g7a",
+ "bulkSizeMb": "10",
+ "flushInterval": "60",
+ "keywordMaxLength": "32767",
+ "bulkAction": "4000",
+ "concurrentRequests": "5",
+ "maxConnect": "10",
+ "isUseIndexId": "false",
+ "username": "elastic"
+ }
+ }]
+}
+```
+
+### Example: conf/sid_es_v3.conf
+
+```
+{
+ "sortClusterName": "esv3-gz-gz1",
+ "sortTaskId": "sid_es_v3",
+ "cacheZones": {
+ "pc_atta6th_sz1": {
+ "zoneName": "pc_atta6th_sz1",
+ "serviceUrl": "http://9.139.53.86:8080",
+ "authentication": "eyJrZXlJZCI6InB1bHNhci04MnhhN24zZWs1ZHciLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJwdWxzYXItODJ4YTduM2VrNWR3X2FkbWluIn0.D5H_j8UQk8KYWHw_mzq2HmR393SnbL5Gz7JYCANBPnI",
+ "topics": [{
+ "topic": "pulsar-82xa7n3ek5dw/atta/atta_topic_1",
+ "partitionCnt": 10,
+ "topicProperties": {}
+ }],
+ "cacheZoneProperties": {},
+ "zoneType": "pulsar"
+ }
+ }
+}
+```
+
+## Modify configuration file: conf/common.properties
+
+| Parameter | Required | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ |
+|clusterId | Y | NA | inlong-sort-standalone cluster id |
+|nodeId | N | Local IP | Current node id |
+|metricDomains | N | Sort | Metric domain name |
+|metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | class name list of metric listener, separated by space |
+|metricDomains.Sort.snapshotInterval | N | 60000 | Interval for taking metric snapshots (milliseconds) |
+|prometheusHttpPort | N | 8080 | HTTP server port of the Prometheus simple client |
+|sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel class name |
+|sortSink.type | Y | NA | Sink class name |
+|sortSource.type | N | org.apache.inlong.sort.standalone.source.sortsdk.SortSdkSource | Source class name |
+|sortClusterConfig.type | N | manager | Where cluster configuration is loaded from: [file, manager, UserDefinedClassName]. |
+|sortClusterConfig.file | N | SortClusterConfig.conf | Classpath file name used when sortClusterConfig.type=file. |
+|sortClusterConfig.managerUrl | N | NA | Cluster configuration URL of InLong Manager, used when sortClusterConfig.type=manager. <br/>For example: http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getClusterConfig |
+|sortSourceConfig.QueryConsumeConfigType | N | manager | Where sort task configuration is loaded from: [file, manager, UserDefinedClassName]. <br/>When sortSourceConfig.QueryConsumeConfigType=file, the sort task configuration file is ${sortTaskId}.conf on the classpath. |
+|sortSourceConfig.managerUrl | N | NA | Sort task configuration URL of InLong Manager, used when sortSourceConfig.QueryConsumeConfigType=manager. <br/>For example: http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getSortSource |
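
To make the loading rules in this table concrete, here is a minimal Python sketch of how a helper tool might read `common.properties`, fill in the documented defaults, and check the required keys. It is not InLong's own loader. The helper names are hypothetical, only some of the defaults are included, and the parser ignores Java `.properties` features such as `:` separators and line continuations. The rule that `manager` mode needs the matching `managerUrl` key is an assumption, because the URL may also be supplied through `managerUrlLoaderType`.

```python
from typing import Dict

# Subset of the defaults listed in the table above (hypothetical helper, not InLong code).
DEFAULTS = {
    "metricDomains": "Sort",
    "metricDomains.Sort.snapshotInterval": "60000",
    "prometheusHttpPort": "8080",
    "sortClusterConfig.type": "manager",
    "sortClusterConfig.file": "SortClusterConfig.conf",
    "sortSourceConfig.QueryConsumeConfigType": "manager",
}
REQUIRED = ("clusterId", "sortSink.type")
# Assumption: in manager mode the corresponding manager URL must be present.
MANAGER_URL_KEYS = {
    "sortClusterConfig.type": "sortClusterConfig.managerUrl",
    "sortSourceConfig.QueryConsumeConfigType": "sortSourceConfig.managerUrl",
}


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve(props: Dict[str, str]) -> Dict[str, str]:
    """Merge defaults under the given values and check required keys."""
    missing = [k for k in REQUIRED if k not in props]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    merged = {**DEFAULTS, **props}
    for type_key, url_key in MANAGER_URL_KEYS.items():
        if merged[type_key] == "manager" and url_key not in merged:
            raise ValueError(f"{url_key} is required when {type_key}=manager")
    return merged
```

With the file-based example above, both loader types are `file`, so no manager URL is needed. If either type is left at its default of `manager`, this sketch expects the corresponding URL.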
+
+## Modify configuration file: SortClusterConfig.conf
+
+- Cluster configuration loaded from the classpath file SortClusterConfig.conf does not support online updates.
+- Cluster configuration loaded from the InLong Manager URL supports online updates.
+
+| Parameter | Required |Type | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|clusterName | Y |String | NA | inlong-sort-standalone cluster id |
+|sortTasks | Y | JsonArray<SortTaskConfig> |NA | Sort task list |
+
+### Modify configuration: SortTaskConfig
+
+| Parameter | Required | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ |
+|name | Y | NA | sort task name |
+|type | Y | NA | Sort task type, for example: HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ELASTICSEARCH("elasticsearch"), UNKNOWN("n") |
+|idParams | Y | NA | List of InLong data stream configurations |
+|sinkParams | Y | NA | Sink parameters of the sort task |
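
Each sort task's `name` is also the base name of its data source file (`${name}.conf`). The Python sketch below uses hypothetical helper names. It derives those file paths from a parsed `SortClusterConfig.conf` and cross-checks already-parsed task files. The check that `sortClusterName` equals `clusterName` is an assumption, based on both fields being described as the cluster id.

```python
from pathlib import Path
from typing import Dict, List


def task_config_files(cluster_conf: dict, conf_dir: Path) -> Dict[str, Path]:
    """Map each sort task name to the ${name}.conf file expected in conf_dir."""
    return {task["name"]: conf_dir / f"{task['name']}.conf"
            for task in cluster_conf["sortTasks"]}


def check_consistency(cluster_conf: dict, task_confs: Dict[str, dict]) -> List[str]:
    """Report task files whose ids do not line up with SortClusterConfig.conf."""
    problems: List[str] = []
    for name in (task["name"] for task in cluster_conf["sortTasks"]):
        conf = task_confs.get(name)
        if conf is None:
            problems.append(f"{name}: no {name}.conf loaded")
            continue
        if conf.get("sortTaskId") != name:
            problems.append(f"{name}: sortTaskId is {conf.get('sortTaskId')!r}")
        # Assumption: the task file names the same cluster as SortClusterConfig.conf.
        if conf.get("sortClusterName") != cluster_conf["clusterName"]:
            problems.append(f"{name}: sortClusterName does not match clusterName")
    return problems
```

In practice, each path returned by `task_config_files` would be read with `json.loads(path.read_text())` before the result is passed to `check_consistency`.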
+
+### Modify configuration: idParams of Elasticsearch sort task
+
+| Parameter | Required | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ |
+|inlongGroupId | Y | NA | inlongGroupId |
+|inlongStreamId | Y | NA | inlongStreamId |
+|separator | Y | NA | Field separator of the InLong data stream in the source data |
+|fieldNames | Y | NA | Field names of the Elasticsearch index, separated by spaces |
+|indexNamePattern | Y | NA | Index name pattern of Elasticsearch; supported date-time variables are {yyyyMMdd}, {yyyyMMddHH} and {yyyyMMddHHmm}. |
+|contentOffset | Y | NA | Offset of the first valid field in the source content, starting from 0 |
+|fieldOffset | Y | NA | Offset of the first field in the Elasticsearch index field name list |
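
The following Python sketch shows one reading of these parameters. It expands the three documented date variables in `indexNamePattern`, then pairs the source fields (starting at `contentOffset`) with the index field names (starting at `fieldOffset`). This pairing rule is an assumption drawn from the parameter descriptions, not from the EsSink source, and the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict

# The three documented variables, mapped to strftime directives.
_TOKENS = {"yyyyMMddHHmm": "%Y%m%d%H%M", "yyyyMMddHH": "%Y%m%d%H", "yyyyMMdd": "%Y%m%d"}


def expand_index_name(pattern: str, ts: datetime) -> str:
    """Replace {yyyyMMdd}, {yyyyMMddHH} or {yyyyMMddHHmm} with the formatted time."""
    def repl(m) -> str:
        token = m.group(1)
        if token not in _TOKENS:
            raise ValueError(f"unsupported variable {{{token}}}")
        return ts.strftime(_TOKENS[token])
    return re.sub(r"\{(\w+)\}", repl, pattern)


def map_fields(line: str, separator: str, field_names: str,
               content_offset: int, field_offset: int) -> Dict[str, str]:
    """Assumed rule: source field contentOffset+i goes to index field fieldOffset+i."""
    values = line.split(separator)[content_offset:]  # literal split, not a regex
    names = field_names.split()[field_offset:]
    return dict(zip(names, values))  # extra values or names are dropped
```

With the example `idParams` (`fieldNames` "ftime extinfo t1 t2 t3 t4", `fieldOffset` 2, `contentOffset` 0), this sketch would put the first source field into `t1`.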
+
+### Modify configuration: sinkParams of Elasticsearch sort task
+
+| Parameter | Required | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ |
+|httpHosts | Y | NA | Hosts of Elasticsearch |
+|username | Y | NA | Username of Elasticsearch |
+|password | Y | NA | Password of Elasticsearch |
+|isUseIndexId | N | false | Whether to create an index ID (affects shard distribution of the index) |
+|bulkSizeMb | N | 10 | Max content size per bulk request (MB) |
+|flushInterval | N | 60 | Max interval between flush operations (seconds) |
+|keywordMaxLength | N | 32767 | Max keyword length (bytes) |
+|bulkAction | N | 4000 | Max index requests per bulk request |
+|maxConnect | N | 10 | Max open HTTP connections |
+|concurrentRequests | N | 5 | Max concurrent requests per HTTP connection |
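
`bulkAction`, `bulkSizeMb` and `flushInterval` look like the usual Elasticsearch bulk-processor limits, under which a batch is sent as soon as any one limit is reached. The sketch below models that assumed policy with a hypothetical `BulkBuffer` class. It ignores `concurrentRequests` and `maxConnect`, and it checks the time limit only when a document is added; a real sink would presumably also flush on a timer.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BulkBuffer:
    """Hypothetical buffer that flushes when any sinkParams limit is reached."""
    bulk_action: int = 4000          # bulkAction: max index requests per bulk
    bulk_size_mb: int = 10           # bulkSizeMb: max payload size per bulk
    flush_interval_s: float = 60.0   # flushInterval: max seconds between flushes
    clock: Callable[[], float] = time.monotonic
    docs: List[bytes] = field(default_factory=list)
    size_bytes: int = 0
    last_flush: float = 0.0

    def __post_init__(self) -> None:
        self.last_flush = self.clock()

    def add(self, doc: bytes) -> List[bytes]:
        """Buffer one document; return the flushed batch, or [] if nothing was flushed."""
        self.docs.append(doc)
        self.size_bytes += len(doc)
        return self.flush() if self._should_flush() else []

    def _should_flush(self) -> bool:
        return (len(self.docs) >= self.bulk_action
                or self.size_bytes >= self.bulk_size_mb * 1024 * 1024
                or self.clock() - self.last_flush >= self.flush_interval_s)

    def flush(self) -> List[bytes]:
        batch, self.docs, self.size_bytes = self.docs, [], 0
        self.last_flush = self.clock()
        return batch
```

Injecting `clock` makes the time-based limit testable without sleeping.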
+
+## Modify configuration file: sid_es_v3.conf
+
+- The file name is the sort task name plus the suffix ".conf".
+- Configuration loaded from this file on the classpath does not support online updates.
+- Configuration loaded from the InLong Manager URL supports online updates.
+
+### Modify configuration: sid_es_v3.conf
+
+| Parameter | Required |Type | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|sortClusterName | Y |String | NA | inlong-sort-standalone cluster id |
+|sortTaskId | Y | String |NA | Sort task name |
+|cacheZones | Y | JsonObject<String, JsonObject> |NA | Cache cluster list, Map<cacheClusterName, CacheCluster> |
+
+### Modify configuration: CacheCluster
+
+| Parameter | Required |Type | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|zoneName | Y |String | NA | cache cluster name |
+|zoneType | Y | String |NA | Cache type: [pulsar, tube, kafka] |
+|serviceUrl | Y | String |NA | Pulsar serviceUrl or Kafka broker list |
+|authentication | Y | String |NA | Pulsar authentication |
+|cacheZoneProperties | N | Map<String,String> |NA | Cache consumer configuration |
+|topics | N | List<Topic> |NA | Topics consumed from the cache cluster |
+
+### Modify configuration: Topic
+
+| Parameter | Required |Type | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|topic | Y |String | NA | Full topic name; for Pulsar: tenant/namespace/topic |
+|partitionCnt | Y | Integer |NA | Partition count of the topic |
+|topicProperties | N | Map<String,String> |NA | Consumer configuration of the topic |
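
Before you start the application, it can help to sanity-check a `${sortTaskId}.conf` file against the tables above. This hypothetical Python validator follows the Required column. The check that `zoneName` equals its key in `cacheZones` is an assumption based on the example, not a documented rule.

```python
from typing import List

REQUIRED_TOP_KEYS = ("sortClusterName", "sortTaskId", "cacheZones")
REQUIRED_ZONE_KEYS = ("zoneName", "zoneType", "serviceUrl", "authentication")
ZONE_TYPES = {"pulsar", "tube", "kafka"}


def validate_task_conf(conf: dict) -> List[str]:
    """Return the problems found in a parsed ${sortTaskId}.conf (empty if none)."""
    problems = [f"missing {k}" for k in REQUIRED_TOP_KEYS if k not in conf]
    for key, zone in conf.get("cacheZones", {}).items():
        problems += [f"{key}: missing {k}" for k in REQUIRED_ZONE_KEYS if k not in zone]
        # Assumption: the map key and zoneName are expected to be identical.
        if zone.get("zoneName", key) != key:
            problems.append(f"{key}: zoneName differs from its key")
        if "zoneType" in zone and zone["zoneType"] not in ZONE_TYPES:
            problems.append(f"{key}: unknown zoneType {zone['zoneType']!r}")
        for topic in zone.get("topics", []):
            if "topic" not in topic or not isinstance(topic.get("partitionCnt"), int):
                problems.append(f"{key}: topic entry needs 'topic' and integer 'partitionCnt'")
    return problems
```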
+
+## Start inlong-sort-standalone application
+Finally, run the shell script "./bin/sort-start.sh" to start sort-standalone, then check the log file "sort.log" to confirm that it started.
+
diff --git a/docs/modules/sort-standalone/hive_example.md b/docs/modules/sort-standalone/hive_example.md
new file mode 100644
index 0000000..218f8fa
--- /dev/null
+++ b/docs/modules/sort-standalone/hive_example.md
@@ -0,0 +1,199 @@
+---
+title: Hive Example
+sidebar_position: 3
+---
+## Prepare the module archive
+The module archive is in the directory "inlong-sort-standalone/sort-standalone-dist/target/"; the archive file is apache-inlong-sort-standalone-${project.version}-bin.tar.gz.
+
+## Prepare the configuration files
+First, decompress the archive file, then copy the three files in the directory "conf/hive/" to the directory "conf/".
+
+- conf/common.properties, common configuration of all components.
+- conf/SortClusterConfig.conf, sink configuration of all sort tasks.
+- conf/sid_hive_inlong6th_v3.conf, example data source configuration of a sort task; the file name must match the sort task name in SortClusterConfig.conf.
+
+### Example: conf/common.properties
+
+```
+clusterId=hivev3-sz-sz1
+nodeId=nodeId
+metricDomains=Sort
+metricDomains.Sort.domainListeners=org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener
+metricDomains.Sort.snapshotInterval=60000
+sortChannel.type=org.apache.inlong.sort.standalone.channel.BufferQueueChannel
+sortSink.type=org.apache.inlong.sort.standalone.sink.hive.HiveSink
+sortSource.type=org.apache.inlong.sort.standalone.source.sortsdk.SortSdkSource
+
+sortClusterConfig.type=file
+sortClusterConfig.file=SortClusterConfig.conf
+sortSourceConfig.QueryConsumeConfigType=file
+#sortTaskId.conf
+
+#sortClusterConfig.type=manager
+#sortSourceConfig.QueryConsumeConfigType=manager
+#managerUrlLoaderType=org.apache.inlong.sort.standalone.config.loader.CommonPropertiesManagerUrlLoader
+#sortClusterConfig.managerUrl=http://${manager_ip:port}/api/inlong/manager/openapi/sort/getClusterConfig
+#sortSourceConfig.managerUrl=http://${manager_ip:port}/api/inlong/manager/openapi/sort/getSortSource
+
+```
+
+### Example: conf/SortClusterConfig.conf
+
+```
+{
+ "clusterName": "hivev3-sz-sz1",
+ "sortTasks": [{
+ "name": "sid_hive_inlong6th_v3",
+ "type": "HIVE",
+ "idParams": [{
+ "inlongGroupId": "atta",
+ "inlongStreamId": "0fc00000046",
+ "separator": "|",
+ "partitionIntervalMs": 3600000,
+ "idRootPath": "/user/hive/warehouse/t_inlong_v1_0fc00000046",
+ "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
+ "hiveTableName": "t_inlong_v1_0fc00000046",
+ "partitionFieldName": "dt",
+ "partitionFieldPattern": "yyyyMMddHH",
+ "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
+ "maxPartitionOpenDelayHour": 8
+ }],
+ "sinkParams": {
+ "hdfsPath": "hdfs://10.160.139.123:9000",
+ "maxFileOpenDelayMinute": "5",
+ "tokenOvertimeMinute": "60",
+ "maxOutputFileSizeGb": "2",
+ "hiveJdbcUrl": "jdbc:hive2://10.160.142.179:10000",
+ "hiveDatabase": "default",
+ "hiveUsername": "hive",
+ "hivePassword": "hive"
+ }
+ }]
+}
+```
+
+### Example: conf/sid_hive_inlong6th_v3.conf
+
+```
+{
+ "sortClusterName": "hivev3-sz-sz1",
+ "sortTaskId": "sid_hive_inlong6th_v3",
+ "cacheZones": {
+ "pc_atta6th_sz1": {
+ "zoneName": "pc_atta6th_sz1",
+ "serviceUrl": "http://9.139.53.86:8080",
+ "authentication": "eyJrZXlJZCI6InB1bHNhci04MnhhN24zZWs1ZHciLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJwdWxzYXItODJ4YTduM2VrNWR3X2FkbWluIn0.D5H_j8UQk8KYWHw_mzq2HmR393SnbL5Gz7JYCANBPnI",
+ "topics": [{
+ "topic": "pulsar-82xa7n3ek5dw/atta/atta_topic_1",
+ "partitionCnt": 10,
+ "topicProperties": {}
+ }],
+ "cacheZoneProperties": {},
+ "zoneType": "pulsar"
+ }
+ }
+}
+```
+
+## Modify configuration file: conf/common.properties
+
+| Parameter | Required | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ |
+|clusterId | Y | NA | inlong-sort-standalone cluster id |
+|nodeId | N | Local IP | Current node id |
+|metricDomains | N | Sort | Metric domain name |
+|metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | class name list of metric listener, separated by space |
+|metricDomains.Sort.snapshotInterval | N | 60000 | Interval for taking metric snapshots (milliseconds) |
+|prometheusHttpPort | N | 8080 | HTTP server port of the Prometheus simple client |
+|sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel class name |
+|sortSink.type | Y | NA | Sink class name |
+|sortSource.type | N | org.apache.inlong.sort.standalone.source.sortsdk.SortSdkSource | Source class name |
+|sortClusterConfig.type | N | manager | Where cluster configuration is loaded from: [file, manager, UserDefinedClassName]. |
+|sortClusterConfig.file | N | SortClusterConfig.conf | Classpath file name used when sortClusterConfig.type=file. |
+|sortClusterConfig.managerUrl | N | NA | Cluster configuration URL of InLong Manager, used when sortClusterConfig.type=manager. <br/>For example: http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getClusterConfig |
+|sortSourceConfig.QueryConsumeConfigType | N | manager | Where sort task configuration is loaded from: [file, manager, UserDefinedClassName]. <br/>When sortSourceConfig.QueryConsumeConfigType=file, the sort task configuration file is ${sortTaskId}.conf on the classpath. |
+|sortSourceConfig.managerUrl | N | NA | Sort task configuration URL of InLong Manager, used when sortSourceConfig.QueryConsumeConfigType=manager. <br/>For example: http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getSortSource |
+
+## Modify configuration file: SortClusterConfig.conf
+
+- Cluster configuration loaded from the classpath file SortClusterConfig.conf does not support online updates.
+- Cluster configuration loaded from the InLong Manager URL supports online updates.
+
+| Parameter | Required |Type | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|clusterName | Y |String | NA | inlong-sort-standalone cluster id |
+|sortTasks | Y | JsonArray<SortTaskConfig> |NA | Sort task list |
+
+### Modify configuration: SortTaskConfig
+
+| Parameter | Required | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ |
+|name | Y | NA | sort task name |
+|type | Y | NA | Sort task type, for example: HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ELASTICSEARCH("elasticsearch"), UNKNOWN("n") |
+|idParams | Y | NA | List of InLong data stream configurations |
+|sinkParams | Y | NA | Sink parameters of the sort task |
+
+### Modify configuration: idParams of Hive sort task
+
+| Parameter | Required | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ |
+|inlongGroupId | Y | NA | inlongGroupId |
+|inlongStreamId | Y | NA | inlongStreamId |
+|separator | Y | NA | Field separator of the InLong data stream in the source data |
+|partitionIntervalMs | N | 3600000 | Partition interval (milliseconds) |
+|idRootPath | Y | NA | HDFS root path of the InLong data stream |
+|partitionSubPath | Y | NA | Partition sub-path of the InLong data stream |
+|hiveTableName | Y | NA | Hive table name of the InLong data stream |
+|partitionFieldName | N | dt | Partition field name of the InLong data stream |
+|partitionFieldPattern | Y | NA | Date format of the partition field value; supported formats are yyyyMMdd, yyyyMMddHH and yyyyMMddHHmm |
+|msgTimeFieldPattern | Y | NA | Date format of the message generation time; Java date formats are supported |
+|maxPartitionOpenDelayHour | N | 8 | Max open delay of a partition (hours) |
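
The Python sketch below shows one plausible way these parameters combine. It parses the message time with `msgTimeFieldPattern`, rounds it down to the start of its `partitionIntervalMs` window, and then formats `partitionSubPath` and the partition value. The position of the time field (`time_index`) is a hypothetical parameter, because the table does not say which field holds the message time. The date-letter translation covers only `yyyy`, `MM`, `dd`, `HH`, `mm` and `ss`, and the window alignment assumes the interval divides a day evenly.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

_JAVA = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}


def java_to_strftime(pattern: str) -> str:
    """Translate the Java date letters used in these examples to strftime codes."""
    return re.sub(r"yyyy|MM|dd|HH|mm|ss", lambda m: _JAVA[m.group(0)], pattern)


def partition_start(ts: datetime, interval_ms: int) -> datetime:
    """Round ts down to the start of its partitionIntervalMs window within the day."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    offset_ms = (ts - day) // timedelta(milliseconds=1)
    return day + timedelta(milliseconds=offset_ms - offset_ms % interval_ms)


def hive_partition(line: str, id_params: dict, time_index: int = 0) -> Tuple[str, str]:
    """Return (HDFS directory, partition value) for one delimited record."""
    raw_time = line.split(id_params["separator"])[time_index]  # hypothetical position
    msg_time = datetime.strptime(raw_time, java_to_strftime(id_params["msgTimeFieldPattern"]))
    start = partition_start(msg_time, int(id_params.get("partitionIntervalMs", 3600000)))
    sub_path = re.sub(r"\{(\w+)\}",
                      lambda m: start.strftime(java_to_strftime(m.group(1))),
                      id_params["partitionSubPath"])
    value = start.strftime(java_to_strftime(id_params["partitionFieldPattern"]))
    return id_params["idRootPath"] + sub_path, value
```

For the example `idParams`, a record whose first field is `2022-03-29 15:58:07` would land under `.../t_inlong_v1_0fc00000046/20220329/2022032915` with partition value `2022032915`.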
+
+### Modify configuration: sinkParams of Hive sort task
+
+| Parameter | Required | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ |
+|hdfsPath | Y | NA | NameNode URL of HDFS |
+|maxFileOpenDelayMinute | N | 5 | Max time a single HDFS file stays open for writing (minutes) |
+|tokenOvertimeMinute | N | 60 | Token timeout of the InLong data stream (minutes) |
+|maxOutputFileSizeGb | N | 2 | Max size of a single HDFS file (GB) |
+|hiveJdbcUrl | Y | NA | JDBC URL of Hive |
+|hiveDatabase | Y | NA | Hive database |
+|hiveUsername | Y | NA | Hive username |
+|hivePassword | Y | NA | Hive password |
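
`maxFileOpenDelayMinute` and `maxOutputFileSizeGb` both limit a single HDFS output file. Assuming the sink closes a file as soon as either limit is reached, the rolling decision can be sketched as below. `OpenFile` and `should_roll` are hypothetical names, not InLong classes.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class OpenFile:
    """State of one HDFS output file (hypothetical model)."""
    opened_at_s: float
    size_bytes: int = 0


def should_roll(f: OpenFile, now_s: float,
                max_open_minutes: int = 5, max_size_gb: int = 2) -> bool:
    """Close the file once it has been open too long or has grown too large."""
    too_old = now_s - f.opened_at_s >= max_open_minutes * 60
    too_big = f.size_bytes >= max_size_gb * GB
    return too_old or too_big
```

The defaults mirror the table (5 minutes, 2 GB).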
+
+## Modify configuration file: sid_hive_inlong6th_v3.conf
+
+- The file name is the sort task name plus the suffix ".conf".
+- Configuration loaded from this file on the classpath does not support online updates.
+- Configuration loaded from the InLong Manager URL supports online updates.
+
+### Modify configuration: sid_hive_inlong6th_v3.conf
+
+| Parameter | Required |Type | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|sortClusterName | Y |String | NA | inlong-sort-standalone cluster id |
+|sortTaskId | Y | String |NA | Sort task name |
+|cacheZones | Y | JsonObject<String, JsonObject> |NA | Cache cluster list, Map<cacheClusterName, CacheCluster> |
+
+### Modify configuration: CacheCluster
+
+| Parameter | Required |Type | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|zoneName | Y |String | NA | cache cluster name |
+|zoneType | Y | String |NA | Cache type: [pulsar, tube, kafka] |
+|serviceUrl | Y | String |NA | Pulsar serviceUrl or Kafka broker list |
+|authentication | Y | String |NA | Pulsar authentication |
+|cacheZoneProperties | N | Map<String,String> |NA | Cache consumer configuration |
+|topics | N | List<Topic> |NA | Topics consumed from the cache cluster |
+
+### Modify configuration: Topic
+
+| Parameter | Required |Type | DefaultValue |Remark |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|topic | Y |String | NA | Full topic name; for Pulsar: tenant/namespace/topic |
+|partitionCnt | Y | Integer |NA | Partition count of the topic |
+|topicProperties | N | Map<String,String> |NA | Consumer configuration of the topic |
+
+## Start inlong-sort-standalone application
+Finally, run the shell script "./bin/sort-start.sh" to start sort-standalone, then check the log file "sort.log" to confirm that it started.
+
diff --git a/docs/modules/sort-standalone/quick_start.md b/docs/modules/sort-standalone/quick_start.md
index bd3a909..f72dbb9 100644
--- a/docs/modules/sort-standalone/quick_start.md
+++ b/docs/modules/sort-standalone/quick_start.md
@@ -29,6 +29,7 @@ At first, decompress the archive file, execute the shell file "./bin/sort-start.
## SortClusterConfig
- Get SortClusterConfig from the file:SortClusterConfig.conf in classpath, but it can not support online updating.
- Get SortClusterConfig from InlongManager URL, but it can support online updating.
+
| Parameter | Required | DefaultValue |Remark |
| ------------ | ------------ | ------------ | ------------ |
|clusterName | Y | NA | inlong-sort-standalone cluster id |
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort-standalone/elasticsearch_example.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort-standalone/elasticsearch_example.md
new file mode 100644
index 0000000..7276bfa
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort-standalone/elasticsearch_example.md
@@ -0,0 +1,196 @@
+---
+title: Elasticsearch Example
+sidebar_position: 4
+---
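+## Prepare the installation file
+The installation file is in the `inlong-sort-standalone/sort-standalone-dist/target/` directory; the file name is apache-inlong-sort-standalone-${project.version}-bin.tar.gz.
+
+## Prepare the configuration files
+First, decompress the archive apache-inlong-sort-standalone-${project.version}-bin.tar.gz, then copy the 3 files in the directory "conf/es/" to the directory "conf/".
+
+- conf/common.properties, basic configuration parameters of all components.
+- conf/SortClusterConfig.conf, sink configuration of all sort tasks.
+- conf/sid_es_v3.conf, data source configuration of a sort task. The file name matches the sort task name in SortClusterConfig.conf; if SortClusterConfig.conf configures multiple sort tasks, there is one data source configuration per sort task.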
+
+### Sample conf/common.properties
+
+```
+clusterId=esv3-gz-gz1
+nodeId=nodeId
+metricDomains=Sort
+metricDomains.Sort.domainListeners=org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener
+metricDomains.Sort.snapshotInterval=60000
+sortChannel.type=org.apache.inlong.sort.standalone.channel.BufferQueueChannel
+sortSink.type=org.apache.inlong.sort.standalone.sink.elasticsearch.EsSink
+sortSource.type=org.apache.inlong.sort.standalone.source.sortsdk.SortSdkSource
+
+sortClusterConfig.type=file
+sortClusterConfig.file=SortClusterConfig.conf
+sortSourceConfig.QueryConsumeConfigType=file
+#sortTaskId.conf
+
+#sortClusterConfig.type=manager
+#sortSourceConfig.QueryConsumeConfigType=manager
+#managerUrlLoaderType=org.apache.inlong.sort.standalone.config.loader.CommonPropertiesManagerUrlLoader
+#sortClusterConfig.managerUrl=http://${manager_ip:port}/api/inlong/manager/openapi/sort/getClusterConfig
+#sortSourceConfig.managerUrl=http://${manager_ip:port}/api/inlong/manager/openapi/sort/getSortSource
+```
+
+### Sample conf/SortClusterConfig.conf
+
+```
+{
+ "clusterName": "esv3-gz-gz1",
+ "sortTasks": [{
+ "name": "sid_es_v3",
+ "type": "ES",
+ "idParams": [{
+ "indexNamePattern": "inlong0fc00000046_{yyyyMMdd}",
+ "contentOffset": "0",
+ "inlongGroupId": "atta",
+ "fieldOffset": "2",
+ "fieldNames": "ftime extinfo t1 t2 t3 t4",
+ "inlongStreamId": "0fc00000046",
+ "separator": "|"
+ }],
+ "sinkParams": {
+ "httpHosts": "11.187.135.221:9200",
+ "password": "yingyan@ES",
+ "auditSetName": "es-rmrv7g7a",
+ "bulkSizeMb": "10",
+ "flushInterval": "60",
+ "keywordMaxLength": "32767",
+ "bulkAction": "4000",
+ "concurrentRequests": "5",
+ "maxConnect": "10",
+ "isUseIndexId": "false",
+ "username": "elastic"
+ }
+ }]
+}
+```
+
+### Sample conf/sid_es_v3.conf
+
+```
+{
+ "sortClusterName": "esv3-gz-gz1",
+ "sortTaskId": "sid_es_v3",
+ "cacheZones": {
+ "pc_atta6th_sz1": {
+ "zoneName": "pc_atta6th_sz1",
+ "serviceUrl": "http://9.139.53.86:8080",
+ "authentication": "eyJrZXlJZCI6InB1bHNhci04MnhhN24zZWs1ZHciLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJwdWxzYXItODJ4YTduM2VrNWR3X2FkbWluIn0.D5H_j8UQk8KYWHw_mzq2HmR393SnbL5Gz7JYCANBPnI",
+ "topics": [{
+ "topic": "pulsar-82xa7n3ek5dw/atta/atta_topic_1",
+ "partitionCnt": 10,
+ "topicProperties": {}
+ }],
+ "cacheZoneProperties": {},
+ "zoneType": "pulsar"
+ }
+ }
+}
+```
+
+## conf/common.properties parameters
+
+| Parameter | Required | Default value |Description |
+| ------------ | ------------ | ------------ | ------------ |
+|clusterId | Y | NA | Uniquely identifies an inlong-sort-standalone cluster |
+|nodeId | N | Local IP | Current node ID |
+|metricDomains | N | Sort | Metric aggregation domain name |
+|metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | Class names of the metric aggregation listeners, separated by spaces |
+|metricDomains.Sort.snapshotInterval | N | 60000 | Interval for taking metric snapshots, in ms |
+|prometheusHttpPort | N | 8080 | Parameter of org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener: port of the Prometheus HttpServer |
+|sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel type |
+|sortSink.type | Y | NA | Sink class name; different distribution types use different Sink classes |
+|sortSource.type | N | org.apache.inlong.sort.standalone.source.sortsdk.SortSdkSource | Source class name |
+|sortClusterConfig.type | N | manager | Where cluster configuration data is loaded from; three options: [file, manager, user-defined class]. |
+|sortClusterConfig.file | N | SortClusterConfig.conf | Configuration file name on the classpath when cluster configuration data is loaded from file |
+|sortClusterConfig.managerUrl | N | NA | InLong Manager URL used when cluster configuration data is loaded from manager<br/>For example: http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getClusterConfig |
+|sortSourceConfig.QueryConsumeConfigType | N | manager | Where sort task configuration data is loaded from; three options: [file, manager, user-defined class]. <br/>When loading from file, the sort task configuration file is on the classpath and its name has the format ${sortTaskId}.conf. |
+|sortSourceConfig.managerUrl | N | NA | InLong Manager URL used when sort task configuration data is loaded from manager<br/>For example: http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getSortSource |
+
+## SortClusterConfig.conf parameters
+
+- Configuration can be read from the SortClusterConfig.conf file on the classpath, but this does not support real-time updates.
+- Configuration can be fetched from the HTTP interface of InLong Manager, which supports real-time updates.
+
+| Parameter | Required |Type | Default value |Description |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|clusterName | Y |String | NA | Uniquely identifies an inlong-sort-standalone cluster |
+|sortTasks | Y | JsonArray<SortTaskConfig> |NA | Sort task list |
+
+### SortTaskConfig parameters
+
+| Parameter | Required | Default value |Description |
+| ------------ | ------------ | ------------ | ------------ |
+|name | Y | NA | Sort task name |
+|type | Y | NA | Sort task type, for example: HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ELASTICSEARCH("elasticsearch"), UNKNOWN("n") |
+|idParams | Y | NA | List of InLong data stream parameters |
+|sinkParams | Y | NA | Sort task parameters |
+
+### idParams parameters of the Sort-Elasticsearch task
+
+| Parameter | Required | Default value |Description |
+| ------------ | ------------ | ------------ | ------------ |
+|inlongGroupId | Y | NA | inlongGroupId |
+|inlongStreamId | Y | NA | inlongStreamId |
+|separator | Y | NA | Separator |
+|fieldNames | Y | NA | Field list of the Elasticsearch index, separated by spaces |
+|indexNamePattern | Y | NA | Index name pattern; three date-time variables are supported: {yyyyMMdd}, {yyyyMMddHH}, {yyyyMMddHHmm} |
+|contentOffset | Y | NA | Start offset of the valid fields in the source data, starting from 0 |
+|fieldOffset | Y | NA | Start offset in the Elasticsearch index field list |
+
+### sinkParams parameters of the Sort-Elasticsearch task
+
+| Parameter | Required | Default value |Description |
+| ------------ | ------------ | ------------ | ------------ |
+|httpHosts | Y | NA | IP and port of the Elasticsearch hosts |
+|username | Y | NA | Elasticsearch username |
+|password | Y | NA | Elasticsearch password |
+|isUseIndexId | N | false | Whether to create an index ID; affects shard distribution of the index |
+|bulkSizeMb | N | 10 | Max size of a single bulk request, in MB |
+|flushInterval | N | 60 | Flush interval, in seconds |
+|keywordMaxLength | N | 32767 | Max length of a single keyword, in bytes |
+|bulkAction | N | 4000 | Max number of IndexRequests in a single bulk request |
+|maxConnect | N | 10 | Max number of HTTP connections |
+|concurrentRequests | N | 5 | Max number of pending requests per HTTP connection |
+
+## sid_es_v3.conf parameters of the Sort-Elasticsearch task
+
+- File name format: sort task name + ".conf".
+- Configuration can be read from this file on the classpath, but this does not support real-time updates.
+- Configuration can be fetched from the HTTP interface of InLong Manager, which supports real-time updates.
+
+### sid_es_v3.conf parameters
+
+| Parameter | Required |Type | Default value |Description |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|sortClusterName | Y |String | NA | Uniquely identifies an inlong-sort-standalone cluster |
+|sortTaskId | Y | String |NA | Sort task name |
+|cacheZones | Y | JsonObject<String, JsonObject> |NA | Cache layer cluster list, format: Map<cacheClusterName, CacheCluster> |
+
+### CacheCluster parameters
+
+| Parameter | Required |Type | Default value |Description |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|zoneName | Y |String | NA | Cache layer cluster name |
+|zoneType | Y | String |NA | Cache type: [pulsar,tube,kafka] |
+|serviceUrl | Y | String |NA | Pulsar serviceUrl parameter, or the Kafka broker list |
+|authentication | Y | String |NA | Pulsar authentication |
+|cacheZoneProperties | N | Map<String,String> |NA | Consumer parameters of the cache layer |
+|topics | N | List<Topic> |NA | List of topics consumed from the cache layer |
+
+### Topic parameters
+
+| Parameter | Required |Type | Default value |Description |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|topic | Y |String | NA | Full topic name; for Pulsar: tenant/namespace/topic |
+|partitionCnt | Y | Integer |NA | Number of topic partitions |
+|topicProperties | N | Map<String,String> |NA | Consumer parameters of the cache layer topic |
+
+## Start the inlong-sort-standalone application
+Finally, run the script "./bin/sort-start.sh" to start the sort-standalone application, then check the log file sort.log to confirm that it started.
+
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort-standalone/hive_example.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort-standalone/hive_example.md
new file mode 100644
index 0000000..7be4625
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/sort-standalone/hive_example.md
@@ -0,0 +1,201 @@
+---
+title: Hive Example
+sidebar_position: 3
+---
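+## Prepare the installation file
+The installation file is in the `inlong-sort-standalone/sort-standalone-dist/target/` directory; the file name is apache-inlong-sort-standalone-${project.version}-bin.tar.gz.
+
+## Prepare the configuration files
+First, decompress the archive apache-inlong-sort-standalone-${project.version}-bin.tar.gz, then copy the 3 files in the directory "conf/hive/" to the directory "conf/".
+
+- conf/common.properties, basic configuration parameters of all components.
+- conf/SortClusterConfig.conf, sink configuration of all sort tasks.
+- conf/sid_hive_inlong6th_v3.conf, data source configuration of a sort task. The file name matches the sort task name in SortClusterConfig.conf; if SortClusterConfig.conf configures multiple sort tasks, there is one data source configuration per sort task.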
+
+### Sample conf/common.properties
+
+```
+clusterId=hivev3-sz-sz1
+nodeId=nodeId
+metricDomains=Sort
+metricDomains.Sort.domainListeners=org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener
+metricDomains.Sort.snapshotInterval=60000
+sortChannel.type=org.apache.inlong.sort.standalone.channel.BufferQueueChannel
+sortSink.type=org.apache.inlong.sort.standalone.sink.hive.HiveSink
+sortSource.type=org.apache.inlong.sort.standalone.source.sortsdk.SortSdkSource
+
+sortClusterConfig.type=file
+sortClusterConfig.file=SortClusterConfig.conf
+sortSourceConfig.QueryConsumeConfigType=file
+#sortTaskId.conf
+
+#sortClusterConfig.type=manager
+#sortSourceConfig.QueryConsumeConfigType=manager
+#managerUrlLoaderType=org.apache.inlong.sort.standalone.config.loader.CommonPropertiesManagerUrlLoader
+#sortClusterConfig.managerUrl=http://${manager_ip:port}/api/inlong/manager/openapi/sort/getClusterConfig
+#sortSourceConfig.managerUrl=http://${manager_ip:port}/api/inlong/manager/openapi/sort/getSortSource
+
+```
+
+### Sample conf/SortClusterConfig.conf
+
+```
+{
+ "clusterName": "hivev3-sz-sz1",
+ "sortTasks": [{
+ "name": "sid_hive_inlong6th_v3",
+ "type": "HIVE",
+ "idParams": [{
+ "inlongGroupId": "atta",
+ "inlongStreamId": "0fc00000046",
+ "separator": "|",
+ "partitionIntervalMs": 3600000,
+ "idRootPath": "/user/hive/warehouse/t_inlong_v1_0fc00000046",
+ "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
+ "hiveTableName": "t_inlong_v1_0fc00000046",
+ "partitionFieldName": "dt",
+ "partitionFieldPattern": "yyyyMMddHH",
+ "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
+ "maxPartitionOpenDelayHour": 8
+ }],
+ "sinkParams": {
+ "hdfsPath": "hdfs://10.160.139.123:9000",
+ "maxFileOpenDelayMinute": "5",
+ "tokenOvertimeMinute": "60",
+ "maxOutputFileSizeGb": "2",
+ "hiveJdbcUrl": "jdbc:hive2://10.160.142.179:10000",
+ "hiveDatabase": "default",
+ "hiveUsername": "hive",
+ "hivePassword": "hive"
+ }
+ }]
+}
+```
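+
+Each entry in `idParams` binds one InLong data stream (inlongGroupId + inlongStreamId) to a Hive table. The following Python sketch builds that lookup from the file. It is an illustration only, not how sort-standalone routes messages internally, and `index_id_params` is a hypothetical name.
+
+```python
+import json
+
+
+def index_id_params(cluster_conf_text: str) -> dict:
+    """Map (inlongGroupId, inlongStreamId) -> (Sort task name, hiveTableName)."""
+    conf = json.loads(cluster_conf_text)
+    index = {}
+    for task in conf["sortTasks"]:
+        for id_param in task["idParams"]:
+            key = (id_param["inlongGroupId"], id_param["inlongStreamId"])
+            index[key] = (task["name"], id_param.get("hiveTableName"))
+    return index
+```
+
+Applied to the example above, the result maps ("atta", "0fc00000046") to ("sid_hive_inlong6th_v3", "t_inlong_v1_0fc00000046").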
+
+### Example of conf/sid_hive_inlong6th_v3.conf
+
+```
+{
+ "sortClusterName": "hivev3-sz-sz1",
+ "sortTaskId": "sid_hive_inlong6th_v3",
+ "cacheZones": {
+ "pc_atta6th_sz1": {
+ "zoneName": "pc_atta6th_sz1",
+ "serviceUrl": "http://9.139.53.86:8080",
+ "authentication": "eyJrZXlJZCI6InB1bHNhci04MnhhN24zZWs1ZHciLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJwdWxzYXItODJ4YTduM2VrNWR3X2FkbWluIn0.D5H_j8UQk8KYWHw_mzq2HmR393SnbL5Gz7JYCANBPnI",
+ "topics": [{
+ "topic": "pulsar-82xa7n3ek5dw/atta/atta_topic_1",
+ "partitionCnt": 10,
+ "topicProperties": {}
+ }],
+ "cacheZoneProperties": {},
+ "zoneType": "pulsar"
+ }
+ }
+}
+```
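+
+This file describes where the task consumes from: each cache zone lists its topics. When checking such a file by hand, it can help to flatten it into one row per topic. The following minimal Python sketch does that; `list_source_topics` is a hypothetical helper.
+
+```python
+import json
+from typing import List, Tuple
+
+
+def list_source_topics(task_conf_text: str) -> List[Tuple[str, str, str, int]]:
+    """Return (zoneName, zoneType, topic, partitionCnt) for every configured topic."""
+    conf = json.loads(task_conf_text)
+    rows = []
+    for zone in conf["cacheZones"].values():
+        for topic in zone.get("topics", []):
+            rows.append((zone["zoneName"], zone["zoneType"], topic["topic"], topic["partitionCnt"]))
+    return rows
+```
+
+For the example above, this yields a single row: ("pc_atta6th_sz1", "pulsar", "pulsar-82xa7n3ek5dw/atta/atta_topic_1", 10).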
+
+## conf/common.properties parameters
+
+| Parameter | Required | Default | Description |
+| ------------ | ------------ | ------------ | ------------ |
+|clusterId | Y | NA | Uniquely identifies an inlong-sort-standalone cluster |
+|nodeId | N | Local IP | ID of the current node |
+|metricDomains | N | Sort | Metric aggregation domain name |
+|metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | Space-separated list of metric listener class names |
+|metricDomains.Sort.snapshotInterval | N | 60000 | Interval at which metric snapshots are taken, in ms |
+|prometheusHttpPort | N | 8080 | Parameter of org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener: port of the Prometheus HttpServer |
+|sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel type |
+|sortSink.type | Y | NA | Sink class name; different distribution types use different Sink classes |
+|sortSource.type | N | org.apache.inlong.sort.standalone.source.sortsdk.SortSdkSource | Source class name |
+|sortClusterConfig.type | N | manager | Source from which the cluster configuration is loaded; one of [file, manager, custom class] |
+|sortClusterConfig.file | N | SortClusterConfig.conf | Name of the configuration file on the classpath, when the cluster configuration is loaded from file |
+|sortClusterConfig.managerUrl | N | NA | URL of InLong Manager, when the cluster configuration is loaded from manager<br/>e.g. http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getClusterConfig |
+|sortSourceConfig.QueryConsumeConfigType | N | manager | Source from which the Sort task configuration is loaded; one of [file, manager, custom class]. <br/>When it is loaded from file, the Sort task configuration file is on the classpath and its name has the format ${sortTaskId}.conf. |
+|sortSourceConfig.managerUrl | N | NA | URL of InLong Manager, when the Sort task configuration is loaded from manager<br/>e.g. http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getSortSource |
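+
+The table can be read as a simple rule: use the explicit value if set, otherwise the default, and always require `clusterId` and `sortSink.type`. The Python sketch below applies that rule to a parsed properties dict. `DEFAULTS` copies the defaults listed above, except nodeId, whose default is the local IP. `effective_config` is hypothetical. The check that a managerUrl must be present in manager mode is an assumption drawn from the managerUrl descriptions, not documented behavior.
+
+```python
+PKG = "org.apache.inlong.sort.standalone"
+DEFAULTS = {
+    "metricDomains": "Sort",
+    "metricDomains.Sort.domainListeners": PKG + ".metrics.prometheus.PrometheusMetricListener",
+    "metricDomains.Sort.snapshotInterval": "60000",
+    "prometheusHttpPort": "8080",
+    "sortChannel.type": PKG + ".channel.BufferQueueChannel",
+    "sortSource.type": PKG + ".source.sortsdk.SortSdkSource",
+    "sortClusterConfig.type": "manager",
+    "sortClusterConfig.file": "SortClusterConfig.conf",
+    "sortSourceConfig.QueryConsumeConfigType": "manager",
+}
+REQUIRED = ("clusterId", "sortSink.type")
+
+
+def effective_config(props: dict) -> dict:
+    """Merge explicit settings over the documented defaults and check required keys."""
+    missing = [key for key in REQUIRED if key not in props]
+    if missing:
+        raise ValueError(f"missing required parameters: {missing}")
+    merged = {**DEFAULTS, **props}
+    # Assumption: manager mode is unusable without the matching managerUrl.
+    if merged["sortClusterConfig.type"] == "manager" and "sortClusterConfig.managerUrl" not in merged:
+        raise ValueError("sortClusterConfig.managerUrl is required in manager mode")
+    if merged["sortSourceConfig.QueryConsumeConfigType"] == "manager" and "sortSourceConfig.managerUrl" not in merged:
+        raise ValueError("sortSourceConfig.managerUrl is required in manager mode")
+    return merged
+```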
+
+## SortClusterConfig.conf parameters
+
+- Can be read from the SortClusterConfig.conf file on the ClassPath, but real-time updates are not supported.
+- Can be fetched from the HTTP interface of InLong Manager, which supports real-time updates.
+
+| Parameter | Required | Type | Default | Description |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|clusterName | Y |String | NA | Uniquely identifies an inlong-sort-standalone cluster |
+|sortTasks | Y | JsonArray<SortTaskConfig> |NA | List of Sort tasks |
+
+### SortTaskConfig parameters
+
+| Parameter | Required | Default | Description |
+| ------------ | ------------ | ------------ | ------------ |
+|name | Y | NA | Sort task name |
+|type | Y | NA | Sort task type, e.g. HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ElasticSearch("ElasticSearch"), UNKNOWN("n") |
+|idParams | Y | NA | List of InLong data stream parameters |
+|sinkParams | Y | NA | Sort task parameters |
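+
+A quick structural check of one SortTaskConfig entry can be written in a few lines of Python. This sketch only verifies that the four fields are present and that `type` is one of the names listed above; it deliberately excludes UNKNOWN. `check_sort_task` is hypothetical. Matching the type case-insensitively is an assumption, based on the example using "HIVE" while the table also lists lowercase names.
+
+```python
+KNOWN_TYPES = {"hive", "tube", "kafka", "pulsar", "elasticsearch"}
+REQUIRED_FIELDS = ("name", "type", "idParams", "sinkParams")
+
+
+def check_sort_task(task: dict) -> None:
+    """Raise ValueError if a SortTaskConfig entry is structurally incomplete."""
+    missing = [field for field in REQUIRED_FIELDS if field not in task]
+    if missing:
+        raise ValueError(f"task {task.get('name')!r} lacks {missing}")
+    if task["type"].lower() not in KNOWN_TYPES:
+        raise ValueError(f"unknown task type {task['type']!r}")
+```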
+
+### idParams of a Sort-Hive task
+
+| Parameter | Required | Default | Description |
+| ------------ | ------------ | ------------ | ------------ |
+|inlongGroupId | Y | NA | inlongGroupId |
+|inlongStreamId | Y | NA | inlongStreamId |
+|separator | Y | NA | Field separator |
+|partitionIntervalMs | N | 3600000 | Partition interval, in ms |
+|idRootPath | Y | NA | HDFS root directory of the InLong data stream |
+|partitionSubPath | Y | NA | Partition subdirectory of the InLong data stream |
+|hiveTableName | Y | NA | Hive table name of the InLong data stream |
+|partitionFieldName | N | dt | Partition field name of the InLong data stream |
+|partitionFieldPattern | Y | NA | Format of the partition field value of the InLong data stream, e.g. yyyyMMdd, yyyyMMddHH, yyyyMMddHHmm |
+|msgTimeFieldPattern | Y | NA | Format of the message generation time field, as a Java date-time pattern |
+|maxPartitionOpenDelayHour | N | 8 | Maximum delay for which a partition is kept open, in hours |
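+
+The following Python sketch illustrates how the time-related idParams fit together. It aligns a message time to `partitionIntervalMs`, expands the `{...}` placeholders in `partitionSubPath`, and formats the `partitionFieldName` value with `partitionFieldPattern`. This is a hypothetical model, not the sort-standalone implementation. It converts only the Java pattern letters used in this document (yyyy, MM, dd, HH, mm, ss) and evaluates times in UTC; the real timezone handling is not specified here.
+
+```python
+import re
+from datetime import datetime, timezone
+
+# Only the Java date-time letters used in this document are translated.
+_JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
+_JAVA_TOKEN = re.compile("yyyy|MM|dd|HH|mm|ss")
+
+
+def to_strftime(java_pattern: str) -> str:
+    return _JAVA_TOKEN.sub(lambda m: _JAVA_TO_STRFTIME[m.group(0)], java_pattern)
+
+
+def parse_msg_time_ms(text: str, msg_time_pattern: str) -> int:
+    """Parse a message time string (msgTimeFieldPattern), interpreted as UTC."""
+    dt = datetime.strptime(text, to_strftime(msg_time_pattern)).replace(tzinfo=timezone.utc)
+    return int(dt.timestamp() * 1000)
+
+
+def partition_for(msg_time_ms, interval_ms, root_path, sub_path, field_pattern):
+    """Return (HDFS partition directory, partition field value) for one message."""
+    start_ms = msg_time_ms - msg_time_ms % interval_ms  # align to the interval start
+    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
+
+    def expand(match):
+        return start.strftime(to_strftime(match.group(1)))
+
+    directory = root_path + re.sub(r"\{([^}]+)\}", expand, sub_path)
+    return directory, start.strftime(to_strftime(field_pattern))
+
+
+ms = parse_msg_time_ms("2022-03-29 15:58:07", "yyyy-MM-dd HH:mm:ss")
+print(partition_for(ms, 3600000, "/user/hive/warehouse/t_inlong_v1_0fc00000046",
+                    "/{yyyyMMdd}/{yyyyMMddHH}", "yyyyMMddHH"))
+# ('/user/hive/warehouse/t_inlong_v1_0fc00000046/20220329/2022032915', '2022032915')
+```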
+
+### sinkParams of a Sort-Hive task
+
+| Parameter | Required | Default | Description |
+| ------------ | ------------ | ------------ | ------------ |
+|hdfsPath | Y | NA | HDFS NameNode address |
+|maxFileOpenDelayMinute | N | 5 | Maximum write time of a single HDFS file, in minutes |
+|tokenOvertimeMinute | N | 60 | Maximum time that a single InLong data stream may hold the partition-creation token, in minutes |
+|maxOutputFileSizeGb | N | 2 | Maximum size of a single HDFS file, in GB |
+|hiveJdbcUrl | Y | NA | Hive JDBC URL |
+|hiveDatabase | Y | NA | Hive database |
+|hiveUsername | Y | NA | Hive user name |
+|hivePassword | Y | NA | Hive password |
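+
+`maxFileOpenDelayMinute` and `maxOutputFileSizeGb` both limit how long a single HDFS file keeps growing. The Python sketch below models one plausible reading: a file is closed as soon as either limit is reached. `should_roll_file` is hypothetical, and treating GB as 1024^3 bytes is an assumption. The values are parsed with `int()` because the example sinkParams stores them as strings.
+
+```python
+def should_roll_file(sink_params: dict, opened_at_ms: int, now_ms: int, size_bytes: int) -> bool:
+    """Return True when the current HDFS file should be closed and a new one opened."""
+    # Defaults follow the table above; the example JSON stores numbers as strings.
+    max_open_ms = int(sink_params.get("maxFileOpenDelayMinute", 5)) * 60_000
+    max_size_bytes = int(sink_params.get("maxOutputFileSizeGb", 2)) * 1024 ** 3
+    return now_ms - opened_at_ms >= max_open_ms or size_bytes >= max_size_bytes
+```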
+
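+## sid_hive_inlong6th_v3.conf configuration of a Sort-Hive task
+
+- File name format: Sort task name + ".conf".
+- Can be read from the ${sortTaskId}.conf file on the ClassPath, but real-time updates are not supported.
+- Can be fetched from the HTTP interface of InLong Manager, which supports real-time updates.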
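+
+Because each task's source configuration lives in a file named after the task, the expected file names can be derived from SortClusterConfig.conf. The following Python sketch is a hypothetical consistency check that takes both files as JSON text. The requirement that `sortClusterName` equals the cluster name is an assumption: both fields identify the same cluster, and the examples use the same value for both.
+
+```python
+import json
+from typing import List
+
+
+def expected_task_files(cluster_conf_text: str) -> List[str]:
+    """File name of each Sort task's source configuration: <task name> + ".conf"."""
+    return [task["name"] + ".conf" for task in json.loads(cluster_conf_text)["sortTasks"]]
+
+
+def check_task_file(file_name: str, task_conf_text: str, cluster_name: str) -> None:
+    """Raise ValueError if a task file does not match its file name or cluster."""
+    conf = json.loads(task_conf_text)
+    if conf["sortTaskId"] + ".conf" != file_name:
+        raise ValueError(f"{file_name} declares sortTaskId {conf['sortTaskId']!r}")
+    # Assumption: sortClusterName must equal clusterName in SortClusterConfig.conf.
+    if conf["sortClusterName"] != cluster_name:
+        raise ValueError(f"{file_name} belongs to cluster {conf['sortClusterName']!r}")
+```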
+
+### sid_hive_inlong6th_v3.conf parameters
+
+| Parameter | Required | Type | Default | Description |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|sortClusterName | Y |String | NA | Uniquely identifies an inlong-sort-standalone cluster |
+|sortTaskId | Y | String |NA | Sort task name |
+|cacheZones | Y | JsonObject<String, JsonObject> |NA | List of cache-layer clusters, format: Map<cacheClusterName, CacheCluster> |
+
+### CacheCluster parameters
+
+| Parameter | Required | Type | Default | Description |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|zoneName | Y |String | NA | Name of the cache-layer cluster |
+|zoneType | Y | String |NA | Cache type: [pulsar, tube, kafka] |
+|serviceUrl | Y | String |NA | Pulsar serviceUrl, or the Kafka broker list |
+|authentication | Y | String |NA | Pulsar authentication |
+|cacheZoneProperties | N | Map<String,String> |NA | Consumer parameters of the cache layer |
+|topics | N | List<Topic> |NA | List of topics consumed from the cache layer |
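+
+The table translates into a few simple checks on each cacheZones entry. The Python sketch below is illustrative only, and `check_cache_zone` is a hypothetical helper. It verifies the required fields and the allowed zoneType values. It also checks that the map key equals zoneName, which is how the example is written; this is an assumption, not a documented rule.
+
+```python
+ALLOWED_ZONE_TYPES = {"pulsar", "tube", "kafka"}
+REQUIRED_ZONE_FIELDS = ("zoneName", "zoneType", "serviceUrl", "authentication")
+
+
+def check_cache_zone(key: str, zone: dict) -> None:
+    """Raise ValueError if one cacheZones entry breaks the rules in the table."""
+    missing = [field for field in REQUIRED_ZONE_FIELDS if field not in zone]
+    if missing:
+        raise ValueError(f"cache zone {key!r} lacks {missing}")
+    if zone["zoneType"] not in ALLOWED_ZONE_TYPES:
+        raise ValueError(f"cache zone {key!r} has unsupported zoneType {zone['zoneType']!r}")
+    # Assumption: the map key is the cache cluster name, as in the example.
+    if zone["zoneName"] != key:
+        raise ValueError(f"cache zone key {key!r} differs from zoneName {zone['zoneName']!r}")
+```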
+
+### Topic parameters
+
+| Parameter | Required | Type | Default | Description |
+| ------------ | ------------ | ------------ | ------------ | ------------ |
+|topic | Y |String | NA | Full topic name; for Pulsar: tenant/namespace/topic |
+|partitionCnt | Y | Integer |NA | Number of topic partitions |
+|topicProperties | N | Map<String,String> |NA | Consumer parameters for the cache-layer topic |
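+
+For Pulsar, the full topic name has three parts. Pulsar stores a topic with `partitionCnt` partitions as internal topics suffixed `-partition-0` through `-partition-(N-1)`. The Python sketch below splits the name and lists those partition names. The `persistent://` domain is assumed (it is Pulsar's default), the helper names are hypothetical, and this document does not say whether sort-standalone subscribes per partition.
+
+```python
+from typing import List, Tuple
+
+
+def split_pulsar_topic(full_name: str) -> Tuple[str, str, str]:
+    """Split 'tenant/namespace/topic' into its three parts."""
+    parts = full_name.split("/")
+    if len(parts) != 3:
+        raise ValueError(f"expected tenant/namespace/topic, got {full_name!r}")
+    return parts[0], parts[1], parts[2]
+
+
+def pulsar_partition_names(full_name: str, partition_cnt: int) -> List[str]:
+    """Names of the internal partition topics of a partitioned Pulsar topic."""
+    split_pulsar_topic(full_name)  # validate the format first
+    return [f"persistent://{full_name}-partition-{i}" for i in range(partition_cnt)]
+
+
+print(split_pulsar_topic("pulsar-82xa7n3ek5dw/atta/atta_topic_1"))
+# ('pulsar-82xa7n3ek5dw', 'atta', 'atta_topic_1')
+```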
+
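+## Start the inlong-sort-standalone application
+Finally, run the script "./bin/sort-start.sh" to start the sort-standalone application, then check the log file sort.log to confirm that it started successfully.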
+
+