Posted to commits@inlong.apache.org by do...@apache.org on 2022/06/30 06:20:21 UTC

[inlong-website] branch master updated: [INLONG-4741][Audit] AuditStore support ClickHouse sink (#469)

This is an automated email from the ASF dual-hosted git repository.

dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 1e937c1ec [INLONG-4741][Audit] AuditStore support ClickHouse sink (#469)
1e937c1ec is described below

commit 1e937c1ec2a0de05bfc4f06dee1628c1271d839f
Author: 卢春亮 <94...@qq.com>
AuthorDate: Thu Jun 30 14:20:16 2022 +0800

    [INLONG-4741][Audit] AuditStore support ClickHouse sink (#469)
---
 docs/modules/audit/overview.md                     | 18 +++++++++--
 docs/modules/audit/quick_start.md                  | 35 ++++++++++++++++++++--
 .../current/modules/audit/overview.md              | 18 +++++++++--
 .../current/modules/audit/quick_start.md           | 33 ++++++++++++++++++--
 4 files changed, 93 insertions(+), 11 deletions(-)

diff --git a/docs/modules/audit/overview.md b/docs/modules/audit/overview.md
index 56c39f7de..7c3601688 100644
--- a/docs/modules/audit/overview.md
+++ b/docs/modules/audit/overview.md
@@ -13,8 +13,8 @@ The transmission status of each module, and whether the data stream is lost or r
 ![](img/audit_architecture.png)
 1. The audit SDK is nested in the service that needs to be audited, audits the service, and sends the audit result to the audit access layer
 2. The audit proxy writes audit data to MQ (Pulsar or TubeMQ)
-3. The distribution service consumes the audit data of MQ, and writes the audit data to MySQL and Elasticsearch
-4. The interface layer encapsulates the data of MySQL and Elasticsearch
+3. The distribution service consumes the audit data from MQ and writes it to MySQL, Elasticsearch, and ClickHouse.
+4. The interface layer encapsulates the data from MySQL, Elasticsearch, and ClickHouse.
 5. Application scenarios mainly include report display, audit reconciliation, etc.
 
 ## Audit Dimension
@@ -196,10 +196,22 @@ MySQL distribution supports distribution to different MySQL instances according
   1. When the audit scale of the business is relatively small, less than ten million per day, you can consider using MySQL as the audit storage. Because the deployment of MySQL is much simpler than that of Elasticsearch, the resource cost will be much less.
   2. If the scale of audit data is large and MySQL cannot support it, you can consider using Elasticsearch as storage. After all, a single Elasticsearch cluster can support tens of billions of audit data and horizontal expansion.
   
+## ClickHouse Distribution Implementation
+### Target
+***1. High real-time performance (minute level)***  
+***2. Simple to deploy***  
+***3. Can be deduplicated***  
+
+### Main Logic Diagram
+ClickHouse distribution supports distribution to different ClickHouse instances according to the audit ID, and supports horizontal expansion.
+
+### Usage introduction
+  1. When the audit scale of the business is huge and you want to access audit data with SQL, you can consider using ClickHouse as the audit storage: ClickHouse supports SQL access, can hold tens of billions of audit records, and supports horizontal expansion.
+  
 ## Audit Usage Interface Design
 ### Main Logic Diagram
 ![](img/audit_api.png)
-The audit interface layer uses SQL to check MySQL or restful to check Elasticsearch. How to check which type of storage the interface uses depends on which type of storage is used.
+The audit interface layer queries MySQL/ClickHouse via SQL, or Elasticsearch via its RESTful API; which storage the interface queries depends on which storage is deployed.
 
 ### UI Interface Display
 ### Main Logic Diagram
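
As the overview change above notes, the interface layer reads MySQL/ClickHouse with plain SQL. The sketch below shows, under stated assumptions, what such a query could look like over the ClickHouse JDBC driver configured later in this commit; the `audit_data` table and its columns are hypothetical placeholders, since the actual schema lives in `sql/apache_inlong_audit_clickhouse.sql` and is not shown in this diff.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class AuditQuerySketch {
    public static void main(String[] args) throws Exception {
        // Driver class and connection settings taken from the clickhouse.* properties in quick_start.md.
        Class.forName("ru.yandex.clickhouse.ClickHouseDriver");
        String url = "jdbc:clickhouse://127.0.0.1:8123/default";
        // Table and column names below are assumptions for illustration only;
        // the real schema is created by sql/apache_inlong_audit_clickhouse.sql.
        String sql = "SELECT inlong_group_id, inlong_stream_id, sum(count) AS total "
                + "FROM audit_data WHERE audit_id = ? "
                + "GROUP BY inlong_group_id, inlong_stream_id";
        try (Connection conn = DriverManager.getConnection(url, "default", "default");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "3"); // hypothetical audit ID being reconciled
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s/%s -> %d%n",
                            rs.getString("inlong_group_id"),
                            rs.getString("inlong_stream_id"),
                            rs.getLong("total"));
                }
            }
        }
    }
}
```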
diff --git a/docs/modules/audit/quick_start.md b/docs/modules/audit/quick_start.md
index bb1a191b7..fbd4103d4 100644
--- a/docs/modules/audit/quick_start.md
+++ b/docs/modules/audit/quick_start.md
@@ -7,7 +7,12 @@ All deploying files at `inlong-audit` directory, if you use MySQL to store audit
   # initialize database
   mysql -uDB_USER -pDB_PASSWD < sql/apache_inlong_audit.sql
   ```
-
+If you use ClickHouse to store audit data, you need to first initialize the database through `sql/apache_inlong_audit_clickhouse.sql`.
+  ```shell
+  # initialize database
+  clickhouse client -u DB_USER --password DB_PASSWD < sql/apache_inlong_audit_clickhouse.sql
+  ```
+  
 ## Audit Proxy
 ### Configure MessageQueue
 You can choose Apache Pulsar or InLong TubeMQ as your MessageQueue service:
@@ -46,7 +51,7 @@ The configuration file  is `conf/application.properties`.
 # proxy.type: pulsar / tube
 audit.config.proxy.type=pulsar
 
-# store.server: mysql / elasticsearch 
+# store.server: mysql / elasticsearch / clickhouse 
 audit.config.store.mode=mysql
 
 # audit pulsar config (optional), replace PULSAR_BROKER_LIST with your Pulsar service url
@@ -59,10 +64,34 @@ audit.tube.masterlist=TUBE_LIST
 audit.tube.topic=inlong-audit
 audit.tube.consumer.group.name=inlong-audit-consumer
 
-# mysql
+# mysql config
 spring.datasource.druid.url=jdbc:mysql://127.0.0.1:3306/apache_inlong_audit?characterEncoding=utf8&useSSL=false&serverTimezone=GMT%2b8&rewriteBatchedStatements=true&allowMultiQueries=true&zeroDateTimeBehavior=CONVERT_TO_NULL
 spring.datasource.druid.username=root
 spring.datasource.druid.password=inlong
+
+# es config
+elasticsearch.host=127.0.0.1
+elasticsearch.port=9200
+elasticsearch.authEnable=false
+elasticsearch.username=elastic
+elasticsearch.password=inlong
+elasticsearch.shardsNum=5
+elasticsearch.replicaNum=1
+elasticsearch.indexDeleteDay=5
+elasticsearch.enableDocId=true
+elasticsearch.bulkInterval=10
+elasticsearch.bulkThreshold=5000
+elasticsearch.auditIdSet=1,2
+
+# clickhouse config
+clickhouse.driver=ru.yandex.clickhouse.ClickHouseDriver
+clickhouse.url=jdbc:clickhouse://127.0.0.1:8123/default
+clickhouse.username=default
+clickhouse.password=default
+clickhouse.batchIntervalMs=1000
+clickhouse.batchThreshold=500
+clickhouse.processIntervalMs=100
+
 ```
 
 ### Dependencies
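
The `clickhouse.*` keys added above suggest a buffered writer: rows accumulate and are flushed when either `clickhouse.batchThreshold` rows are pending or `clickhouse.batchIntervalMs` has elapsed, with the buffer polled every `clickhouse.processIntervalMs`. The following is only a minimal sketch of that flush pattern over JDBC, not the AuditStore implementation; the `audit_data` table and its columns are assumed for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.util.List;

public class ClickHouseBatchSketch {
    private static final int BATCH_THRESHOLD = 500;       // clickhouse.batchThreshold
    private static final long BATCH_INTERVAL_MS = 1000L;  // clickhouse.batchIntervalMs
    private static final long PROCESS_INTERVAL_MS = 100L; // clickhouse.processIntervalMs

    /** Drains buffered rows into ClickHouse once either the size or the time threshold is reached. */
    static void writeLoop(Connection conn, List<Object[]> buffer) throws Exception {
        long lastFlush = System.currentTimeMillis();
        // Table and column names are assumptions for illustration; the real schema
        // is created by sql/apache_inlong_audit_clickhouse.sql.
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO audit_data (audit_id, log_ts, count) VALUES (?, ?, ?)")) {
            while (true) {
                boolean sizeHit = buffer.size() >= BATCH_THRESHOLD;
                boolean timeHit = System.currentTimeMillis() - lastFlush >= BATCH_INTERVAL_MS;
                if ((sizeHit || timeHit) && !buffer.isEmpty()) {
                    for (Object[] row : buffer) {
                        ps.setString(1, (String) row[0]);
                        ps.setTimestamp(2, (Timestamp) row[1]);
                        ps.setLong(3, (Long) row[2]);
                        ps.addBatch();
                    }
                    ps.executeBatch(); // one round trip per batch instead of one per row
                    buffer.clear();
                    lastFlush = System.currentTimeMillis();
                }
                Thread.sleep(PROCESS_INTERVAL_MS); // poll the buffer at a fixed cadence
            }
        }
    }
}
```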
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/audit/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/audit/overview.md
index 51a5181ea..3863f4dbb 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/audit/overview.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/audit/overview.md
@@ -13,8 +13,8 @@ InLong审计是独立于InLong的一个子系统,对InLong系统的Agent、Dat
 ![](img/audit_architecture.png)
 1. The audit SDK is embedded in the service to be audited, audits the service, and sends the audit results to the audit access layer.
 2. The audit access layer writes the audit data to MQ (Pulsar or TubeMQ).
-3. The distribution service consumes the audit data from MQ and writes it to MySQL and Elasticsearch.
-4. The interface layer encapsulates the data from MySQL and Elasticsearch.
+3. The distribution service consumes the audit data from MQ and writes it to MySQL, Elasticsearch, and ClickHouse.
+4. The interface layer encapsulates the data from MySQL, Elasticsearch, and ClickHouse.
 5. Application scenarios mainly include report display, audit reconciliation, and so on.
 
 ## Audit Dimension
@@ -198,10 +198,22 @@ MySQL分发支持根据审计ID分发到不同的MySQL实例,支持水平扩
   1. When the audit scale of the business is relatively small (less than ten million records per day), you can consider using MySQL as the audit storage, because MySQL is much simpler to deploy than Elasticsearch and its resource cost is much lower.  
   2. If the audit data scale is large and MySQL cannot support it, you can consider using Elasticsearch as storage; a single Elasticsearch cluster can support tens of billions of audit records and horizontal expansion.
   
+## ClickHouse Distribution Implementation
+### Target
+***1. High real-time performance (minute level)***  
+***2. Simple to deploy***  
+***3. Can be deduplicated***
+
+### Main Logic Diagram
+ClickHouse distribution supports distribution to different ClickHouse instances according to the audit ID, and supports horizontal expansion.
+
+### Usage Introduction
+  1. A ClickHouse cluster can support tens of billions of audit records and horizontal expansion, while also supporting SQL access to audit data; its resource cost is similar to Elasticsearch.
+  
 ## Audit Usage Interface Design
 ### Main Logic Diagram
 ![](img/audit_api.png)
-The audit interface layer queries MySQL via SQL or Elasticsearch via its RESTful API; which storage the interface queries depends on which storage is deployed.
+The audit interface layer queries MySQL/ClickHouse via SQL, or Elasticsearch via its RESTful API; which storage the interface queries depends on which storage is deployed.
 
 ### UI Interface Display
 ### Main Logic Diagram
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/audit/quick_start.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/audit/quick_start.md
index 9775db3f5..6ef1ee9e7 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/audit/quick_start.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/audit/quick_start.md
@@ -7,7 +7,12 @@ title: 安装部署
   # initialize the database
   mysql -uDB_USER -pDB_PASSWD < sql/apache_inlong_audit.sql
   ```
-
+If you use ClickHouse to store audit data, you need to first initialize the database through `sql/apache_inlong_audit_clickhouse.sql`.
+  ```shell
+  # initialize the database
+  clickhouse client -u DB_USER --password DB_PASSWD < sql/apache_inlong_audit_clickhouse.sql
+  ```
+  
 ## Dependencies
 - If the backend connects to a MySQL database, download [mysql-connector-java-8.0.27.jar](https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.27/mysql-connector-java-8.0.27.jar) and put it into the `lib/` directory.
 - If the backend connects to a PostgreSQL database, no extra dependency is needed.
@@ -62,10 +67,34 @@ audit.tube.masterlist=TUBE_LIST
 audit.tube.topic=inlong-audit
 audit.tube.consumer.group.name=inlong-audit-consumer
 
-# mysql
+# mysql config
 spring.datasource.druid.url=jdbc:mysql://127.0.0.1:3306/apache_inlong_audit?characterEncoding=utf8&useSSL=false&serverTimezone=GMT%2b8&rewriteBatchedStatements=true&allowMultiQueries=true&zeroDateTimeBehavior=CONVERT_TO_NULL
 spring.datasource.druid.username=root
 spring.datasource.druid.password=inlong
+
+# es config
+elasticsearch.host=127.0.0.1
+elasticsearch.port=9200
+elasticsearch.authEnable=false
+elasticsearch.username=elastic
+elasticsearch.password=inlong
+elasticsearch.shardsNum=5
+elasticsearch.replicaNum=1
+elasticsearch.indexDeleteDay=5
+elasticsearch.enableDocId=true
+elasticsearch.bulkInterval=10
+elasticsearch.bulkThreshold=5000
+elasticsearch.auditIdSet=1,2
+
+# clickhouse config
+clickhouse.driver=ru.yandex.clickhouse.ClickHouseDriver
+clickhouse.url=jdbc:clickhouse://127.0.0.1:8123/default
+clickhouse.username=default
+clickhouse.password=default
+clickhouse.batchIntervalMs=1000
+clickhouse.batchThreshold=500
+clickhouse.processIntervalMs=100
+
 ```
 
 ### Dependencies