Posted to commits@inlong.apache.org by do...@apache.org on 2023/01/14 07:27:18 UTC

[inlong-website] branch master updated: [INLONG-664][Release] Add blog for the 1.5.0 release (#677)

This is an automated email from the ASF dual-hosted git repository.

dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new a87daa0e3f [INLONG-664][Release] Add blog for the 1.5.0 release (#677)
a87daa0e3f is described below

commit a87daa0e3f9b6707d0202a20e22aafe5f5cc6168
Author: Charles Zhang <do...@apache.org>
AuthorDate: Sat Jan 14 15:27:13 2023 +0800

    [INLONG-664][Release] Add blog for the 1.5.0 release (#677)
---
 blog/2023-01-13-release-1.5.0.md                   |  83 +++++++++++++++++++++
 blog/img/1.5.0-create-dashboard-stream.png         | Bin 0 -> 42399 bytes
 blog/img/1.5.0-create-hudi-source.png              | Bin 0 -> 68613 bytes
 blog/img/1.5.0-dirty-data.png                      | Bin 0 -> 100979 bytes
 blog/img/1.5.0-mq-handler.png                      | Bin 0 -> 40676 bytes
 blog/img/1.5.0-support-kafka.png                   | Bin 0 -> 36854 bytes
 .../2022-11-16-release-1.4.0.md                    |   2 +-
 .../2023-01-13-release-1.5.0.md                    |  83 +++++++++++++++++++++
 .../img/1.5.0-create-dashboard-stream.png          | Bin 0 -> 77996 bytes
 .../img/1.5.0-create-hudi-source.png               | Bin 0 -> 67001 bytes
 .../img/1.5.0-dirty-data.png                       | Bin 0 -> 100979 bytes
 .../img/1.5.0-mq-handler.png                       | Bin 0 -> 40676 bytes
 .../img/1.5.0-support-kafka.png                    | Bin 0 -> 67760 bytes
 13 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/blog/2023-01-13-release-1.5.0.md b/blog/2023-01-13-release-1.5.0.md
new file mode 100644
index 0000000000..547ab4010f
--- /dev/null
+++ b/blog/2023-01-13-release-1.5.0.md
@@ -0,0 +1,83 @@
+---
+title: Release 1.5.0
+author: Charles Zhang
+author_url: https://github.com/dockerzhang
+author_image_url: https://avatars.githubusercontent.com/u/18047329?v=4
+tags: [Apache InLong, Version]
+---
+
+Apache InLong recently released version 1.5.0, which closed 296+ issues, including 12+ major features and 110+ optimizations. The release mainly adds StarRocks, Hudi, Doris, Elasticsearch, and other sinks, optimizes the Dashboard experience, refactors the MQ management model, supports dirty data processing, provides full-link Apache Kafka support, and makes the TubeMQ C++/Python SDKs support message production.
+<!--truncate-->
+
+## About Apache InLong
+As the industry's first one-stop open-source massive data integration framework, Apache InLong provides automatic, safe, reliable, and high-performance data transmission capabilities to help businesses quickly build stream-based data analysis, modeling, and applications. At present, InLong is widely used in industries such as advertising, payment, social networking, games, and artificial intelligence, serving thousands of businesses; the data scale of high-performance scenarios exceeds 100 trillion records per day, and that of high-reliability scenarios exceeds 10 trillion records per day.
+
+The core keywords of InLong's positioning are "one-stop" and truly "massive data". For "one-stop", we want to shield technical details, provide complete data integration and supporting services, and deliver an out-of-the-box experience. For "massive data", we rely on architectural advantages such as layered data links, fully extensible components, and built-in multi-cluster management to stably support even larger data volumes on top of the 100-trillion-records-per-day scale.
+
+## 1.5.0 Overview
+Apache InLong recently released version 1.5.0, which closed 296+ issues, including 12+ major features and 110+ optimizations. The release mainly adds StarRocks, Hudi, Doris, Elasticsearch, and other data stream sinks, optimizes the Dashboard experience, refactors the MQ management model, adds dirty data processing, provides full-link Apache Kafka support, and makes the TubeMQ C++/Python SDKs support message production. This version also completes a large number of other features, mainly including:
+
+### Agent Module
+- Support log collection in CVM scenarios
+- Added direct sending to Pulsar, and synchronous/asynchronous strategies for sending to DataProxy (see the sketch after this list)
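+
+A minimal sketch of the two sending strategies, assuming a hypothetical `ProxySender` interface; the real Agent/DataProxy SDK classes and method names may differ:
+
+```java
+// Hypothetical illustration only: the actual InLong SDK API may differ.
+import java.util.concurrent.CompletableFuture;
+
+interface ProxySender {
+    // Synchronous strategy: block until DataProxy acknowledges the event.
+    boolean sendSync(byte[] body, long timeoutMs);
+
+    // Asynchronous strategy: return immediately and report the result via a future.
+    CompletableFuture<Boolean> sendAsync(byte[] body);
+}
+
+class AgentSendExample {
+    static void forward(ProxySender sender, byte[] event) {
+        // Synchronous send: simpler back-pressure, lower throughput.
+        boolean ok = sender.sendSync(event, 10_000L);
+        if (!ok) {
+            // A caller would typically retry here.
+        }
+
+        // Asynchronous send: higher throughput, completion handled in a callback.
+        sender.sendAsync(event).thenAccept(success -> {
+            if (!success) {
+                // A caller would typically retry or spill to a local cache here.
+            }
+        });
+    }
+}
+```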
+
+### DataProxy Module
+- Refactored the MQ management model to support quickly adding new MQ types
+- Optimized the cache layer to support the Apache Kafka message queue
+- Added support for BufferQueueChannel
+
+### TubeMQ Module
+- Added data sending and receiving latency statistics
+- TubeMQ C++ SDK supports message production
+- TubeMQ Python SDK supports message production
+
+### Manager Module
+- Added Hudi data node and data stream management
+- Added StarRocks data node and data stream management
+- Optimized Elasticsearch data node and data stream management
+- Added data conversion management in the Manager Client
+- Optimized Apache Kafka message queue management
+
+### Sort Module
+- The MySQL load node supports concurrent reading of tables without primary keys during the snapshot (full) phase
+- Added StarRocks, Hudi, Doris, and Elasticsearch 5.x data stream support
+- Added dirty data processing for Doris, PostgreSQL, Hive, HBase, Elasticsearch, and other sinks
+- Upgraded Iceberg to version 1.1.0
+- StarRocks, PostgreSQL, Doris, Hudi, and other sinks support table-level metrics
+
+### Dashboard Module
+- More than 50 experience optimization points
+- Added JSON, Key-Value, and AVRO formats
+- Supported management pages for ClickHouse, Iceberg, Elasticsearch, MySQL, and other data nodes
+- Added SQLServer, Oracle, MongoDB, and MQTT data source pages
+
+### Other
+- Added the Spotless code formatting plugin and the corresponding pipeline
+- The docker-compose deployment now includes an Apache Flink environment
+- Added Grafana metric display templates for Agent and DataProxy
+
+## 1.5.0 Feature Introduction
+### Support StarRocks, Hudi, Doris, Elasticsearch Sinks
+In version 1.5.0, InLong continued to expand its data node connectors, adding StarRocks, Hudi, Doris, Elasticsearch, and other sinks for community user scenarios and extending the data warehouse and data lake coverage. These new data nodes were mainly contributed by @liaorui, @featzhang, @kuansix, @LvJiancheng, and other developers.
+![1.5.0-create-hudi-source](./img/1.5.0-create-hudi-source.png)
+
+### Optimization of the Dashboard Experience
+Compared with traditional data integration projects, InLong introduces concepts such as Group, Stream, and data nodes, so community users who use the Dashboard for the first time can be confused about the overall process. To reduce the learning cost for Dashboard users, InLong made extensive optimizations to the Dashboard front-end pages, with more than 50 optimization points covering concepts, processes, and presentation. The figure below shows the process of creating a Stream in 1.5.0, which is simpler than in previous versions. Special thanks to @leezng, @bluewang, and @kinfuy for the Dashboard optimization, and to @Charles Zhang for the review suggestions.
+![1.5.0-create-dashboard-stream](./img/1.5.0-create-dashboard-stream.png)
+
+### Refactor the MQ Management Model
+To quickly support new message queue services (such as RocketMQ) as plug-ins, and to unify the existing support for Pulsar, Kafka, and TubeMQ, InLong DataProxy refactored the MQ management model in version 1.5.0: every MQ type implements its own handler based on `MessageQueueHandler`. Thanks to @woofyzhao and @luchunliang for implementing this feature. If you need to develop a new MQ type, you can refer to the DataProxy plug-in guide and the simplified sketch below the figure.
+![1.5.0-mq-handler](./img/1.5.0-mq-handler.png)
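+
+As a rough sketch of the plug-in model (the interface below is simplified, and its method names are assumptions rather than the exact DataProxy signatures), adding a new MQ type comes down to shipping one more handler implementation:
+
+```java
+// Simplified sketch of the plug-in model; method names are illustrative, not the
+// exact signatures of DataProxy's MessageQueueHandler.
+public interface MessageQueueHandler {
+    void init(java.util.Map<String, String> clusterConfig); // load MQ cluster settings
+    boolean send(String topic, byte[] body);                // deliver one message
+    void stop();                                            // release client resources
+}
+
+// A new MQ type (for example RocketMQ) only needs its own implementation;
+// the common sending pipeline stays untouched.
+class RocketMqHandler implements MessageQueueHandler {
+    @Override
+    public void init(java.util.Map<String, String> clusterConfig) { /* create the producer */ }
+
+    @Override
+    public boolean send(String topic, byte[] body) { /* produce to RocketMQ */ return true; }
+
+    @Override
+    public void stop() { /* shut down the producer */ }
+}
+```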
+
+### Support Dirty Data Processing
+When data is ingested into lakes or warehouses, dirty data that does not meet the data specification (such as out-of-range fields or missing fields) may cause user tasks to fail to write and restart continuously. In version 1.5.0, InLong supports writing unrecoverable dirty data to external storage, including S3 and local logs. Users can also customize the output target of dirty data, and can configure whether to enable dirty data archiving and whether to ignore write errors. The figure below shows the UML design of dirty data archiving. Thanks to @yunqingmoswu and @Yizhou-Yang for their support of this feature.
+![1.5.0-dirty-data](./img/1.5.0-dirty-data.png)
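+
+A minimal sketch of how the two switches could wrap a write, assuming hypothetical names (`enableDirtyArchive`, `ignoreWriteErrors`, `DirtySink`) rather than InLong Sort's actual configuration keys or classes:
+
+```java
+// Hypothetical sketch: option and class names are illustrative, not InLong Sort's API.
+interface DirtySink {
+    void archive(byte[] record, String reason); // e.g. write to S3 or a local log file
+}
+
+class DirtyAwareWriter {
+    interface Writer {
+        void write(byte[] record) throws Exception;
+    }
+
+    private final boolean enableDirtyArchive;
+    private final boolean ignoreWriteErrors;
+    private final DirtySink dirtySink;
+
+    DirtyAwareWriter(boolean enableDirtyArchive, boolean ignoreWriteErrors, DirtySink dirtySink) {
+        this.enableDirtyArchive = enableDirtyArchive;
+        this.ignoreWriteErrors = ignoreWriteErrors;
+        this.dirtySink = dirtySink;
+    }
+
+    void write(byte[] record, Writer target) throws Exception {
+        try {
+            target.write(record);
+        } catch (Exception e) {
+            if (enableDirtyArchive) {
+                dirtySink.archive(record, e.getMessage()); // keep the bad record for later analysis
+            }
+            if (!ignoreWriteErrors) {
+                throw e; // fail the task as before when errors must not be swallowed
+            }
+        }
+    }
+}
+```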
+
+### Support Apache Kafka Full-link
+In version 1.5.0, the DataProxy, Manager, Sort, and Dashboard modules completed full-link support for Apache Kafka. The Kafka support has evolved over two releases and becomes production-ready in 1.5.0; users only need to select Kafka when creating a data stream. Thanks to @woofyzhao, @fuweng11, and @haifxu for their support of this feature.
+![1.5.0-support-kafka](./img/1.5.0-support-kafka.png)
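+
+Once a stream is configured to use Kafka as the cache layer, a quick way to check what lands in the cache topic is a plain Kafka consumer. The broker address, group id, and topic name below are placeholders; the actual topic naming is decided by the InLong configuration:
+
+```java
+// Sanity check: peek at a few records from the Kafka topic backing a stream.
+// Broker address, group id, and topic name are placeholders.
+import java.time.Duration;
+import java.util.Collections;
+import java.util.Properties;
+import org.apache.kafka.clients.consumer.ConsumerConfig;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.clients.consumer.ConsumerRecords;
+import org.apache.kafka.clients.consumer.KafkaConsumer;
+import org.apache.kafka.common.serialization.StringDeserializer;
+
+public class KafkaPeekExample {
+    public static void main(String[] args) {
+        Properties props = new Properties();
+        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
+        props.put(ConsumerConfig.GROUP_ID_CONFIG, "inlong-debug");
+        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
+        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
+        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
+
+        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
+            consumer.subscribe(Collections.singletonList("test_group.test_stream"));
+            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
+            for (ConsumerRecord<String, String> record : records) {
+                System.out.println(record.value()); // payload written to the cache layer
+            }
+        }
+    }
+}
+```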
+
+For more details on the 1.5.0 release, please refer to the release notes, which detail the features, enhancements, and bug fixes for this release.
+
+### Follow-up planning
+In subsequent versions, Apache InLong will add multi-tenant management, standardize the resources and permissions of data streams, projects, clusters, and users, and optimize the performance and stability of various data sources as well as Agent management. We look forward to more developers participating and contributing.
\ No newline at end of file
diff --git a/blog/img/1.5.0-create-dashboard-stream.png b/blog/img/1.5.0-create-dashboard-stream.png
new file mode 100644
index 0000000000..3bf8359384
Binary files /dev/null and b/blog/img/1.5.0-create-dashboard-stream.png differ
diff --git a/blog/img/1.5.0-create-hudi-source.png b/blog/img/1.5.0-create-hudi-source.png
new file mode 100644
index 0000000000..8e2741ca65
Binary files /dev/null and b/blog/img/1.5.0-create-hudi-source.png differ
diff --git a/blog/img/1.5.0-dirty-data.png b/blog/img/1.5.0-dirty-data.png
new file mode 100644
index 0000000000..c9af1adf80
Binary files /dev/null and b/blog/img/1.5.0-dirty-data.png differ
diff --git a/blog/img/1.5.0-mq-handler.png b/blog/img/1.5.0-mq-handler.png
new file mode 100644
index 0000000000..da0ed64033
Binary files /dev/null and b/blog/img/1.5.0-mq-handler.png differ
diff --git a/blog/img/1.5.0-support-kafka.png b/blog/img/1.5.0-support-kafka.png
new file mode 100644
index 0000000000..9e5ad7e205
Binary files /dev/null and b/blog/img/1.5.0-support-kafka.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/2022-11-16-release-1.4.0.md b/i18n/zh-CN/docusaurus-plugin-content-blog/2022-11-16-release-1.4.0.md
index 10cdc757d8..c749da0a85 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-blog/2022-11-16-release-1.4.0.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/2022-11-16-release-1.4.0.md
@@ -9,7 +9,7 @@ tags: [Apache InLong, Version]
 Apache InLong(应龙)是一个一站式海量数据集成框架,提供自动、安全、可靠和高性能的数据传输能力,方便业务构建基于流式的数据分析、建模和应用。 InLong 支持大数据领域的采集、汇聚、缓存和分拣功能,用户只需要简单的配置就可以把数据从数据源导入到实时计算引擎或者落地到离线存储。
 <!--truncate-->
 
-## 1.4.0 版本纵览
+## 1.4.0 版本总览
 Apache InLong 最近发布了 1.4.0 版本,该版本关闭了约 364+ 个issue,包含 16+ 个特性和 120+ 个优化。主要完成了整库实时同步至 Apache Doris、整库实时同步至 Apache Iceberg、标准架构支持 HTTP 上报、标准架构新增 MongoDB 等多种采集节点。该版本还完成了大量其它特性,主要包括:
 
 ### Agent 模块
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/2023-01-13-release-1.5.0.md b/i18n/zh-CN/docusaurus-plugin-content-blog/2023-01-13-release-1.5.0.md
new file mode 100644
index 0000000000..183aa42eb8
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-blog/2023-01-13-release-1.5.0.md
@@ -0,0 +1,83 @@
+---
+title: 1.5.0 版本发布
+author: Charles Zhang
+author_url: https://github.com/dockerzhang
+author_image_url: https://avatars.githubusercontent.com/u/18047329?v=4
+tags: [Apache InLong, Version]
+---
+
+Apache InLong(应龙)最近发布了 1.5.0 版本,该版本关闭了约 296+ 个issue,包含 12+ 个大特性和 110+ 个优化。主要完成了新增 StarRocks、Hudi、Doris、Elasticsearch 等流向、优化 Dashboard 体验、重构 MQ 管理模型、新增脏数据处理、全链路 Apache Kafka 支持、TubeMQ C++/Python SDK 支持生产等。
+<!--truncate-->
+
+## 关于 Apache InLong
+作为业界首个一站式开源海量数据集成框架,Apache InLong 提供了自动、安全、可靠和高性能的数据传输能力,方便业务快速构建基于流式的数据分析、建模和应用。目前 InLong 正广泛应用于广告、支付、社交、游戏、人工智能等各个行业领域,服务上千个业务,其中高性能场景数据规模超百万亿/天,高可靠场景数据规模超十万亿/天。
+
+InLong 项目定位的核心关键词是“一站式”和真正“海量数据”。对于“一站式”,我们希望屏蔽技术细节、提供完整数据集成及配套服务,实现开箱即用;对于“海量数据”,我们希望通过架构上的数据链路分层、全组件可扩展、自带多集群管理等优势,在百万亿/天的基础上,稳定支持更大规模的数据量。
+
+## 1.5.0 版本总览
+Apache InLong 最近发布了 1.5.0 版本,该版本关闭了约 296+ 个issue,包含 12+ 个大特性和 110+ 个优化。主要完成了新增 StarRocks、Hudi、Doris、Elasticsearch 等流向、优化 Dashboard 体验、重构 MQ 管理模型、新增脏数据处理、全链路 Apache Kafka 支持、TubeMQ C++/Python SDK 支持生产等。该版本还完成了大量其它特性,主要包括:
+
+### Agent 模块
+- 支持 CVM 场景下的日志采集
+- 新增直发Pulsar、发送 DataProxy 同步异步策略
+
+### DataProxy 模块
+- 重构 MQ 管理模型,支持快速扩展新的 MQ 类型
+- 优化缓存层支持 Apache Kafka 消息队列
+- 新增支持 BufferQueueChannel
+
+### TubeMQ 模块
+- 增加数据发送和接收延迟统计
+- TubeMQ C++ SDK 支持生产
+- TubeMQ Python SDK 支持生产
+
+### Manager 模块
+- 新增 Hudi 数据节点和流向管理
+- 新增 StarRocks 数据节点和流向管理
+- 优化 Elasticsearch 数据节点和流向管理
+- Manager Client 新增数据转换管理
+- 优化 Apache Kafka 消息队列管理
+
+### Sort 模块
+- MySQL Load 节点存量阶段支持对无主键的表的并发读取
+- 新增 StarRocks、Hudi、Doris、Elasticsearch 5.x 数据流向支持
+- 为 Doris、PostgreSQL、Hive、HBase、Elasticsearch 等流向增加脏数据处理
+- 升级 Iceberg 到 1.1.0 版本
+- StarRocks、PostgreSQL、Doris、Hudi 等流向支持表级别指标
+
+### Dashboard 模块
+- 体验优化超 50 个优化点
+- 增加 JSON、Key-Value、AVRO 格式
+- 支持 ClickHouse、Iceberg、Elasticsearch、MySQL 等数据节点管理页面
+- 新增 SQLServer、Oracle、MongoDB、MQTT 数据源页面
+
+### 其它
+- 增加 Spotless 代码格式化插件及响应流水线
+- Docker-compose 自带 Apache Flink 环境
+- 增加 Agent、DataProxy 的 Grafana 指标显示模板
+
+## 1.5.0 版本特性介绍
+### 新增 StarRocks、Hudi、Doris、Elasticsearch 等流向
+在 1.5.0 版本中,InLong 持续扩展新的数据节点 Connector,针对社区用户使用场景,新增 StarRocks、Hudi、Doris、Elasticsearch 等流向的支持,拓展了数据入仓入湖场景。这些新增数据节点主要由 @liaorui、@featzhang、@kuansix、@LvJiancheng 等开发者贡献。
+![1.5.0-create-hudi-source](./img/1.5.0-create-hudi-source.png)
+
+### 优化 Dashboard 体验
+相比于传统的数据集成项目,InLong 新增了 Group、Stream 、数据节点等概念,初次使用 Dashboard 创建的社区用户会对整个流程有些困惑。为了降低 Dashboard 用户的使用成本,InLong 针对 Dashboard 前端页面进行了大量的优化,优化点超过 50 个,在概念、流程、展示上面进行了调整。下图为 1.5.0 中创建 Stream 的流程,相比较之前版本更加简化。Dashboard 的优化特别感谢 @leezng、@bluewang、@kinfuy,也感谢 @Charles Zhang 提供的修改建议。
+![1.5.0-create-dashboard-stream](./img/1.5.0-create-dashboard-stream.png)
+
+### 重构 MQ 管理模型
+为了快速支持新的消息队列服务(比如 RocketMQ)实现插件化,同时统一现有支持 Pulsar、Kafka、TubeMQ,在 1.5.0 版本中,InLong DataProxy 重构了 MQ 管理模型,所有 MQ 类型都基于 `MessageQueueHandler` 实现对应的 `Handler`。该特性的实现感谢 @woofyzhao、@luchunliang,如果需要开发新的 MQ 类型,可以参考 DataProxy 插件指引。
+![1.5.0-mq-handler](./img/1.5.0-mq-handler.png)
+
+### 新增脏数据处理
+如果入湖入仓时存在不符合数据规范的脏数据(例如字段范围超限、数据字段缺失等),可能会导致用户任务写入失败并不断重启。在 1.5.0 版本中,InLong 支持将不能恢复的脏数据写入到外部存储,包括 S3 和本地日志,同时用户可以自定义脏数据的输出端,可以配置 “是否开启脏数据归档” 与 “是否忽略写入错误”,如下为脏数据归档设计 UML 图。该特性的实现感谢 @yunqingmoswu、@Yizhou-Yang 的支持。
+![1.5.0-dirty-data](./img/1.5.0-dirty-data.png)
+
+### 全链路 Apache Kafka 支持
+在 1.5.0 版本中,完成了 DataProxy、Manager、Sort、Dashboard 模块全链路对 Apache Kafka 的支持,对于 Kafka 的支持经历了两个版本,在 1.5.0 实现了生产可用,用户创建数据流时选择 Kafka 即可。该特性的实现感谢 @woofyzhao、@fuweng11、@haifxu 的支持。
+![1.5.0-support-kafka](./img/1.5.0-support-kafka.png)
+
+更多 1.5.0 版本的细节请参考版本说明,其中详细列出了此版本的特性、提升和 Bug 修复。
+
+### 后续规划
+在后续版本中,Apache InLong 会增加多租户管理,规范数据流、项目、集群和用户的资源和权限,同时对多种数据源进行性能和稳定性优化、Agent 管理等,期待更多开发者参与贡献。
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-create-dashboard-stream.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-create-dashboard-stream.png
new file mode 100644
index 0000000000..9c1c61599c
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-create-dashboard-stream.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-create-hudi-source.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-create-hudi-source.png
new file mode 100644
index 0000000000..b1aea9a11c
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-create-hudi-source.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-dirty-data.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-dirty-data.png
new file mode 100644
index 0000000000..c9af1adf80
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-dirty-data.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-mq-handler.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-mq-handler.png
new file mode 100644
index 0000000000..da0ed64033
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-mq-handler.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-support-kafka.png b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-support-kafka.png
new file mode 100644
index 0000000000..d7fea031fc
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-blog/img/1.5.0-support-kafka.png differ