Posted to commits@dolphinscheduler.apache.org by zh...@apache.org on 2022/03/21 12:00:28 UTC

[dolphinscheduler-website] branch master updated: add news (#745)

This is an automated email from the ASF dual-hosted git repository.

zhongjiajie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 25a8edf  add news (#745)
25a8edf is described below

commit 25a8edff41a55a5999a8c5ce5cbd07bc1aa37a70
Author: lifeng <53...@users.noreply.github.com>
AuthorDate: Mon Mar 21 20:00:23 2022 +0800

    add news (#745)
    
    * addd nesw
    
    news Fully Embracing K8s, Cisco Hangzhou Seeks to Support K8s Tasks Based on ApacheDolphinScheduler
    
    * Update 3.png
    
    * remove blank space
    
    Co-authored-by: Jiajie Zhong <zh...@hotmail.com>
---
 blog/en-us/K8s_Cisco_Hangzhou.md | 176 +++++++++++++++++++++++++++++++++++++++
 blog/zh-cn/K8s_Cisco_Hangzhou.md | 116 ++++++++++++++++++++++++++
 img/2022-03-21/1.png             | Bin 0 -> 148939 bytes
 img/2022-03-21/10.jpeg           | Bin 0 -> 253045 bytes
 img/2022-03-21/2.png             | Bin 0 -> 195672 bytes
 img/2022-03-21/3.png             | Bin 0 -> 106293 bytes
 img/2022-03-21/4.png             | Bin 0 -> 134606 bytes
 img/2022-03-21/5.png             | Bin 0 -> 94764 bytes
 img/2022-03-21/6.png             | Bin 0 -> 204169 bytes
 img/2022-03-21/7.png             | Bin 0 -> 90526 bytes
 img/2022-03-21/8.png             | Bin 0 -> 243171 bytes
 img/2022-03-21/9.png             | Bin 0 -> 243171 bytes
 site_config/blog.js              |  16 ++++
 site_config/home.jsx             |  28 +++----
 14 files changed, 322 insertions(+), 14 deletions(-)

diff --git a/blog/en-us/K8s_Cisco_Hangzhou.md b/blog/en-us/K8s_Cisco_Hangzhou.md
new file mode 100644
index 0000000..64f9ba6
--- /dev/null
+++ b/blog/en-us/K8s_Cisco_Hangzhou.md
@@ -0,0 +1,176 @@
+---
+title: Fully Embracing K8s, Cisco Hangzhou Seeks to Support K8s Tasks Based on Apache DolphinScheduler
+keywords: Apache,DolphinScheduler,scheduler,big data,ETL,airflow,hadoop,orchestration,dataops,K8s
+description: K8s is the future of the cloud and is the only infrastructure platform that connects public and private clouds
+---
+# Fully Embracing K8s, Cisco Hangzhou Seeks to Support K8s Tasks Based on Apache DolphinScheduler
+
+<div align=center>
+
+<img src="/img/2022-03-21/1.png"/>
+
+</div>
+
+K8s is the future of the cloud and the only infrastructure platform that connects public and private clouds, making it the choice of more and more enterprises seeking to modernize their IT.
+
+ 
+
+Building on Apache DolphinScheduler, Cisco Hangzhou is also exploring K8s support, and some of the features are already running successfully. Here, Qian Li, a big data engineer from Cisco Hangzhou, shares their results with us.
+
+<div align=center>
+
+<img src="/img/2022-03-21/2.png"/>
+
+</div>
+
+
+Qian Li
+
+ 
+
+Big data engineer at Cisco Hangzhou, with years of experience in big data solutions and in projects involving Spark, Flink, scheduling systems, and ETL.
+
+ 
+
+My presentation covers four parts: Namespace management, continuously running K8s tasks, workflow scheduling of K8s tasks, and our future plans.
+
+## 01 Namespace Management
+
+### Resource Management
+
+ 
+
+In the first part, I will introduce resource management. We introduced it so that we can use K8s clusters to run tasks that fall outside Apache DolphinScheduler's own scheduling concepts. A Namespace, for example, is closer to a data-solution concept: when CPU or memory is limited, it lets us cap the resources available to a queue and provides a degree of resource isolation.
+
+ 
+
+In the future, we may merge some of the resource management functionality into Apache DolphinScheduler.
+
+### Adding, deleting, and maintaining Namespaces
+
+We can attach a Type to a Namespace, that is, a kind of tag; for example, some Namespaces only allow certain types of jobs to run. We can count the jobs and pods under a Namespace, the amount of resources requested, and so on, to see the resource usage of the queue. By default, this interface is available only to administrators.
+
+<div align=center>
+
+<img src="/img/2022-03-21/3.png"/>
+
+</div>
+
+### Multiple K8s clusters
+
+K8s supports multiple clusters. We connect to multiple K8s clusters through the Apache DolphinScheduler client; separate clusters can be set up for batch, PROD, and so on, and Namespaces are used to support these multiple clusters.
+
+ 
+
+We can edit the clusters we have set up and modify all of their properties, such as memory.
+
+ 
+
+In the new version, user permissions are managed in user master, where a user can be authorized to submit tasks to a Namespace and edit resources.
+
+## 02 K8s tasks running continuously
+
+The second section is about the types of tasks we currently support.
+
+### Ordinary images that start and do not exit, such as ETL
+
+ETL, for example, is a task that, once submitted, exits only when stopped manually. Once such a task is submitted, it keeps sinking data and, in theory, never stops unless it is upgraded.
+
+<div align=center>
+
+<img src="/img/2022-03-21/4.png"/>
+
+</div>
+
+This kind of task does not really need scheduling, as it has only two states, started and stopped. So we put it in a real-time task list and built a set of monitors for it. The pod stays running, is managed mainly through a Fabris operator, and can be scaled dynamically to improve resource utilization.
+
+### Flink tasks
+
+We can manage the CPU down to 0.01%, making full use of the K8s virtual CPU.
+
+<div align=center>
+
+<img src="/img/2022-03-21/5.png"/>
+
+</div>
+
+<div align=center>
+
+<img src="/img/2022-03-21/6.png"/>
+
+</div>
+
+<div align=center>
+
+<img src="/img/2022-03-21/7.png"/>
+
+</div>
+
+We also frequently use Flink tasks, which extend the ETL case. The Flink task interface covers editing, viewing status, going online, going offline, deleting, execution history, and monitoring. We expose the Flink UI through a proxy model and developed permission controls to prevent outsiders from modifying tasks at will.
+
+By default, Flink starts from a checkpoint; a task can also be created from a specified point in time, or submitted and started from the last checkpoint.
+
+Flink tasks support multiple image versions. Because K8s inherently runs images, you can specify an image directly to choose the package to use, or submit the task by uploading a file.
+
+In addition, batch tasks may run once and finish, or be scheduled periodically and exit automatically after execution. This differs from Flink, so we handle this type of task with Apache DolphinScheduler.
+
+## 03 Running K8s tasks
+
+### Workflow scheduling for K8s tasks
+
+At the lowest layer we added Flink batch and Spark batch task types, along with configuration such as the resources to use and the namespace to run in. The image can be started with custom parameters. Once wrapped up, this amounts to a plugin model, and Apache DolphinScheduler extends well to support it.
+
+<div align=center>
+
+<img src="/img/2022-03-21/8.png"/>
+
+</div>
+
+### Spark Tasks
+
+Under Spark tasks, you can view information such as CPU usage. File upload supports Spark Jar packages, and configuration files can be uploaded separately.
+
+<div align=center>
+
+<img src="/img/2022-03-21/9.png"/>
+
+</div>
+
+This multi-threaded upper layer can dramatically increase processing speed.
+
+## 04 Others and Future Plans
+
+### Watch Status
+
+<div align=center>
+
+<img src="/img/2022-03-21/10.jpeg"/>
+
+</div>
+
+In addition to the above changes, we have also optimized the task run state.
+
+ 
+
+After a task is submitted, it may fail at runtime, and even its parallelism may change according to certain policies. We therefore need to watch the task and fetch its status in real time, synchronizing it with the Apache DolphinScheduler system so that the status shown in the interface is always accurate.
+
+ 
+
+Batch tasks can run with or without a watch, because they are not standalone tasks that require full watching. Namespace resource usage is also tracked in watch mode, so its state is always accurate.
+
+### Multiple environments
+
+Multi-environment means that the same task, such as a Flink task, can be pushed to different K8s clusters.
+
+ 
+
+In terms of code, there are two ways to implement the watch. One is to deploy dedicated pods: for example, when using the K8s module, we define the information for multiple K8s clusters and create watch pods on each cluster to watch the status of the tasks there and to provide some proxy functions. The other is to run the watch alongside the API or as a separate service that tracks all K8s clusters. However, this second approach cannot additionally proxy between the networks inside and outside K8s.
+
+ 
+
+There are several options for watching batch tasks. One is to watch synchronously using Apache DolphinScheduler's built-in functionality, which is the most compatible with Apache DolphinScheduler. We may submit a PR for this work soon. Spark uses the same model, providing a number of pods to interact with, and internally our code uses the Fabric K8s client.
+
+ 
+
+Going forward, we will be working with Apache DolphinScheduler to support the features discussed here and share more information about our progress. Thank you all!
+
diff --git a/blog/zh-cn/K8s_Cisco_Hangzhou.md b/blog/zh-cn/K8s_Cisco_Hangzhou.md
new file mode 100644
index 0000000..19ed29d
--- /dev/null
+++ b/blog/zh-cn/K8s_Cisco_Hangzhou.md
@@ -0,0 +1,116 @@
+# 全面拥抱 K8s,ApacheDolphinScheduler 应用与支持 K8s 任务的探索
+
+<div align=center>
+<img src="/img/2022-03-21/1.png"/>
+</div>
+
+>K8s 打通了主流公私云之间的壁垒,成为唯一连通公私云的基础架构平台。K8s 是未来云端的趋势,全面拥抱 K8s 成为更多企业推动 IT 现代化的选择。
+>>杭州思科基于 Apache DolphinScheduler,也在进行支持 K8s 的相关探索,且部分功能已经成功上线运行。今天,来自杭州思科的大数据工程师 李千,将为我们分享他们的开发成果。
+
+<div align=center>
+<img src="/img/2022-03-21/2.png"/>
+</div>
+
+李千,杭州思科 大数据工程师,多年大数据解决方案经验,有 Spark,Flink,以及调度系统,ETL 等方面的项目经验。
+
+正文:
+
+本次我的分享主要分为这几部分,Namespace 管理,持续运行的 K8s 任务,K8s 任务的工作流调度,以及未来的规划。
+
+## Namespace 管理
+
+### 资源管理
+
+第一部分中,我首先介绍一下资源管理。我们引入资源管理目的,是为了利用 K8s 集群运行不属于 Apache DolphinScheduler 所属的调度概念上的任务,比如 Namespace,更类似于一个数据解决方案,如果 CPU 的 memory 有限,就可以限制队列中的资源,实现一定的资源隔离。
+
+以后我们可能会把一部分资源管理功能合并到 Apache DolphinScheduler 上。
+
+### 增删维护管理
+
+我们可以加一些 Type,即标记的类型,比如某些 Namespace 只允许跑一些特定类型的 job。我们可以统计Namespace 下面的任务数量、pod 数量、请求资源量、请求等,查看队列的资源使用情况,界面默认只有管理员才可以操作。
+<div align=center>
+<img src="/img/2022-03-21/3.png"/>
+</div>
+
+### 多 K8s 集群
+
+K8s 支持多个集群,我们通过 Apache DolphinScheduler 客户端连接到多个 K8s 集群,batch、PROD 等可以搭建多套这K8s 集群,并通过 Namespace 支持多套 K8s 集群。
+
+我们可以编辑所开发的集群,修改所有的属性,如内存等。
+
+在新版中,用户权限的管理位于 user master 中,可以给某个用户授权,允许用户可以向某个 Namespace 上提交任务,并编辑资源。
+
+## 02 持续运行的 K8s 任务
+
+第二部分是关于我们目前已经支持的任务类型。
+
+### 启动不退出的普通镜像,如 ETL 等
+
+比如 ETL 这种提交完之后必须要手动操作才会退出的任务。这种任务一旦提交,就会把数据 sink,这种任务理论上只要不做升级,它永远不会停。
+<div align=center>
+<img src="/img/2022-03-21/4.png"/>
+</div>
+
+这种任务其实调度可能用不到,因为它只有启停这两种状态。所以,我们把它放在一个实时列表中,并做了一套监控。POD是实时运行的状态,主要是通过一个 Fabris operator 进行交互,可以进行动态进行扩展,以提高资源利用率。
+
+### Flink 任务
+
+我们对于 CPU 的管理可以精确到 0.01%,充分利用了 K8s 虚拟 CPU。
+<div align=center>
+<img src="/img/2022-03-21/5.png"/>
+</div>
+<div align=center>
+<img src="/img/2022-03-21/6.png"/>
+</div>
+<div align=center>
+<img src="/img/2022-03-21/7.png"/>
+</div>
+另外,我们也常用 Flink 任务,这是一种基于 ETL 的扩展。Flink 任务界面中包含编辑、查看状态、上线、下线、删除、执行历史,以及一些监控的设计。我们用代理的模式来设计 Flink UI,并开发了权限管控,不允许外部的人随意修改。
+
+Flink 默认了基于 checkpoint 启动,也可以指定一个时间创建,或基于上一次 checkpoint 来提交和启动。
+
+Flink 任务支持多种模式镜像版本,因为 K8s 本身就是运行镜像的,可以直接指定一些镜像来选择使用包,或通过文件上传的方式提交任务。
+
+另外,Batch 类型的任务可能一次运行即结束,或是按照周期来调度,自动执行完后退出,这和 Flink 不太一样,所以对于这种类型的任务,我们还是基于 Apache DolphinScheduler 做。
+
+## 03 K8s 任务的运行
+
+### K8s 任务的工作流调度
+
+我们在最底层增加了一些 Flink 的 batch 和 Spark 的 batch 任务,添加了一些配置,如使用的资源,所运行的 namespace 等。镜像信息可以支持一些自定义参数启动,封装起来后就相当于插件的模式,Apache DolphinScheduler 完美地扩展了它的功能。
+<div align=center>
+<img src="/img/2022-03-21/8.png"/>
+</div>
+
+### Spark 任务
+
+Spark 任务下可以查看 CPU 等信息,上传文件支持 Spark Jar 包,也可以单独上传配置文件。
+<div align=center>
+<img src="/img/2022-03-21/9.png"/>
+</div>
+
+这种多线程的上层,可以大幅提高处理速度。
+
+## 04 其他和规划
+
+### Watch 状态
+
+<div align=center>
+<img src="/img/2022-03-21/10.jpeg"/>
+</div>
+
+除了上述改动,我们还对任务运行状态进行了优化。
+
+当提交任务后,实际情况下运行过程中可能会出现失败,甚至任务的并行度也会基于某些策略发生改变。这时,我们就需要一种 watch 的方式来动态实时地来获取任务状态,并同步给 Apache DolphinScheduler 系统,以保证界面上看到的状态一定是最准确的。
+
+Batch 做不做 watch 都可以,因为这不是一个需要全量监听的独立任务而且 namespace 的资源使用率也是基于 watch 模式,这样就可以保证状态都是准确的。
+
+### 多环境
+
+多环境是指,同一个任务可以推送到不同的 K8s 集群上,比如同一个Flink 任务。
+
+从代码上来说,watch 有两种方式,一种是单独放一些 pod,比如当使用了 K8s 模块时,定义多个 K8s 集群的信息,在每个集群上创建一些watch pod 来监听集群中的任务状态,并做一些代理的功能。另一种是跟随api或单独服务,启动一个监听服务监听所有k8s集群。但这样无法而外做一些k8s内外网络的代理。
+
+Batch 有多种方案,一种是可以基于 Apache DolphinScheduler 自带功能,通过同步的方式进行 watch,这和 Apache DolphinScheduler 比较兼容。关于这方面的工作我们未来可能很快会提交 PR。Spark 使用相同的模式,提供一些 pod 来进行交互,而内部代码我们使用的是 Fabric K8s 的 client。
+
+今后,我们将与 Apache DolphinScheduler 一起共建,陆续支持这里讨论的功能,并和大家分享更多关于我们的工作进展。谢谢大家!
diff --git a/img/2022-03-21/1.png b/img/2022-03-21/1.png
new file mode 100644
index 0000000..0b01706
Binary files /dev/null and b/img/2022-03-21/1.png differ
diff --git a/img/2022-03-21/10.jpeg b/img/2022-03-21/10.jpeg
new file mode 100644
index 0000000..d53a0df
Binary files /dev/null and b/img/2022-03-21/10.jpeg differ
diff --git a/img/2022-03-21/2.png b/img/2022-03-21/2.png
new file mode 100644
index 0000000..e47690f
Binary files /dev/null and b/img/2022-03-21/2.png differ
diff --git a/img/2022-03-21/3.png b/img/2022-03-21/3.png
new file mode 100644
index 0000000..9f056da
Binary files /dev/null and b/img/2022-03-21/3.png differ
diff --git a/img/2022-03-21/4.png b/img/2022-03-21/4.png
new file mode 100644
index 0000000..b4fb077
Binary files /dev/null and b/img/2022-03-21/4.png differ
diff --git a/img/2022-03-21/5.png b/img/2022-03-21/5.png
new file mode 100644
index 0000000..c494c3b
Binary files /dev/null and b/img/2022-03-21/5.png differ
diff --git a/img/2022-03-21/6.png b/img/2022-03-21/6.png
new file mode 100644
index 0000000..5a6e3b6
Binary files /dev/null and b/img/2022-03-21/6.png differ
diff --git a/img/2022-03-21/7.png b/img/2022-03-21/7.png
new file mode 100644
index 0000000..0a4b5c6
Binary files /dev/null and b/img/2022-03-21/7.png differ
diff --git a/img/2022-03-21/8.png b/img/2022-03-21/8.png
new file mode 100644
index 0000000..aeaa101
Binary files /dev/null and b/img/2022-03-21/8.png differ
diff --git a/img/2022-03-21/9.png b/img/2022-03-21/9.png
new file mode 100644
index 0000000..aeaa101
Binary files /dev/null and b/img/2022-03-21/9.png differ
diff --git a/site_config/blog.js b/site_config/blog.js
index c870f04..167afbe 100644
--- a/site_config/blog.js
+++ b/site_config/blog.js
@@ -5,6 +5,14 @@ export default {
         list: [
             {
 
+                title: 'Fully Embracing K8s, Cisco Hangzhou Seeks to Support K8s Tasks Based on Apache DolphinScheduler',
+                author: 'Debra Chen',
+                dateStr: '2022-3-21',
+                desc: 'K8s is the future of the cloud and is the only infrastructure platform.. ',
+                link: '/en-us/blog/K8s_Cisco_Hangzhou.html',
+            },
+            {
+
                 title: 'Cisco Hangzhou\'s Travel Through Apache DolphinScheduler Alert Module Refactor',
                 author: 'Debra Chen',
                 dateStr: '2022-3-16',
@@ -168,6 +176,14 @@ export default {
         postsTitle: '所有文章',
         list: [
             {
+                title: '全面拥抱 K8s,ApacheDolphinScheduler 应用与支持 K8s 任务的探索',
+                author: 'Debra Chen',
+                dateStr: '2022-3-21',
+                desc: 'K8s 打通了主流公私云之间的壁垒,成为唯一连通公私云的基础架构平台......',
+                link: '/zh-cn/blog/K8s_Cisco_Hangzhou.html',
+
+            },
+            {
                 title: '杭州思科对 Apache DolphinScheduler Alert 模块的改造',
                 author: 'Debra Chen',
                 dateStr: '2022-3-16',
diff --git a/site_config/home.jsx b/site_config/home.jsx
index b38977b..f766da2 100644
--- a/site_config/home.jsx
+++ b/site_config/home.jsx
@@ -55,6 +55,13 @@ export default {
       title: '事件 & 新闻',
       list: [
         {
+          img: '/img/2022-03-21/1.png',
+          title: '全面拥抱 K8s,ApacheDolphinScheduler 应用与支持 K8s 任务的探索',
+          content: 'K8s 打通了主流公私云之间的壁垒,成为唯一连通公私云的基础架构平台...',
+          dateStr: '2022-3-21',
+          link: '/zh-cn/blog/K8s_Cisco_Hangzhou.html',
+        },
+        {
           img: '/img/3-16/1.png',
           title: '杭州思科对 Apache DolphinScheduler Alert 模块的改造',
           content: '杭州思科已经将 Apache DolphinScheduler 引入公司自建的大数据平台..',
@@ -68,13 +75,6 @@ export default {
           dateStr: '2022-3-15',
           link: '/zh-cn/blog/How_Does_360_DIGITECH_process_10_000+_workflow_instances_per_day.html',
         },
-        {
-          img: '/img/2022-3-9/1.jpeg',
-          title: '途家大数据平台基于 Apache DolphinScheduler 的探索与实践',
-          content: '途家在 2019 年引入 Apache DolphinScheduler,在不久前的 Apache DolphinScheduler 2...',
-          dateStr: '2022-3-10',
-          link: '/zh-cn/blog/Exploration_and_practice_of_Tujia_Big_Data_Platform_Based.html',
-        },
       ],
     },
     ourusers: {
@@ -547,6 +547,13 @@ export default {
       title: 'Events & News',
       list: [
         {
+          img: '/img/2022-03-21/1.png',
+          title: 'Fully Embracing K8s, Cisco Hangzhou Seeks to Support K8s Tasks Based on Apache DolphinScheduler',
+          content: 'K8s is the future of the cloud and is the only infrastructure platform...',
+          dateStr: '2022-3-21',
+          link: '/en-us/blog/K8s_Cisco_Hangzhou.html',
+        },
+        {
 
           img: '/img/3-16/1.png',
           title: 'Cisco Hangzhou\'s Travel Through Apache DolphinScheduler Alert Module Refactor',
@@ -561,13 +568,6 @@ export default {
           dateStr: '2022-2-24',
           link: '/en-us/blog/How_Does_360_DIGITECH_process_10_000+_workflow_instances_per_day.html',
         },
-        {
-          img: '/img/2022-3-9/Eng/1.jpeg',
-          title: 'Exploration and practice of Tujia Big Data Platform Based on Apache DolphinScheduler',
-          content: 'Tujia introduced Apache DolphinScheduler in 2019...',
-          dateStr: '2022-3-10',
-          link: '/en-us/blog/Exploration_and_practice_of_Tujia_Big_Data_Platform_Based.html',
-        },
       ],
     },
     userreview: {