Posted to commits@dolphinscheduler.apache.org by ni...@apache.org on 2023/03/31 08:23:46 UTC

[dolphinscheduler-website] branch master updated: ADD Blog (#891)

This is an automated email from the ASF dual-hosted git repository.

nielifeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 5e9f26d6bf ADD Blog  (#891)
5e9f26d6bf is described below

commit 5e9f26d6bf5be36ed0f0613c6f3e233288f45c14
Author: lifeng <53...@users.noreply.github.com>
AuthorDate: Fri Mar 31 16:23:40 2023 +0800

    ADD Blog  (#891)
    
    * ADD Blog
    
    * Delete modules.xml
    
    * Delete .gitignore
    
    * Delete dolphinscheduler-website.iml
    
    * move img——>blog/mig
---
 ...tice_Boosting_Big_Data_Processing_Efficiency.md |  72 +++++++++++++++++++++
 ...Submission_Issue_with_DolphinScheduler_3.1.4.md |  62 ++++++++++++++++++
 blog/img/2023-03-31/assets01/01.png                | Bin 0 -> 179821 bytes
 blog/img/2023-03-31/assets01/02.png                | Bin 0 -> 58208 bytes
 blog/img/2023-03-31/assets01/03.png                | Bin 0 -> 131881 bytes
 blog/img/2023-03-31/assets01/04.png                | Bin 0 -> 320135 bytes
 blog/img/2023-03-31/assets01/05.png                | Bin 0 -> 450654 bytes
 blog/img/2023-03-31/assets02/01.jpg                | Bin 0 -> 51566 bytes
 config/blog/en-us/release.json                     |   8 ++-
 config/blog/en-us/user.json                        |   8 +++
 10 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/blog/en-us/DolphinScheduler_Cisco_Webex_k8s_Integration_Practice_Boosting_Big_Data_Processing_Efficiency.md b/blog/en-us/DolphinScheduler_Cisco_Webex_k8s_Integration_Practice_Boosting_Big_Data_Processing_Efficiency.md
new file mode 100755
index 0000000000..d3a6f70ec4
--- /dev/null
+++ b/blog/en-us/DolphinScheduler_Cisco_Webex_k8s_Integration_Practice_Boosting_Big_Data_Processing_Efficiency.md
@@ -0,0 +1,72 @@
+---
+title: DolphinScheduler✖️Cisco Webex: k8s Integration Practice, Boosting Big Data Processing Efficiency!
+keywords: Apache,DolphinScheduler,scheduler,big data,ETL,airflow,hadoop,orchestration,dataops,Meetup
+description: Summary: Cisco Webex is a software company that develops and sells online meeting...
+---
+# DolphinScheduler✖️Cisco Webex: k8s Integration Practice, Boosting Big Data Processing Efficiency!
+
+Summary: Cisco Webex is a software company that develops and sells online meeting, video conferencing, cloud calling services, and contact center as a service applications. The team has designed and built a big data platform to serve data ingestion and workload data processing for their suite of products. Taking the Webex Meeting product as an example, Webex Meetings generate various metrics. When a meeting is held, both the client and server send numerous metrics and logs to the Kafka c [...]
+
+## Business Challenges
+Since Cisco Webex is a global collaboration service provider with customers spanning multiple time zones and continents, it has numerous data centers worldwide. These data centers include locally self-managed data centers and clusters managed by cloud providers such as Amazon and Google. In the past, Cisco Webex would use mirroring to aggregate all data from global data centers into a centralized Kafka cluster in the United States, and from there, data processing and integration would begin.
+
+In recent years, Cisco Webex has established multiple clusters worldwide for data localization. The data model has shifted from a centralized cluster containing data from all around the world to individual data centers containing locally generated data.
+
+Another issue that Cisco Webex's next-generation data platform aims to address is "data silos." Since different types of services run on different infrastructures, each product has its own data ingestion and data platform implementation, and data sources are diverse with numerous data storage formats. This makes it difficult to provide customers with a unified data source and ensure consistency between different systems.
+
+Cisco Webex's vision is to create a data platform that can serve every internal and external customer, eliminating data silos through a unified architecture, unified data storage, and unified data ingestion technology, and integrating all infrastructures. Furthermore, this data platform must be able to adapt to any public cloud and any existing private data center within the architecture.
+
+## Solution
+
+### 1. DolphinScheduler and k8s Integration
+![01](/img/2023-03-31/assets01/01.png)
+
+
+
+As shown in the architecture diagram, the left part represents DolphinScheduler's features, with different task types running on its workers. All data processing jobs, such as Flink and Spark jobs, used to run on multiple separate Yarn clusters: we had a CDH cluster for batch Spark jobs and Flink jobs, and multiple Flink jobs ran on different Flink clusters. In 2021, we decided to build a Kubernetes cluster to replace the Yarn clusters for the following reasons.
+
+1. Kubernetes makes our daily operations smoother and easier.
+2. Kubernetes allows us to deploy various containerized services.
+
+**As a result, Cisco Webex built a Kubernetes cluster to replace the Yarn clusters,** allowing all data processing jobs to run on the Kubernetes cluster, extending DolphinScheduler's capabilities, and **integrating Flink, Spark, and Kubernetes features with DolphinScheduler.**
+
+### 2. Multi-cluster ETL Job Management
+![02](/img/2023-03-31/assets01/02.png)
+
+
+A typical use case for Cisco Webex is deploying the same job on multiple clusters. To minimize deployment work, the approach is to generalize the common processing logic and replace the required configurations for each cluster, enabling one-click development for multiple clusters. Cisco Webex **uses a centralized DolphinScheduler as the job scheduling platform for all data processing jobs,** **running jobs in different data centers.** When users submit new jobs to different clusters, Dol [...]
+
+### 3. Kubernetes Multi-cluster Management
+
+![03](/img/2023-03-31/assets01/03.png)
+
+
+Cisco Webex has built many Kubernetes compute clusters in private data centers worldwide or in public clouds like AWS. To enable DolphinScheduler to submit and manage jobs for data centers spread across the globe, Cisco Webex has implemented features such as cluster management and namespaces on DolphinScheduler.
+
+### 4. Simple ETL Pipeline Drag-and-Drop Generation Framework
+
+![04](/img/2023-03-31/assets01/04.png)
+
+
+For simple processing jobs without complex computational logic, Cisco Webex has developed a drag-and-drop pipeline generation framework on DolphinScheduler.
+
+Users can generate complex real-time data processing pipelines by dragging and dropping on the canvas. By configuring predefined source filters, mappings, and sync operators, users don't need to write any code. Notably, Cisco Webex has also integrated metadata into the data center for source and map operators to use. As a result, when users choose the topics they want to process, the job list they see comes from the API data in the data center. Users don't need to type in names and Kafka [...]
+
+### 5. Flink Jobs on Kubernetes
+![05](/img/2023-03-31/assets01/05.png)
+
+
+Cisco Webex has also built Flink jobs in DolphinScheduler based on Kubernetes features. Some people might be confused, because DolphinScheduler workflows already offer a Flink task. The reason is that the existing Flink task in DolphinScheduler only applies to Yarn, whereas we intend to run all jobs on Kubernetes clusters. We achieved Flink job execution on Kubernetes by adding Kubernetes-related APIs to the current DolphinScheduler architecture.
+
+
+## User Benefits
+
+1. Built the next-generation data platform based on DolphinScheduler, making it possible to run all types of jobs on a single platform;
+2. Broke down Cisco Webex data silos, connected global data centers, integrated any public cloud and existing private data centers, ensuring consistency across systems;
+3. Ran all data processing jobs on Kubernetes clusters, reducing operational and maintenance costs.
+
+## User Profile
+
+San Francisco-based Cisco Webex (WebEx) is a subsidiary of Cisco. It is a software company that develops and sells online meeting, video conferencing, cloud calling, and contact center as a service applications, creating on-demand software solutions for companies of various sizes.
+
+
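Section 2 of the Cisco Webex post above describes generalizing the common processing logic and swapping in per-cluster configuration. A minimal Python sketch of that idea follows. It is an illustration only, not Cisco Webex's or DolphinScheduler's code: the cluster names, configuration keys, and the `render_job` and `render_for_all_clusters` helpers are all hypothetical.

```python
from string import Template

# Hypothetical per-cluster settings; names and values are illustrative only.
CLUSTER_CONFIGS = {
    "us-east": {"kafka_brokers": "kafka.us-east.example:9092", "namespace": "etl-us"},
    "eu-west": {"kafka_brokers": "kafka.eu-west.example:9092", "namespace": "etl-eu"},
}

# Common job definition shared by every cluster; only the $placeholders differ.
JOB_TEMPLATE = Template("job=$job_name namespace=$namespace brokers=$kafka_brokers")


def render_job(job_name: str, cluster: str) -> str:
    """Fill the shared template with one cluster's configuration."""
    config = CLUSTER_CONFIGS[cluster]
    return JOB_TEMPLATE.substitute(job_name=job_name, **config)


def render_for_all_clusters(job_name: str) -> dict:
    """Render the same job once per cluster, keyed by cluster name."""
    return {cluster: render_job(job_name, cluster) for cluster in CLUSTER_CONFIGS}
```

Keeping the job definition in one template and the differences in a small per-cluster table means a new cluster only needs a new table entry, which is the property the post attributes to its one-click multi-cluster deployment.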
diff --git a/blog/en-us/PyDolphinScheduler_Releases_Version_4.0.2_Fixing_Workflow_Submission_Issue_with_DolphinScheduler_3.1.4.md b/blog/en-us/PyDolphinScheduler_Releases_Version_4.0.2_Fixing_Workflow_Submission_Issue_with_DolphinScheduler_3.1.4.md
new file mode 100755
index 0000000000..ee46290279
--- /dev/null
+++ b/blog/en-us/PyDolphinScheduler_Releases_Version_4.0.2_Fixing_Workflow_Submission_Issue_with_DolphinScheduler_3.1.4.md
@@ -0,0 +1,62 @@
+---
+title: PyDolphinScheduler releases version 4.0.2, fixing the problem that workflow cannot be submitted to DolphinScheduler 3.1.4
+keywords: Apache,DolphinScheduler,scheduler,big data,ETL,airflow,hadoop,orchestration,dataops,Meetup
+description: Summary: PyDolphinScheduler officially releases version 4.0.2...
+---
+# PyDolphinScheduler releases version 4.0.2, fixing the problem that workflow cannot be submitted to DolphinScheduler 3.1.4
+
+PyDolphinScheduler officially releases version 4.0.2, which mainly fixes the problem that version 4.0.1 cannot submit workflows to Apache DolphinScheduler 3.1.4.
+
+In addition, the major optimizations of PyDolphinScheduler 4.0.2 include:
+
+* Fixing incorrect verification of the Apache DolphinScheduler version
+* Adding stmdency dependency parsing to the Python task type
+* Fixing missing dependencies on lower versions of Python
+## Optimization Details
+
+### 01 Fix the problem that the workflow cannot be submitted to DolphinScheduler 3.1.4
+
+PyDolphinScheduler 4.0.1 cannot submit workflows to Apache DolphinScheduler 3.1.4, because Apache DolphinScheduler 3.1.4 was released later than PyDolphinScheduler 4.0.1 and includes some incompatible updates.
+
+PyDolphinScheduler 4.0.2 fixes this problem.
+
+### 02 Fix incorrect verification of the DolphinScheduler version
+
+This happens only in extreme situations: when the user does not use the official installation package of Apache DolphinScheduler but modifies the code and packages it themselves, PyDolphinScheduler may report that the version is not supported.
+
+PyDolphinScheduler 4.0.2 is compatible with this scenario.
+
+### 03 Add stmdency dependency to Python task type
+
+Before version 4.0.2, only the Python function wrapper introduced stmdency dependency parsing. In 4.0.2 and later versions, we have also added stmdency dependency parsing for the Python task type itself to ensure that the functional dependencies can be obtained.
+
+
+
+## Modification list
+
+
+
+### 01 Bugfixes
+
+* Support for submitting workflows to Apache DolphinScheduler 3.1.4
+* Detect Apache DolphinScheduler version issues #69
+* CI anomaly detection due to dev version #70
+* Python task type supports stmdency #72
+* Add missing packaging dependencies #81
+### 02 Optimization
+
+* Workflow start and end time support datetime type, schedule detection #68
+* Migrate CI related configuration to setup.cfg #82
+### 03 Documents
+
+Modify the release document
+
+## Release Notes
+
+[https://github.com/apache/dolphinscheduler-sdk-Python/releases/tag/4.0.2](https://github.com/apache/dolphinscheduler-sdk-Python/releases/tag/4.0.2)
+
+
+## Thanks to contributors
+
+zhongjiajie
+
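Section 02 of the PyDolphinScheduler post above concerns version verification failing for self-packaged builds. The sketch below shows one way such a check can tolerate a customized version string by comparing only the leading numeric components. It is not PyDolphinScheduler's actual implementation: the `SUPPORTED_RELEASES` table and the `parse_version` and `is_supported` helpers are hypothetical, as is the assumption that a custom build appends a suffix to the version.

```python
import re

# Hypothetical list of supported DolphinScheduler (major, minor) release lines.
SUPPORTED_RELEASES = [(3, 1)]


def parse_version(version: str) -> tuple:
    """Extract the leading numeric components, ignoring suffixes like '-custom'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def is_supported(version: str) -> bool:
    """Return True when the major.minor pair is in the supported list."""
    return parse_version(version)[:2] in SUPPORTED_RELEASES
```

Comparing on the major.minor pair rather than the full string is a design choice of this sketch: it accepts patch releases and locally repackaged builds while still rejecting unrelated release lines.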
diff --git a/blog/img/2023-03-31/assets01/01.png b/blog/img/2023-03-31/assets01/01.png
new file mode 100755
index 0000000000..3a952fdd21
Binary files /dev/null and b/blog/img/2023-03-31/assets01/01.png differ
diff --git a/blog/img/2023-03-31/assets01/02.png b/blog/img/2023-03-31/assets01/02.png
new file mode 100755
index 0000000000..24ab1d3312
Binary files /dev/null and b/blog/img/2023-03-31/assets01/02.png differ
diff --git a/blog/img/2023-03-31/assets01/03.png b/blog/img/2023-03-31/assets01/03.png
new file mode 100755
index 0000000000..95c4de8c6f
Binary files /dev/null and b/blog/img/2023-03-31/assets01/03.png differ
diff --git a/blog/img/2023-03-31/assets01/04.png b/blog/img/2023-03-31/assets01/04.png
new file mode 100755
index 0000000000..61977e7dbb
Binary files /dev/null and b/blog/img/2023-03-31/assets01/04.png differ
diff --git a/blog/img/2023-03-31/assets01/05.png b/blog/img/2023-03-31/assets01/05.png
new file mode 100755
index 0000000000..e861aa2834
Binary files /dev/null and b/blog/img/2023-03-31/assets01/05.png differ
diff --git a/blog/img/2023-03-31/assets02/01.jpg b/blog/img/2023-03-31/assets02/01.jpg
new file mode 100755
index 0000000000..bf21770c39
Binary files /dev/null and b/blog/img/2023-03-31/assets02/01.jpg differ
diff --git a/config/blog/en-us/release.json b/config/blog/en-us/release.json
index 2fb3a43560..4f5f1b9c73 100644
--- a/config/blog/en-us/release.json
+++ b/config/blog/en-us/release.json
@@ -1,4 +1,10 @@
-
+{
+  "PyDolphinScheduler_Releases_Version_4.0.2_Fixing_Workflow_Submission_Issue_with_DolphinScheduler_3.1.4": {
+    "title": "PyDolphinScheduler releases version 4.0.2, fixing the problem that workflow cannot be submitted to DolphinScheduler 3.1.4",
+    "author": "Leonard Nie",
+    "dateStr": "2023-3-27",
+    "desc": "PyDolphinScheduler officially releases version 4.0.2........ "
+  },
 {
   "Apache_dolphinScheduler_3.1.2": {
     "title": "Apache DolphinScheduler releases version 3.1.2 with Python API optimizations",
diff --git a/config/blog/en-us/user.json b/config/blog/en-us/user.json
index e902f0fd52..eae0629fe1 100644
--- a/config/blog/en-us/user.json
+++ b/config/blog/en-us/user.json
@@ -1,3 +1,11 @@
+{ "DolphinScheduler_Cisco_Webex_k8s_Integration_Practice_Boosting_Big_Data_Processing_Efficiency": {
+  "title": "DolphinScheduler✖️Cisco Webex: k8s Integration Practice, Boosting Big Data Processing Efficiency!",
+  "author": "Leonard Nie",
+  "dateStr": "2022-3-29",
+  "desc": "Cisco Webex is a software company that develops and sells online meeting,... ",
+  "img": "/img/2023-03-31/assets01/01.png",
+  "logo": ""
+},
 { "Application_transformation_of_the_FinTech_data_center_based_on_DolphinScheduler": {
   "title": "Application transformation of the FinTech data center based on DolphinScheduler",
   "author": "Leonard Nie",