Posted to commits@dolphinscheduler.apache.org by wa...@apache.org on 2022/12/26 09:22:56 UTC

[dolphinscheduler-website] branch master updated: ADD Blog (#871)

This is an automated email from the ASF dual-hosted git repository.

wanggenhua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git


The following commit(s) were added to refs/heads/master by this push:
     new b021424931 ADD Blog (#871)
b021424931 is described below

commit b0214249314e125cb2a1ccca934403def149fe6c
Author: lifeng <53...@users.noreply.github.com>
AuthorDate: Mon Dec 26 17:22:45 2022 +0800

    ADD Blog (#871)
    
    * ADD Blog
    
    * updata
    
    * updata
---
 blog/en-us/Apache_dolphinScheduler_3.1.2.md        |  68 ++++++
 ...inTech_data_center_based_on_DolphinScheduler.md | 176 +++++++++++++++
 ...e_DolphinScheduler_Machine_Learning_Workflow.md | 236 +++++++++++++++++++++
 blog/img/media/16720397220045/16720397367629.jpg   | Bin 0 -> 59519 bytes
 blog/img/media/16720400637574/16720400704016.jpg   | Bin 0 -> 83706 bytes
 blog/img/media/16720400637574/16720400759248.jpg   | Bin 0 -> 58667 bytes
 blog/img/media/16720400637574/16720401185208.jpg   | Bin 0 -> 46943 bytes
 blog/img/media/16720400637574/16720401253440.jpg   | Bin 0 -> 126403 bytes
 blog/img/media/16720400637574/16720401472681.jpg   | Bin 0 -> 67800 bytes
 blog/img/media/16720400637574/16720402083983.jpg   | Bin 0 -> 33154 bytes
 blog/img/media/16720400637574/16720402291980.jpg   | Bin 0 -> 63322 bytes
 blog/img/media/16720400637574/16720402508893.jpg   | Bin 0 -> 65984 bytes
 blog/img/media/16720400637574/16720402711565.jpg   | Bin 0 -> 27768 bytes
 blog/img/media/16720400637574/16720402758234.jpg   | Bin 0 -> 50365 bytes
 blog/img/media/16720400637574/16720403297820.jpg   | Bin 0 -> 61933 bytes
 blog/img/media/16720400637574/16720403572773.jpg   | Bin 0 -> 25883 bytes
 blog/img/media/16720400637574/16720403977529.jpg   | Bin 0 -> 47344 bytes
 blog/img/media/16720400637574/16720404142720.jpg   | Bin 0 -> 34341 bytes
 blog/img/media/16720400637574/16720404412259.jpg   | Bin 0 -> 33141 bytes
 blog/img/media/16720405454837/16720405586499.jpg   | Bin 0 -> 155526 bytes
 blog/img/media/16720405454837/16720407528096.jpg   | Bin 0 -> 53369 bytes
 blog/img/media/16720405454837/16720407653742.jpg   | Bin 0 -> 33004 bytes
 blog/img/media/16720405454837/16720408372893.jpg   | Bin 0 -> 32516 bytes
 blog/img/media/16720405454837/16720408471707.jpg   | Bin 0 -> 40498 bytes
 blog/img/media/16720405454837/16720408537181.jpg   | Bin 0 -> 25097 bytes
 blog/img/media/16720405454837/16720408664980.jpg   | Bin 0 -> 31613 bytes
 blog/img/media/16720405454837/16720408742949.jpg   | Bin 0 -> 75740 bytes
 blog/img/media/16720405454837/16720408868765.jpg   | Bin 0 -> 36381 bytes
 blog/img/media/16720405454837/16720408963992.jpg   | Bin 0 -> 60099 bytes
 blog/img/media/16720405454837/16720409057879.jpg   | Bin 0 -> 42155 bytes
 blog/img/media/16720405454837/16720409115839.jpg   | Bin 0 -> 30300 bytes
 blog/img/media/16720405454837/16720409204499.jpg   | Bin 0 -> 14531 bytes
 blog/img/media/16720405454837/16720409274430.jpg   | Bin 0 -> 25945 bytes
 config/blog/en-us/release.json                     |   6 +
 config/blog/en-us/tech.json                        |   7 +
 config/blog/en-us/user.json                        |   9 +-
 36 files changed, 501 insertions(+), 1 deletion(-)

diff --git a/blog/en-us/Apache_dolphinScheduler_3.1.2.md b/blog/en-us/Apache_dolphinScheduler_3.1.2.md
new file mode 100644
index 0000000000..19a69f25ea
--- /dev/null
+++ b/blog/en-us/Apache_dolphinScheduler_3.1.2.md
@@ -0,0 +1,68 @@
+---
+title: Apache DolphinScheduler releases version 3.1.2 with Python API optimizations
+keywords: Apache,DolphinScheduler,scheduler,big data,ETL,airflow,hadoop,orchestration,dataops,Kubernetes
+description: Recently, Apache DolphinScheduler released version 3.1.2.
+---
+# Apache DolphinScheduler releases version 3.1.2 with Python API optimizations
+![](/img/media/16720397220045/16720397367629.jpg)
+Recently, Apache DolphinScheduler released version 3.1.2. This release is based on version 3.1.1 and includes 6 Python API optimizations, 19 bug fixes, and 4 documentation updates.
+
+## Important bug fixes:
+
+* Worker kill process does not take effect (#12995)
+* Complement dependency mode generates wrong workflow instance (#13009)
+* Python task parameter passing error (#12961)
+* Fix dependency task null pointer (#12965)
+* Task retry error (#12903)
+* Shell task calls dolphinscheduler_env.sh configuration file exception (#12909)
+* Corrected documentation for multiple Hive SQL runs (#12765)
+* Added token authentication for Python API (#12893)
+
+## Change Log
+
+### Bug fix
+* [Improvement] change alert start.sh (#13100)
+* [Fix] Add token as authentication for python gateway (#12893)
+* [Fix-13010] [Task] The Flink SQL task page selects the pre-job deployment mode, but the task executed by the worker is the Flink local mode
+* [Fix-12997][API] Fix that the end time is not reset when the workflow instance reruns. (#12998)
+* [Fix-12994] [Worker] Fix kill process does not take effect (#12995)
+* Fix sql task will send alert if we don’t choose the send email #12984
+* [Fix-13008] [UI] When using the complement function, turn on the dependent mode to generate multiple unrelated workflow instances (#13009)
+* [Fix][doc] python api release link
+* [Fix] Python task can not pass the parameters to downstream task. (#12961)
+* [Fix] Fix Java path in Kubernetes Helm Chart (#12987)
+* [Fix-12963] [Master] Fix dependent task node null pointer exception (#12965)
+* [Fix-12954] [Schedule] Fix that workflow-level configuration information does not take effect when timing triggers execution
+* Fix execute shell task exception no dolphinscheduler_env.sh file execute permission (#12909)
+* Upgrade clickhouse jdbc driver #12639
+* add spring-context to alert api (#12892)
+* [Upgrade][SQL]Modify the table t_ds_worker_group to add a description field in the postgresql upgrade script #12883
+* Fix NPE while retry task (#12903)
+* [Fix-12832][API] Fix update worker group exception group name already exists. (#12874)
+* Fix and enhance helm db config (#12707)
+
+### Document
+* [Fix][Doc] Fix sql-hive and hive-cli doc (#12765)
+* [Fix][Alert] Ignore alert not write info to db (#12867)
+* [Doc] Add skip spotless check during ASF release #12835
+* [Doc][Bug] Fix dead link caused by markdown cross-files anchor #12357 (#12877)
+
+### Python API
+* [Fix] python API upload resource center failed
+* [Feature] Add CURD to the project/tenant/user section of the python-DS (#11162)
+* [Chore][Python] Change name from process definition to workflow (#12918)
+* [Feature] Support set execute type to pydolphinscheduler (#12871)
+* [Hotfix] Correct python doc link
+* [Improvement][Python] Validate version of Python API at launch (#11626)
+
+## Acknowledgment
+
+Thanks to all community contributors who participated in the release of Apache DolphinScheduler 3.1.2. Below is the list of the contributors by GitHub ID, in no particular order.
+
+
+
+| liqingwang   | liqingwang    | hezean       |
+|--------------|-------------|--------------|
+| ruanwenjun | simsicon | jieguangzhou |
+| Tianqi-Dotes  | zhuangchong | zhongjiajie |
+
diff --git a/blog/en-us/Application_transformation_of_the_FinTech_data_center_based_on_DolphinScheduler.md b/blog/en-us/Application_transformation_of_the_FinTech_data_center_based_on_DolphinScheduler.md
new file mode 100644
index 0000000000..0d6e2b5703
--- /dev/null
+++ b/blog/en-us/Application_transformation_of_the_FinTech_data_center_based_on_DolphinScheduler.md
@@ -0,0 +1,176 @@
+---
+title: Application transformation of the FinTech data center based on DolphinScheduler
+keywords: Apache,DolphinScheduler,scheduler,big data,ETL,airflow,hadoop,orchestration,dataops,Kubernetes
+description: At the Apache DolphinScheduler Meetup last week, Feng Mingxia shared the application practice of DolphinScheduler in the FinTech field.
+---
+# Application transformation of the FinTech data center based on DolphinScheduler
+![](/img/media/16720400637574/16720400704016.jpg)
+At the Apache DolphinScheduler Meetup last week, Feng Mingxia, a big data engineer from Chengfang FinTech, shared the application practice of DolphinScheduler in the FinTech field. The following is the presentation.
+
+![](/img/media/16720400637574/16720400759248.jpg)
+Feng Mingxia, Big Data Engineer at Chengfang Financial Technology
+
+He focuses on real-time and offline data processing and analysis in the big data field, and is currently responsible for the research and development of the data middle platform.
+
+Speech summary:
+
+* Use background
+
+* Secondary transformation based on DolphinScheduler
+
+* DolphinScheduler plug-in expansion
+
+* Future and outlook
+
+## Use Background
+
+### Data Center Construction
+
+At present, big data technology is widely used in the financial field, and the big data platform has become part of the financial infrastructure. In the construction of a big data platform, the data center is the brightest star: it is the entrance and interface through which business systems use big data. When various business systems are connected to the data center, the data middle office needs to provide unified management and unified access to ensure the security, reliability, efficiency, and reli [...]
+
+As shown in the figure below, the data middle office sits between the business systems and the big data platform; each business system accesses the big data platform through the services provided by the data center.
+
+![](/img/media/16720400637574/16720401185208.jpg)
+The core concept of the data middle office is to realize four transformations: turning business into data, data into assets, assets into services, and services back into business. From business to data and back to business, this forms a complete closed loop that supports the digital transformation of enterprises.
+
+![](/img/media/16720400637574/16720401253440.jpg)
+The logical architecture of the data center is shown in the figure above; analyzing it from bottom to top, the bottom layer is the data resource layer, which holds the original data generated by the various business systems. The next layer is data integration; integration methods include offline collection and real-time collection, using technologies such as Flume and real-time CDC capture.
+
+The next layer is the data lake, into which data is ingested through the integration layer and stored in Hadoop distributed storage or an MPP-architecture database.
+
+The next layer is the data engine layer, which processes and analyzes the data in the data lake through real-time and offline computing engines such as Flink and Spark, forming service data for the upper layers.
+
+The next layer is the data service that the data center needs to provide. At present, the data service includes data development service and data sharing service, providing data development and sharing capabilities for the upper business systems.
+
+The data application layer is the specific application of data, including data anomaly detection, data governance, AI decision-making, and BI analysis.
+
+In the construction of the whole data middle platform, the scheduling engine occupies a core position in the data engine layer and is an important function in the construction of the data middle platform.
+
+### Problems and challenges faced by the data center
+The data middle office faces several problems and challenges.
+
+First of all, the execution and scheduling of data tasks are the core and key of data development services provided by the data center.
+
+Secondly, the data center provides unified data service management, service development, service invocation, and service monitoring.
+
+Third, ensuring the security of financial data is the primary task of FinTech, and the data middle office needs to ensure the security and reliability of data services.
+
+Under the above problems and challenges, we investigated some open-source scheduling engines.
+
+![](/img/media/16720400637574/16720401472681.jpg)
+
+At present, we use a variety of scheduling engines in production, such as Oozie, XXL-Job, and DolphinScheduler. We introduced DolphinScheduler after research and analysis in 2022, and it now plays a very important role in the construction of the entire data center.
+
+First of all, DolphinScheduler partially addresses our requirements for unified service management, service development, service invocation, and service monitoring.
+
+Secondly, it has a unique design for task fault tolerance, supporting HA, elastic scaling, and fault recovery, which basically ensures the safe operation of tasks.
+
+Third, it supports task and node monitoring.
+
+Fourth, it supports multi-tenant and permission control.
+
+Finally, its community is very active, with rapid version iteration and bug fixing.
+
+Through analysis of DolphinScheduler’s architecture and source code, we believe that its architecture conforms to mainstream big data framework design and shares similar architectural patterns with excellent products such as HBase and Kafka.
+
+## Re-development based on DolphinScheduler
+
+To make DolphinScheduler better fit our application scenarios, we carried out secondary development on top of it, covering six aspects:
+
+* Add asynchronous service call function
+* Add Oracle adaptation for the metadata database
+* Add multi-environment configuration capability
+* Add log and historical data-cleaning strategy
+* Add access to Yarn logs
+* Add service security strategy
+
+### Add asynchronous service calling function
+
+First, an asynchronous service invocation function was added. The figure above shows the architecture of DolphinScheduler version 2.0.5; most of the components are native DolphinScheduler service components. The GateWay marked in red is a gateway service added on top of DolphinScheduler. It implements flow control and black/white lists, and is also the entry point through which users access service development. By optimizing the startup interface of the process and returning the unique code of the process [...]
+
+![](/img/media/16720400637574/16720402083983.jpg)
+In the classic DolphinScheduler access mode, the workflow execution instructions submitted by users enter the command table in the metadata database. After acquiring the ZooKeeper lock, the master component obtains commands from the metadata database, performs DAG parsing, generates the actual process instances, delivers the decomposed tasks to the worker nodes for execution through RPC, and then synchronously waits for the execution results.
+
+In a native DolphinScheduler request, after the user submits the instruction, no return code for the workflow execution is provided. Therefore, we added a unique return ID, through which users can query the subsequent process status, download logs, and download data.
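+
+Conceptually, the asynchronous pattern exposed by the gateway looks like the sketch below; the endpoint paths, field names, and token header are purely illustrative and are not the actual gateway API.
+
+```
+# Submit a workflow and immediately receive a unique execution ID (illustrative endpoints)
+EXEC_ID=$(curl -s -X POST "http://gateway.example.com/api/workflows/start" \
+  -H "token: ${TOKEN}" -d '{"workflowCode": 123456}' | jq -r '.executionId')
+
+# Later, poll the status and fetch logs or result data with that ID
+curl -s "http://gateway.example.com/api/executions/${EXEC_ID}/status" -H "token: ${TOKEN}"
+curl -s "http://gateway.example.com/api/executions/${EXEC_ID}/logs"   -H "token: ${TOKEN}"
+```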
+
+### Add Oracle adaptation for the metadata database
+Our second transformation is adapting DolphinScheduler to the Oracle database. The metadata database of native DolphinScheduler is MySQL, and according to our production needs we had to convert it to Oracle. To achieve this, both the data initialization module and the data operation module need to be adapted.
+
+![](/img/media/16720400637574/16720402291980.jpg)
+
+First, for the data initialization module, we modified the install_config.conf configuration file to use the Oracle configuration.
+
+Secondly, an Oracle application.yml needs to be added: we add the Oracle version of application.yml to the apache-dolphinscheduler-2.0.*-bin/conf/ directory.
+
+Finally, we converted the data operation module by modifying the mapper files. Because the dolphinscheduler-dao module is the database operation module, other modules reference it to implement database operations. It uses MyBatis for database access, so the mapper files need to be changed; all mapper files are located in the resources directory.
+
+### Multi-environment configuration capability
+The installation of the native DolphinScheduler version cannot be configured per environment; generally, the relevant parameters need to be adjusted manually to match the actual environment. We wanted to enhance environment selection and configuration through the installation script, to reduce the cost of manual online modification and automate installation. Most users have probably encountered similar difficulties. In order to use DolphinScheduler in a development envir [...]
+
+We modified the install.sh script to accept an input parameter [dev|test|product] and select the corresponding install_config_${env}.conf, so that installation automatically picks the right environment.
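+
+A minimal sketch of this idea, assuming per-environment files named install_config_dev.conf, install_config_test.conf, and install_config_product.conf (the wrapper name and paths are illustrative, not our exact script):
+
+```
+#!/bin/bash
+# install_env.sh (hypothetical wrapper): copy the config for the requested environment
+# into place, then run the stock installer.
+set -e
+ENV="${1:?usage: install_env.sh [dev|test|product]}"
+CONF="conf/install_config_${ENV}.conf"
+[ -f "$CONF" ] || { echo "missing $CONF"; exit 1; }
+cp "$CONF" conf/install_config.conf   # the stock install.sh reads this file
+bash ./install.sh
+```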
+
+In addition, DolphinScheduler’s workflows are strongly bound to the environment, and workflows in different environments cannot be shared. The following figure shows the JSON file of a workflow exported from native DolphinScheduler. The grayed part represents the resources on which the process depends. The resource ID is a number generated by the database’s auto-increment, so if process instances generated in environment A are imported into environment B, there may b [...]
+
+![](/img/media/16720400637574/16720402508893.jpg)
+We solved this problem by using the absolute path of a resource as its unique ID.
+
+### Log and historical data cleaning policy
+
+DolphinScheduler generates a lot of data. The database accumulates instance data in the instance tables, which keeps growing as instance tasks run. Our strategy is to define a scheduled DolphinScheduler task that cleans up these tables according to an agreed retention period.
+
+Secondly, DolphinScheduler’s file data mainly consists of log data and task execution directories, including the service logs of the worker, master, and API servers and the directories in which the worker executes tasks. These data are not deleted automatically when task execution ends, so they also need to be removed by scheduled tasks. By running a log cleanup script on a schedule, logs are deleted automatically; a sketch of such a cleanup script follows the figures below.
+
+![](/img/media/16720400637574/16720402711565.jpg)
+![](/img/media/16720400637574/16720402758234.jpg)
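+
+A minimal sketch of such a cleanup job, assuming a 30-day retention period and illustrative paths (the real paths depend on the deployment):
+
+```
+#!/bin/bash
+# clean_ds_data.sh (sketch): remove old service logs and worker execution directories;
+# run it daily from cron or as a scheduled DolphinScheduler task.
+RETENTION_DAYS=30
+# service logs of the master / worker / api servers (paths are illustrative)
+find /opt/dolphinscheduler/logs -type f -name "*.log*" -mtime +${RETENTION_DAYS} -delete
+# worker task execution directories
+find /tmp/dolphinscheduler/exec -mindepth 1 -maxdepth 1 -type d -mtime +${RETENTION_DAYS} \
+  -exec rm -rf {} +
+```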
+
+
+### Add access to Yarn logs
+
+The native DolphinScheduler can obtain the log information of tasks executed on the worker nodes, but for tasks running on Yarn you need to log in to the Yarn cluster and fetch the logs through the command line or web interface. We obtain the Yarn application ID by parsing the Yarn ID tag in the task log and then fetch the task log through the Yarn client, which removes the need to view logs manually (a retrieval sketch follows the figure below).
+
+![](/img/media/16720400637574/16720403297820.jpg)
+
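+A sketch of the retrieval step, assuming the task log contains the Yarn application ID in the usual application_<timestamp>_<id> form (the grep pattern and paths are illustrative):
+
+```
+#!/bin/bash
+# fetch_yarn_log.sh (sketch): extract the Yarn application ID from a worker task log
+# and fetch the aggregated container logs with the yarn CLI.
+TASK_LOG="$1"
+APP_ID=$(grep -oE 'application_[0-9]+_[0-9]+' "$TASK_LOG" | tail -n 1)
+[ -n "$APP_ID" ] || { echo "no Yarn application id found in $TASK_LOG"; exit 1; }
+yarn logs -applicationId "$APP_ID"
+```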
+
+### Service security policy
+
+* Add Monitor component monitoring
+
+![](/img/media/16720400637574/16720403572773.jpg)
+
+
+The figure above shows the interaction between the master and worker, the two core components of DolphinScheduler, and ZooKeeper. When the MasterServer service starts, it registers a temporary node with ZooKeeper and performs fault-tolerance processing by listening for changes to ZooKeeper temporary nodes. The WorkerServer is mainly responsible for task execution; when the WorkerServer service starts, it registers a temporary node with ZooKeeper and maintains a heartbeat. At present, Z [...]
+
+The relevant parameters can be seen when the master and worker connect to ZooKeeper, including the connection timeout, session timeout, and maximum number of retries.
+
+Due to network jitter and other factors, master and worker nodes may lose their connection to ZooKeeper. Once the connection is lost, the temporary nodes registered on ZooKeeper by the worker and master disappear, the master and worker are judged to have gone offline, and task execution is affected. Without human intervention, tasks will be delayed. We therefore added a monitor component to watch the service status: through a scheduled cron task, we run the monitor program [...]
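+
+A minimal sketch of what such a monitor check can look like, assuming a standard cluster deployment started with bin/dolphinscheduler-daemon.sh (service names and paths are illustrative):
+
+```
+#!/bin/bash
+# monitor_ds.sh (sketch): run from cron; restart a DolphinScheduler daemon whose
+# process has disappeared, e.g. after its ZooKeeper session was lost.
+DS_HOME=/opt/dolphinscheduler
+for SERVICE in master-server worker-server; do
+  if ! jps -l | grep -qi "${SERVICE//-/}"; then
+    echo "$(date) ${SERVICE} is down, restarting" >> "${DS_HOME}/monitor.log"
+    bash "${DS_HOME}/bin/dolphinscheduler-daemon.sh" start "${SERVICE}"
+  fi
+done
+```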
+
+* Add Kerberos authentication link for service components using zk
+
+The second security policy adds a Kerberos authentication step for the service components that use ZooKeeper. Kerberos is a network authentication protocol designed to provide strong authentication for client/server applications through a key system. The master, API, and worker service components complete Kerberos authentication at startup and then use ZooKeeper for service registration and heartbeat connections, ensuring service security.
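+
+The integration details are internal to our build, but the standard ZooKeeper client-side Kerberos setup it relies on looks roughly like the sketch below; the principal, keytab, and paths are placeholders.
+
+```
+# Create a JAAS file for the ZooKeeper client (sketch; principal and keytab are placeholders)
+cat > /opt/dolphinscheduler/conf/zk-client-jaas.conf <<'EOF'
+Client {
+  com.sun.security.auth.module.Krb5LoginModule required
+  useKeyTab=true
+  keyTab="/etc/security/keytabs/dolphinscheduler.keytab"
+  principal="dolphinscheduler@EXAMPLE.COM";
+};
+EOF
+# Point the master/worker/api JVMs at it, e.g. through their startup options
+export JAVA_OPTS="$JAVA_OPTS -Djava.security.auth.login.config=/opt/dolphinscheduler/conf/zk-client-jaas.conf"
+```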
+
+## DolphinScheduler-based plugin extension
+In addition, we have extended DolphinScheduler with plug-ins, adding four types of operators: Richshell, SparkSQL, Dataexport, and GBase.
+
+### Add a new task type Richshell
+First of all, the new task type Richshell enhances the native Shell task. It implements dynamic replacement of script parameters through a template engine, so users can substitute script parameters through service calls, which makes parameter usage more flexible and supplements the global parameters.
+
+![](/img/media/16720400637574/16720403977529.jpg)
+
+
+### Add a new task type SparkSQL
+
+The second operator added is SparkSQL. Users can execute Spark tasks by writing SQL, so that tasks can be scheduled on Yarn. DolphinScheduler natively supports SparkSQL execution in JDBC mode, but resource contention occurs because the number of JDBC connections is limited, and tools such as spark-sql or Beeline cannot run in Yarn cluster mode. By using this task type, SparkSQL programs can be run on the Yarn cluster in cluster mode to maximize [...]
+
+### Add a new task type Dataexport
+
+The third addition is Dataexport, a data export operator. Users can export data stored in different storage components, such as ES, Hive, and HBase, by selecting the corresponding component.
+
+![](/img/media/16720400637574/16720404142720.jpg)
+After export, the data in the big data platform may be used for BI display, statistical analysis, machine learning, and other data preparation. Most of these scenarios require data export, and Spark’s data processing capability is used to implement the export function for different data sources.
+
+### Add a new task type GBase
+The fourth plug-in added is GBase. GBase 8a MPP Cluster is a distributed parallel database cluster with column storage and a shared-nothing architecture. It features high performance, high availability, and high scalability, is suitable for OLAP (query) scenarios, can provide a cost-effective general computing platform for large-scale data management, and is widely used to support various data warehouse systems, BI systems, and decision support systems.
+
+![](/img/media/16720400637574/16720404412259.jpg)
+For the scenario of data entering the lake, we added a GBase operator that supports importing, exporting, and executing GBase data tasks.
+
diff --git a/blog/en-us/Quick_Start_with_Apache_DolphinScheduler_Machine_Learning_Workflow.md b/blog/en-us/Quick_Start_with_Apache_DolphinScheduler_Machine_Learning_Workflow.md
new file mode 100644
index 0000000000..bc78b98f16
--- /dev/null
+++ b/blog/en-us/Quick_Start_with_Apache_DolphinScheduler_Machine_Learning_Workflow.md
@@ -0,0 +1,236 @@
+---
+title: Quick Start with Apache DolphinScheduler Machine Learning Workflow
+keywords: Apache,DolphinScheduler,scheduler,big data,ETL,airflow,hadoop,orchestration,dataops,Kubernetes,Conda
+description: With the release of Apache DolphinScheduler 3.1.0, many AI components have been added.
+---
+# Quick Start with Apache DolphinScheduler Machine Learning Workflow
+![](/img/media/16720405454837/16720405586499.jpg)
+## Abstract
+With the release of Apache DolphinScheduler 3.1.0, many AI components have been added to help users build machine learning workflows on Apache DolphinScheduler more efficiently.
+
+This article describes in detail how to set up DolphinScheduler with some Machine Learning environments. It also introduces the use of the MLflow component and the DVC component with experimental examples.
+
+## DolphinScheduler and Machine Learning Environment
+**Test Program**
+
+All code can be found at https://github.com/jieguangzhou/dolphinscheduler-ml-tutorial
+
+Get the code
+
+```
+git clone https://github.com/jieguangzhou/dolphinscheduler-ml-tutorial.git
+git checkout dev
+```
+### Installation environment
+**Conda**
+Simply install Conda following the official website and add Conda’s path to your environment variables.
+
+Then install MLflow and DVC with pip; after installation, the mlflow and dvc commands will be available in Conda’s bin directory.
+```
+pip install mlflow==1.30.0 dvc
+```
+
+**Java8 environment**
+
+```
+sudo apt-get update
+sudo apt-get install openjdk-8-jdk
+java -version
+```
+Configure the Java environment variables in ~/.bashrc or ~/.zshrc:
+
+```
+# Confirm that your jdk is as below and configure the environment variables
+export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
+export PATH=$PATH:$JAVA_HOME/bin
+```
+
+**Apache DolphinScheduler 3.1.0**
+
+Download DolphinScheduler 3.1.0
+```
+# Go to the following directory (you can install in another directory; for ease of reproduction, this example uses the directory below)
+cd first-example/install_dolphinscheduler
+## Download DolphinScheduler
+wget https://dlcdn.apache.org/dolphinscheduler/3.1.0/apache-dolphinscheduler-3.1.0-bin.tar.gz
+tar -zxvf apache-dolphinscheduler-3.1.0-bin.tar.gz
+rm apache-dolphinscheduler-3.1.0-bin.tar.gz
+```
+
+Configuring the Conda environment and Python environment in DolphinScheduler
+```
+## Configure conda environment and default python environment
+cp common.properties apache-dolphinscheduler-3.1.0-bin/standalone-server/conf
+echo "export PATH=$(dirname $(which conda)):\$PATH" >> apache-dolphinscheduler-3.1.0-bin/bin/env/dolphinscheduler_env.sh
+echo "export PYTHON_HOME=$(dirname $(which conda))/python" >> apache-dolphinscheduler-3.1.0-bin/bin/env/dolphinscheduler_env.sh
+```
+
+**dolphinscheduler-mlflow configuration**
+
+When using the MLFLOW component, the dolphinscheduler-mlflow project on GitHub is used as the preset repository, so if you cannot get a reliable network connection, you can switch to a local copy of the repository by following these steps.
+
+Firstly, execute git clone https://github.com/apache/dolphinscheduler-mlflow.git
+
+Then change the value of the ml.mlflow.preset_repository field in common.properties to the local path of the cloned repository.
+
+Start DolphinScheduler
+```
+## start DolphinScheduler
+cd apache-dolphinscheduler-3.1.0-bin
+bash bin/dolphinscheduler-daemon.sh start standalone-server
+## You can view the log using the following command
+# tail -500f standalone-server/logs/dolphinscheduler-standalone.log
+```
+
+Once started, wait a moment for the service to boot up.
+
+Then open http://localhost:12345/dolphinscheduler/ui and you will see the DolphinScheduler page.
+
+Account: admin, Password: dolphinscheduler123
+![](/img/media/16720405454837/16720407528096.jpg)
+**MLflow**
+The MLflow Tracking Server is relatively simple to start up; it can be started with the command docker run --name mlflow -p 5000:5000 -d jalonzjg/mlflow:latest
+
+Open http://localhost:5000, and you will be able to find the MLflow model and test management page
+
+![](/img/media/16720405454837/16720407653742.jpg)
+The Dockerfile for this image can be found at first-example/docker-mlflow/Dockerfile
+
+**Components Introduction**
+There are 5 main types of components used in this article
+
+**SHELL component**
+The SHELL component is used to run shell-type tasks
+
+**PYTHON component**
+The PYTHON component is used to run python-type tasks
+
+**CONDITIONS component**
+CONDITIONS is a conditional node that determines which downstream task should be run based on the running status of the upstream task.
+
+**MLFLOW component**
+The MLFLOW component runs MLflow Projects on DolphinScheduler based on the dolphinscheduler-mlflow library; it provides pre-built algorithms and AutoML functionality for classification scenarios and can deploy models to the MLflow Tracking Server.
+
+**DVC component**
+DVC component is used for data versioning in machine learning on DolphinScheduler, such as registering specific data as a specific version and downloading specific versions of data.
+
+Among the above five components
+
+* SHELL component and PYTHON component are the base components, which can run a wide range of tasks.
+* CONDITIONS are logical components that can dynamically control the logic of the workflow’s operation.
+* The MLFLOW component and DVC component are machine-learning components that make machine-learning-specific capabilities easy to use within a workflow.
+
+## Machine learning workflow
+The workflow consists of three parts.
+
+* The first part is the preliminary preparation, such as data download, data versioning management repository, etc.; it is a one-time preparation.
+* The second part is the training model workflow: it includes data pre-processing, training model, and model evaluation
+* The third part is the deployment workflow, which includes model deployment and interface testing.
+
+### Preliminary preparation workflow
+Create a directory to store all the process data: mkdir /tmp/ds-ml-example
+
+At the beginning of the program, we need to download the test data and initialize the DVC repository for data versioning
+
+All the following commands are run in the dolphinscheduler-ml-tutorial/first-example directory
+
+Since we are submitting the workflows via pydolphinscheduler, first install it: pip install apache-dolphinscheduler==3.1.0
+
+Workflow(download-data): Downloading test data
+
+Command: pydolphinscheduler yaml -f pyds/download_data.yaml
+
+Execute the following two tasks in order
+
+1. Install-dependencies: install the python dependencies packages needed in the download script
+
+2. Download-data: download the dataset to /tmp/ds-ml-example/raw
+
+![](/img/media/16720405454837/16720408372893.jpg)
+Workflow(dvc_init_local): Initialize the dvc data versioning management repository
+
+Command: pydolphinscheduler yaml -f pyds/init_dvc_repo.yaml
+
+Execute the following tasks in order
+
+1. create_git_repo: Create an empty git repository in the local environment
+
+2. init_dvc: convert the repository to a dvc-type repository for data versioning
+
+3. condition: determine the status of the init_dvc task, if successful then execute report_success_message, otherwise execute report_error_message
+
+![](/img/media/16720405454837/16720408471707.jpg)
+
+### Training model workflow
+In the training model part, the workflow includes data pre-processing, model training, and model evaluation.
+
+Workflow(prepare_data): data preprocessing
+
+Command: pydolphinscheduler yaml -f pyds/prepare_data.yaml
+
+![](/img/media/16720405454837/16720408537181.jpg)
+Perform the following tasks in order
+
+1. data_preprocessing: preprocesses the data; for demo purposes, we only perform a simple truncation procedure here
+
+2. upload_data: uploads the data to the repository and registers it as a specific version, v1 (roughly equivalent to the manual DVC commands sketched after the figure below)
+
+The following image shows the information in the git repository
+
+![](/img/media/16720405454837/16720408664980.jpg)
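+
+For readers unfamiliar with DVC, the version registration performed here is roughly equivalent to the manual commands below (a sketch, not what the component literally runs; the data path is illustrative):
+
+```
+# Inside the DVC repository: track the prepared data and tag it as version v1
+dvc add data/train_data            # creates data/train_data.dvc
+git add data/train_data.dvc .gitignore
+git commit -m "register training data"
+git tag v1                         # the version name used by the workflow
+dvc push                           # push the data itself to the DVC remote
+
+# Pulling a specific version later (conceptually what the pull_data task does)
+git checkout v1 && dvc pull
+```
+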
+Workflow(train_model): Training model
+
+Command: pydolphinscheduler yaml -f pyds/train_model.yaml
+
+Perform the following tasks in order
+
+1. clean_exists_data: Delete /tmp/ds-ml-example/train_data, the historical data that may have been generated by repeated runs
+
+2. pull_data: pull v1 data to /tmp/ds-ml-example/train_data
+
+3. train_automl: Uses the MLFLOW component’s AutoML function to train the classification model and register it with the MLflow Tracking Server; if the current model version has the highest F1 score, it is registered as the Production version.
+
+4. inference: import a small part of the data for batch inference using the mlflow CLI
+
+5. evaluate: Obtain the inference results and perform a simple evaluation of the model, including metrics on the new data, the predicted label distribution, etc.
+![](/img/media/16720405454837/16720408742949.jpg)
+
+
+The results of the test and the model can be viewed in the MLflow Tracking Server ( http://localhost:5000 ) after train_automl has completed its operation.
+
+![](/img/media/16720405454837/16720408868765.jpg)
+The logs of the evaluate task can be viewed after it has completed.
+
+![](/img/media/16720405454837/16720408963992.jpg)
+
+### Deployment process workflow
+Workflow(deploy_model): Deployment model
+
+Run: pydolphinscheduler yaml -f pyds/deploy.yaml
+
+Run the following tasks in order.
+
+1. kill-server: Shut down the previous server
+
+2. deploy-model: Deploy the model
+
+3. test-server: Test the server (see the request sketch after the figure below)
+
+![](/img/media/16720405454837/16720409057879.jpg)
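+
+As a reference for the test-server step: an MLflow 1.x model server exposes an /invocations endpoint, so a request like the sketch below can verify the deployment (the port and feature columns are placeholders for this dataset):
+
+```
+# Query the deployed model (sketch; adjust the port and column names to your deployment)
+curl -s http://localhost:5001/invocations \
+  -H 'Content-Type: application/json; format=pandas-split' \
+  -d '{"columns": ["feature_1", "feature_2"], "data": [[0.1, 0.2]]}'
+```
+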
+If this workflow is started manually, the interface will look as follows; just enter the port number and the model version number.
+
+![](/img/media/16720405454837/16720409115839.jpg)
+
+### Integrate the workflows
+For practical use, once the workflows are stable, the whole process needs to be linked together: for example, after a new data version is obtained, train the model, and if it performs better, deploy it.
+
+For example, we switch to the production version: git checkout first-example-production
+
+The differences between the two versions are:
+
+1. there is an additional workflow definition in train_and_deploy.yaml, which is used to combine the various workflows
+
+2. modify the pre-processing script to get the v2 data
+
+3. change the flag in each sub-workflow definition to false and let train_and_deploy.yaml run them together.
+
+Run: pydolphinscheduler yaml -f pyds/train_and_deploy.yaml
+
+Each task in the diagram below is a sub-workflow task, which corresponds to the three workflows described above.
+
+![](/img/media/16720405454837/16720409204499.jpg)
+As shown below, the new model version, version 2, is obtained after the run and has been registered as the Production version.
+
+![](/img/media/16720405454837/16720409274430.jpg)
+
diff --git a/blog/img/media/16720397220045/16720397367629.jpg b/blog/img/media/16720397220045/16720397367629.jpg
new file mode 100644
index 0000000000..38390fff76
Binary files /dev/null and b/blog/img/media/16720397220045/16720397367629.jpg differ
diff --git a/blog/img/media/16720400637574/16720400704016.jpg b/blog/img/media/16720400637574/16720400704016.jpg
new file mode 100644
index 0000000000..c900e1f6f0
Binary files /dev/null and b/blog/img/media/16720400637574/16720400704016.jpg differ
diff --git a/blog/img/media/16720400637574/16720400759248.jpg b/blog/img/media/16720400637574/16720400759248.jpg
new file mode 100644
index 0000000000..2c6dd50560
Binary files /dev/null and b/blog/img/media/16720400637574/16720400759248.jpg differ
diff --git a/blog/img/media/16720400637574/16720401185208.jpg b/blog/img/media/16720400637574/16720401185208.jpg
new file mode 100644
index 0000000000..6f65489aa6
Binary files /dev/null and b/blog/img/media/16720400637574/16720401185208.jpg differ
diff --git a/blog/img/media/16720400637574/16720401253440.jpg b/blog/img/media/16720400637574/16720401253440.jpg
new file mode 100644
index 0000000000..4d9db4352b
Binary files /dev/null and b/blog/img/media/16720400637574/16720401253440.jpg differ
diff --git a/blog/img/media/16720400637574/16720401472681.jpg b/blog/img/media/16720400637574/16720401472681.jpg
new file mode 100644
index 0000000000..ef2743f6bd
Binary files /dev/null and b/blog/img/media/16720400637574/16720401472681.jpg differ
diff --git a/blog/img/media/16720400637574/16720402083983.jpg b/blog/img/media/16720400637574/16720402083983.jpg
new file mode 100644
index 0000000000..2a6855fdb0
Binary files /dev/null and b/blog/img/media/16720400637574/16720402083983.jpg differ
diff --git a/blog/img/media/16720400637574/16720402291980.jpg b/blog/img/media/16720400637574/16720402291980.jpg
new file mode 100644
index 0000000000..e6d516381c
Binary files /dev/null and b/blog/img/media/16720400637574/16720402291980.jpg differ
diff --git a/blog/img/media/16720400637574/16720402508893.jpg b/blog/img/media/16720400637574/16720402508893.jpg
new file mode 100644
index 0000000000..66135c4c22
Binary files /dev/null and b/blog/img/media/16720400637574/16720402508893.jpg differ
diff --git a/blog/img/media/16720400637574/16720402711565.jpg b/blog/img/media/16720400637574/16720402711565.jpg
new file mode 100644
index 0000000000..75d2bcfa18
Binary files /dev/null and b/blog/img/media/16720400637574/16720402711565.jpg differ
diff --git a/blog/img/media/16720400637574/16720402758234.jpg b/blog/img/media/16720400637574/16720402758234.jpg
new file mode 100644
index 0000000000..4f2b486f97
Binary files /dev/null and b/blog/img/media/16720400637574/16720402758234.jpg differ
diff --git a/blog/img/media/16720400637574/16720403297820.jpg b/blog/img/media/16720400637574/16720403297820.jpg
new file mode 100644
index 0000000000..d2ddbb7512
Binary files /dev/null and b/blog/img/media/16720400637574/16720403297820.jpg differ
diff --git a/blog/img/media/16720400637574/16720403572773.jpg b/blog/img/media/16720400637574/16720403572773.jpg
new file mode 100644
index 0000000000..369e1c9dc2
Binary files /dev/null and b/blog/img/media/16720400637574/16720403572773.jpg differ
diff --git a/blog/img/media/16720400637574/16720403977529.jpg b/blog/img/media/16720400637574/16720403977529.jpg
new file mode 100644
index 0000000000..3c546a6219
Binary files /dev/null and b/blog/img/media/16720400637574/16720403977529.jpg differ
diff --git a/blog/img/media/16720400637574/16720404142720.jpg b/blog/img/media/16720400637574/16720404142720.jpg
new file mode 100644
index 0000000000..22673aa79a
Binary files /dev/null and b/blog/img/media/16720400637574/16720404142720.jpg differ
diff --git a/blog/img/media/16720400637574/16720404412259.jpg b/blog/img/media/16720400637574/16720404412259.jpg
new file mode 100644
index 0000000000..2c4d8d2eaa
Binary files /dev/null and b/blog/img/media/16720400637574/16720404412259.jpg differ
diff --git a/blog/img/media/16720405454837/16720405586499.jpg b/blog/img/media/16720405454837/16720405586499.jpg
new file mode 100644
index 0000000000..dff47f339b
Binary files /dev/null and b/blog/img/media/16720405454837/16720405586499.jpg differ
diff --git a/blog/img/media/16720405454837/16720407528096.jpg b/blog/img/media/16720405454837/16720407528096.jpg
new file mode 100644
index 0000000000..cf4a1c2ae5
Binary files /dev/null and b/blog/img/media/16720405454837/16720407528096.jpg differ
diff --git a/blog/img/media/16720405454837/16720407653742.jpg b/blog/img/media/16720405454837/16720407653742.jpg
new file mode 100644
index 0000000000..7483499fd4
Binary files /dev/null and b/blog/img/media/16720405454837/16720407653742.jpg differ
diff --git a/blog/img/media/16720405454837/16720408372893.jpg b/blog/img/media/16720405454837/16720408372893.jpg
new file mode 100644
index 0000000000..e4a4dc49e7
Binary files /dev/null and b/blog/img/media/16720405454837/16720408372893.jpg differ
diff --git a/blog/img/media/16720405454837/16720408471707.jpg b/blog/img/media/16720405454837/16720408471707.jpg
new file mode 100644
index 0000000000..435a69bc29
Binary files /dev/null and b/blog/img/media/16720405454837/16720408471707.jpg differ
diff --git a/blog/img/media/16720405454837/16720408537181.jpg b/blog/img/media/16720405454837/16720408537181.jpg
new file mode 100644
index 0000000000..22c055187c
Binary files /dev/null and b/blog/img/media/16720405454837/16720408537181.jpg differ
diff --git a/blog/img/media/16720405454837/16720408664980.jpg b/blog/img/media/16720405454837/16720408664980.jpg
new file mode 100644
index 0000000000..d2a3e087c4
Binary files /dev/null and b/blog/img/media/16720405454837/16720408664980.jpg differ
diff --git a/blog/img/media/16720405454837/16720408742949.jpg b/blog/img/media/16720405454837/16720408742949.jpg
new file mode 100644
index 0000000000..8c403a6582
Binary files /dev/null and b/blog/img/media/16720405454837/16720408742949.jpg differ
diff --git a/blog/img/media/16720405454837/16720408868765.jpg b/blog/img/media/16720405454837/16720408868765.jpg
new file mode 100644
index 0000000000..d38346a64f
Binary files /dev/null and b/blog/img/media/16720405454837/16720408868765.jpg differ
diff --git a/blog/img/media/16720405454837/16720408963992.jpg b/blog/img/media/16720405454837/16720408963992.jpg
new file mode 100644
index 0000000000..538f2f239e
Binary files /dev/null and b/blog/img/media/16720405454837/16720408963992.jpg differ
diff --git a/blog/img/media/16720405454837/16720409057879.jpg b/blog/img/media/16720405454837/16720409057879.jpg
new file mode 100644
index 0000000000..df8deba7a0
Binary files /dev/null and b/blog/img/media/16720405454837/16720409057879.jpg differ
diff --git a/blog/img/media/16720405454837/16720409115839.jpg b/blog/img/media/16720405454837/16720409115839.jpg
new file mode 100644
index 0000000000..b6899fc11b
Binary files /dev/null and b/blog/img/media/16720405454837/16720409115839.jpg differ
diff --git a/blog/img/media/16720405454837/16720409204499.jpg b/blog/img/media/16720405454837/16720409204499.jpg
new file mode 100644
index 0000000000..9e5e2d180a
Binary files /dev/null and b/blog/img/media/16720405454837/16720409204499.jpg differ
diff --git a/blog/img/media/16720405454837/16720409274430.jpg b/blog/img/media/16720405454837/16720409274430.jpg
new file mode 100644
index 0000000000..6cabb7a729
Binary files /dev/null and b/blog/img/media/16720405454837/16720409274430.jpg differ
diff --git a/config/blog/en-us/release.json b/config/blog/en-us/release.json
index cca5b78b9d..2fb3a43560 100644
--- a/config/blog/en-us/release.json
+++ b/config/blog/en-us/release.json
@@ -1,5 +1,11 @@
 
 {
+  "Apache_dolphinScheduler_3.1.2": {
+    "title": "Apache DolphinScheduler releases version 3.1.2 with Python API optimizations",
+    "author": "Leonard Nie",
+    "dateStr": "2022-12-24",
+    "desc": "Recently, Apache DolphinScheduler released version 3.1.2........ "
+  },
   "Apache_dolphinScheduler_3.0.3": {
     "title": "DolphinScheduler released version 3.0.3, focusing on fixing 6 bugs",
     "author": "Leonard Nie",
diff --git a/config/blog/en-us/tech.json b/config/blog/en-us/tech.json
index 11fa91973a..45412d474e 100644
--- a/config/blog/en-us/tech.json
+++ b/config/blog/en-us/tech.json
@@ -1,4 +1,5 @@
 {
+
   "DolphinScheduler_python_api_ci_cd": {
     "title": "DolphinScheduler Python API CI/CD",
     "author": "Leonard Nie",
@@ -11,6 +12,12 @@
     "dateStr": "2022-12-10",
     "desc": "Apache DolphinScheduler has officially launched on the AWS EC2 AMI application marketplace... "
   },
+  "Quick_Start_with_Apache_DolphinScheduler_Machine_Learning_Workflow": {
+    "title": "Quick Start with Apache DolphinScheduler Machine Learning Workflow",
+    "author": "Leonard Nie",
+    "dateStr": "2022-12-5",
+    "desc": "With the release of Apache DolphinScheduler 3.1.0, many AI components... "
+  },
   "How_can_more_people_benefit_from_big_data": {
     "title": "How can more people benefit from big data?",
     "author": "Leonard Nie",
diff --git a/config/blog/en-us/user.json b/config/blog/en-us/user.json
index ea0e7d697f..e902f0fd52 100644
--- a/config/blog/en-us/user.json
+++ b/config/blog/en-us/user.json
@@ -1,4 +1,11 @@
-{
+{ "Application_transformation_of_the_FinTech_data_center_based_on_DolphinScheduler": {
+  "title": "Application transformation of the FinTech data center based on DolphinScheduler",
+  "author": "Leonard Nie",
+  "dateStr": "2022-12-6",
+  "desc": "On Apache DolphinScheduler Meetup last week, ... ",
+  "img": "/img/media/16720400637574/16720400704016.jpg",
+  "logo": ""
+},
   "How_did_Yili_explore_a_path_for_digital_transformation_based_on_DolphinScheduler": {
     "title": "How did Yili explore a “path” for digital transformation based on DolphinScheduler?",
     "author": "Debra Chen",