Posted to commits@dolphinscheduler.apache.org by gi...@apache.org on 2021/06/05 13:37:15 UTC

[dolphinscheduler-website] branch asf-site updated: Automated deployment: 3a185130f9880217a5a96c3e389c09c1ff563e7a

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/dolphinscheduler-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 3363474  Automated deployment: 3a185130f9880217a5a96c3e389c09c1ff563e7a
3363474 is described below

commit 3363474d32bb9f29bd3acb048bd2de088a51cec3
Author: github-actions[bot] <gi...@users.noreply.github.com>
AuthorDate: Sat Jun 5 13:37:04 2021 +0000

    Automated deployment: 3a185130f9880217a5a96c3e389c09c1ff563e7a
---
 build/{blog.0d6b9bb.js => blog.4bb08c5.js} |   2 +-
 en-us/blog/Json_Split.html                 | 117 +++++++++++++++++++++++++++++
 en-us/blog/Json_Split.json                 |   6 ++
 en-us/blog/index.html                      |   4 +-
 zh-cn/blog/index.html                      |   2 +-
 zh-cn/blog/json_split.html                 |   4 +-
 zh-cn/blog/json_split.json                 |   2 +-
 7 files changed, 130 insertions(+), 7 deletions(-)

diff --git a/build/blog.0d6b9bb.js b/build/blog.4bb08c5.js
similarity index 73%
rename from build/blog.0d6b9bb.js
rename to build/blog.4bb08c5.js
index 28a819b..54b231d 100644
--- a/build/blog.0d6b9bb.js
+++ b/build/blog.4bb08c5.js
@@ -1 +1 @@
-webpackJsonp([1],{1:function(e,t){e.exports=React},2:function(e,t){e.exports=ReactDOM},401:function(e,t,n){e.exports=n(402)},402:function(e,t,n){"use strict";function r(e){return e&&e.__esModule?e:{default:e}}function o(e,t){if(!(e instanceof t))throw new TypeError("Cannot call a class as a function")}function a(e,t){if(!e)throw new ReferenceError("this hasn't been initialised - super() hasn't been called");return!t||"object"!=typeof t&&"function"!=typeof t?e:t}function l(e,t){if("functi [...]
\ No newline at end of file
+webpackJsonp([1],{1:function(e,t){e.exports=React},2:function(e,t){e.exports=ReactDOM},401:function(e,t,n){e.exports=n(402)},402:function(e,t,n){"use strict";function r(e){return e&&e.__esModule?e:{default:e}}function o(e,t){if(!(e instanceof t))throw new TypeError("Cannot call a class as a function")}function a(e,t){if(!e)throw new ReferenceError("this hasn't been initialised - super() hasn't been called");return!t||"object"!=typeof t&&"function"!=typeof t?e:t}function l(e,t){if("functi [...]
\ No newline at end of file
diff --git a/en-us/blog/Json_Split.html b/en-us/blog/Json_Split.html
new file mode 100644
index 0000000..72184ed
--- /dev/null
+++ b/en-us/blog/Json_Split.html
@@ -0,0 +1,117 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
+  <meta name="keywords" content="Json_Split">
+  <meta name="description" content="Json_Split">
+  <title>Json_Split</title>
+  <link rel="shortcut icon" href="/img/favicon.ico">
+  <link rel="stylesheet" href="/build/vendor.c5ba65d.css">
+  <link rel="stylesheet" href="/build/blog.md.fd8b187.css">
+</head>
+<body>
+  <div id="root"><div class="blog-detail-page" data-reactroot=""><header class="header-container header-container-dark"><div class="header-body"><a href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div class="search search-dark"><span class="icon-search"></span></div><span class="language-switch language-switch-dark">中</span><div class="header-menu"><img class="header-menu-toggle" src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass ant-menu-lig [...]
+<h3>The Background</h3>
+<p>Currently, DolphinScheduler saves the tasks and relationships in a process as one big json in the process_definition_json field of the process_definition table in the database. If a process is large, for example with 1000 tasks, this json field becomes very large and has to be parsed every time the json is used, which is very performance intensive, and the tasks cannot be reused; the community therefore planned a json splitting project. Encouragingly, we have now completed most of this work, so a summar [...]
+<h3>Summarization</h3>
+<p>The json split project was started on 2021-01-12 and the main development was initially completed by 2021-04-25. The code has been merged into the dev branch. Thanks to lenboo, JinyLeeChina, simon824 and wen-hemin for coding.</p>
+<p>The main changes, as well as the contributions, are as follows:</p>
+<ul>
+<li>12,793 lines of code changed</li>
+<li>168 files modified or added</li>
+<li>145 commits in total</li>
+<li>85 PRs in total</li>
+</ul>
+<h3>Split Solution Review</h3>
+<p><img src="https://user-images.githubusercontent.com/42576980/117598604-b1ad8e80-b17a-11eb-9d99-d593fce7bab6.png" alt="split solution"></p>
+<ul>
+<li>[ ] When the api module performs a save operation</li>
+</ul>
+<ol>
+<li>The process definition is saved to process_definition (main table) and process_definition_log (log table); both tables hold the same data, and the process definition version is 1</li>
+<li>The task definition is saved to task_definition (main table) and task_definition_log (log table), also holding the same data, with task definition version 1</li>
+<li>Process task relationships are stored in process_task_relation (main table) and process_task_relation_log (log table). These tables hold the code and version of the process, since tasks are organised through the process and the dag is drawn in terms of the process. The current node of the dag is identified by its post_task_code and post_task_version, and the predecessor dependency of this node is identified by pre_task_code and pre_task_version; if there is no dependency, the pre_task_code an [...]
+</ol>
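The relation rows described above can be read directly as DAG edges. The following Java sketch illustrates the idea; the `Relation` record, the `buildDag` helper, and the use of 0 as the no-predecessor sentinel are assumptions for illustration, not DolphinScheduler's actual types:

```java
import java.util.*;

public class RelationDagSketch {
    // Hypothetical row shape for process_task_relation; in this sketch a
    // preTaskCode of 0 marks a node with no predecessor (an assumption).
    record Relation(long preTaskCode, long postTaskCode) {}

    // Build forward adjacency lists (pre -> post) for the dag.
    static Map<Long, List<Long>> buildDag(List<Relation> rows) {
        Map<Long, List<Long>> dag = new LinkedHashMap<>();
        for (Relation r : rows) {
            // Every post task is a node, even if it has no successors yet.
            dag.computeIfAbsent(r.postTaskCode(), k -> new ArrayList<>());
            if (r.preTaskCode() != 0) {
                dag.computeIfAbsent(r.preTaskCode(), k -> new ArrayList<>())
                   .add(r.postTaskCode());
            }
        }
        return dag;
    }
}
```

Because each row carries only codes and versions, the heavy task bodies stay in task_definition and are looked up on demand.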
+<ul>
+<li>[ ] When the api module performs an update operation, the process definition and task definition update the main table directly and insert the updated data into the log table. For the relationships, the main table rows are deleted and the new relationships inserted, while the log table simply inserts the new relationships.</li>
+<li>[ ] When the api module performs a delete operation, the process definition, task definition and relationships are deleted directly from the main table, leaving the log table data unchanged.</li>
+<li>[ ] When the api module performs a switch operation, the data of the corresponding version in the log table is written directly over the main table.</li>
+</ul>
+<h3>Json Access Solutions</h3>
+<p><img src="https://user-images.githubusercontent.com/42576980/117598643-c9851280-b17a-11eb-9a6e-c81ee083b09c.png" alt="json"></p>
+<ul>
+<li>
+<p>[ ] In the current phase of the splitting scheme, the api module's controller layer remains unchanged, and the incoming big json is still mapped to a ProcessData object in the service layer. Insert and update operations are done through the ProcessService.saveProcessDefiniton() entry in the public Service module, which performs the database operations in the order task_definition, process_task_relation, process_definition. When saving, the task is changed if i [...]
+</li>
+<li>
+<p>[ ] When querying, the data is assembled through the ProcessService.genTaskNodeList() entry in the public Service module, or assembled into a ProcessData object, which in turn generates the json to return</p>
+</li>
+<li>
+<p>[ ] The Server module (Master) also gets the TaskNodeList through the public Service module's ProcessService.genTaskNodeList() to generate the dispatch dag; it puts all the information about the current task into the MasterExecThread.readyToSubmitTaskQueue queue in order to generate a taskInstance and dispatch it to a worker</p>
+</li>
+</ul>
+<h2>Phase 2 Planning</h2>
+<h3>API / UI module transformation</h3>
+<ul>
+<li>[ ] For processDefinition interface requests, replace processDefinitionId with processDefinitionCode on the back end</li>
+<li>[ ] Support defining tasks separately; tasks are currently inserted and modified through the workflow, and Phase 2 needs to support defining them independently</li>
+<li>[ ] Json splitting in the front end and the controller layer: Phase 1 completed json splitting from the api module's service layer down to the dao layer, and Phase 2 needs to complete it in the front end and the controller layer</li>
+</ul>
+<h3>Server module transformation</h3>
+<ul>
+<li>[ ] Replace process_definition_id with process_definition_code in t_ds_command, t_ds_error_command, and t_ds_schedules</li>
+<li>[ ] Transform the taskInstance generation process</li>
+</ul>
+<p>Currently the process_instance is generated from the process_definition, schedules, and command tables, while the taskInstance is generated from the MasterExecThread.readyToSubmitTaskQueue queue, whose data comes from the dag object. At this point, the queue and the dag hold all the information about the taskInstance, which is very memory intensive. This can be changed to the following data flow, where the readyToSubmitTaskQueue queue and dag hold only the task code and version  [...]
+<p><img src="https://user-images.githubusercontent.com/42576980/117598659-d3a71100-b17a-11eb-8fe1-8725299510e6.png" alt="server"></p>
+<hr>
+<p><strong>Appendix: The snowflake algorithm</strong></p>
+<p><strong>snowflake:</strong> an algorithm, created by Twitter and used for tweet IDs, that generates distributed, globally unique IDs called <strong>snowflakes</strong>.</p>
+<p>A Snowflake ID has 64 bits. The first 41 bits are a timestamp, representing the number of milliseconds since a chosen epoch. The next 10 bits represent the machine ID, to prevent conflicts. The remaining 12 bits are a per-machine sequence number, which allows multiple Snowflake IDs to be created within the same millisecond. Snowflake IDs are generated based on time and can therefore be ordered by time. In addition, the generation time of an ID can be infer [...]
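Because the bit layout is fixed, the parts of an ID can be recovered with simple shifts and masks. The sketch below is illustrative, not code from the project; the epoch constant is an assumed value (Twitter's commonly cited epoch), and the helper names are made up:

```java
public class SnowflakeDecodeSketch {
    // Assumed epoch (Twitter's commonly cited value); a real decoder must use
    // the same epoch as the generator that produced the ID.
    public static final long EPOCH = 1288834974657L;

    // Layout assumed: 1 sign bit | 41-bit timestamp | 10-bit machine | 12-bit sequence.
    public static long timestampMillis(long id) {
        return (id >>> 22) + EPOCH;   // drop the 10 + 12 low bits
    }

    public static long machineId(long id) {
        return (id >>> 12) & 0x3FF;   // middle 10 bits
    }

    public static long sequence(long id) {
        return id & 0xFFF;            // low 12 bits
    }
}
```

This is what makes an ID's generation time inferable from the ID itself, and vice versa.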
+<ol>
+<li>
+<p><strong>Structure of the snowflake algorithm:</strong></p>
+<p><img src="https://github.com/apache/dolphinscheduler-website/blob/master/img/JsonSplit/snowflake.png?raw=true" alt="snowflake"></p>
+<p>It is divided into four main parts:</p>
+<ol>
+<li>1 bit: always 0; this bit is meaningless.</li>
+<li>41 bits: the timestamp.</li>
+<li>10 bits: the machine ID (machine room id plus machine id), 0000000000 here, as 0 is passed in at this point.</li>
+<li>12 bits: the sequence number, i.e. the serial number of the IDs generated within the same millisecond on a given machine in a given machine room, 0000 0000 0000.</li>
+</ol>
+<p>Next, we explain these four parts:</p>
+</li>
+</ol>
+<p><strong>1 bit, which is meaningless:</strong></p>
+<p>In binary, a leading 1 marks a negative number, but the IDs we generate are all positive, so the first bit is always 0.</p>
+<p><strong>41 bits: the timestamp, in milliseconds.</strong></p>
+<p>41 bits can represent 2^41 - 1 numbers, i.e. they can identify 2^41 - 1 milliseconds, which translates into about 69 years of time.</p>
+<p><strong>10 bits: the worker machine ID, which lets the service run on up to 2^10 = 1024 machines.</strong></p>
+<p>Within these 10 bits, 5 bits represent the machine room id and 5 bits the machine id, which allows up to 2^5 = 32 machine rooms, each holding up to 2^5 = 32 machines. The bits can also be split however you wish, for example taking 4 bits to identify a service number and the other 6 bits as the machine number; any combination works.</p>
+<p><strong>12 bits: used to distinguish the different IDs generated within the same millisecond.</strong></p>
+<p>12 bits can represent integers up to 2^12 - 1 = 4095, i.e. 4096 distinct values, so the same machine can generate at most 4096 different IDs within the same millisecond.</p>
+<p>In simple terms, if a service needs a globally unique id, it can send a request to a system that has deployed the SnowFlake algorithm, which generates and returns the unique id. On receiving the request, the SnowFlake algorithm builds a 64-bit long id using binary bit manipulation: the first of the 64 bits is meaningless. This is followed by 41 bits for the current timestamp (in milliseconds), then 10 bits for the machine id, and finally the last 12 bi [...]
+<p>The characteristics of SnowFlake are:</p>
+<ol>
+<li>The millisecond count occupies the high bits and the incrementing sequence the low bits, so the entire ID trends upward over time.</li>
+<li>It does not depend on third-party systems such as databases; deployed as a service, it is highly stable and generates IDs with very high performance.</li>
+<li>Bits can be allocated according to your own business characteristics, which is very flexible.</li>
+</ol>
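Putting the four parts together, a minimal Snowflake-style generator can be sketched in Java as follows. This is an illustration under assumed constants (the epoch and the class name are made up for the sketch), not DolphinScheduler's implementation:

```java
public class SnowflakeIdSketch {
    private static final long EPOCH = 1288834974657L; // assumed epoch
    private static final long MACHINE_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long machineId;     // 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdSketch(long machineId) {
        if (machineId < 0 || machineId > (1L << MACHINE_BITS) - 1) {
            throw new IllegalArgumentException("machineId out of range");
        }
        this.machineId = machineId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {      // 4096 IDs used up this millisecond: wait
                while (ts <= lastTimestamp) {
                    ts = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = ts;
        // 0 sign bit | 41-bit timestamp | 10-bit machine id | 12-bit sequence
        return ((ts - EPOCH) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }
}
```

IDs from a single instance are strictly increasing, which is exactly the trend-incrementing property listed as the first characteristic.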
+</section><footer class="footer-container"><div class="footer-body"><div><h3>About us</h3><h4>Do you need feedback? Please contact us through the following ways.</h4></div><div class="contact-container"><ul><li><img class="img-base" src="/img/emailgray.png"/><img class="img-change" src="/img/emailblue.png"/><a href="/en-us/community/development/subscribe.html"><p>Email List</p></a></li><li><img class="img-base" src="/img/twittergray.png"/><img class="img-change" src="/img/twitterblue.png [...]
+  <script src="//cdn.jsdelivr.net/npm/react@15.6.2/dist/react-with-addons.min.js"></script>
+  <script src="//cdn.jsdelivr.net/npm/react-dom@15.6.2/dist/react-dom.min.js"></script>
+  <script>window.rootPath = '';</script>
+  <script src="/build/vendor.d44685f.js"></script>
+  <script src="/build/blog.md.57874be.js"></script>
+  <script>
+    var _hmt = _hmt || [];
+    (function() {
+      var hm = document.createElement("script");
+      hm.src = "https://hm.baidu.com/hm.js?4e7b4b400dd31fa015018a435c64d06f";
+      var s = document.getElementsByTagName("script")[0];
+      s.parentNode.insertBefore(hm, s);
+    })();
+  </script>
+</body>
+</html>
\ No newline at end of file
diff --git a/en-us/blog/Json_Split.json b/en-us/blog/Json_Split.json
new file mode 100644
index 0000000..8c67050
--- /dev/null
+++ b/en-us/blog/Json_Split.json
@@ -0,0 +1,6 @@
+{
+  "filename": "Json_Split.md",
+  "__html": "<h2>Why did we split the big json that holds the tasks and relationships in the DolphinScheduler workflow definition?</h2>\n<h3>The Background</h3>\n<p>Currently DolphinScheduler saves tasks and relationships in process as big json to the process_definition_json field in the process_definiton table in the database. If a process is large, for example, with 1000 tasks, the json field becomes very large and needs to be parsed when using the json, which is very performance inten [...]
+  "link": "/dist/en-us/blog/Json_Split.html",
+  "meta": {}
+}
\ No newline at end of file
diff --git a/en-us/blog/index.html b/en-us/blog/index.html
index 18b62c5..bcd2ac3 100644
--- a/en-us/blog/index.html
+++ b/en-us/blog/index.html
@@ -11,12 +11,12 @@
   <link rel="stylesheet" href="/build/blog.acc2955.css">
 </head>
 <body>
-  <div id="root"><div class="blog-list-page" data-reactroot=""><header class="header-container header-container-dark"><div class="header-body"><a href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div class="search search-dark"><span class="icon-search"></span></div><span class="language-switch language-switch-dark">中</span><div class="header-menu"><img class="header-menu-toggle" src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass ant-menu-light [...]
+  <div id="root"><div class="blog-list-page" data-reactroot=""><header class="header-container header-container-dark"><div class="header-body"><a href="/en-us/index.html"><img class="logo" src="/img/hlogo_white.svg"/></a><div class="search search-dark"><span class="icon-search"></span></div><span class="language-switch language-switch-dark">中</span><div class="header-menu"><img class="header-menu-toggle" src="/img/system/menu_white.png"/><div><ul class="ant-menu whiteClass ant-menu-light [...]
   <script src="//cdn.jsdelivr.net/npm/react@15.6.2/dist/react-with-addons.min.js"></script>
   <script src="//cdn.jsdelivr.net/npm/react-dom@15.6.2/dist/react-dom.min.js"></script>
   <script>window.rootPath = '';</script>
   <script src="/build/vendor.d44685f.js"></script>
-  <script src="/build/blog.0d6b9bb.js"></script>
+  <script src="/build/blog.4bb08c5.js"></script>
   <script>
     var _hmt = _hmt || [];
     (function() {
diff --git a/zh-cn/blog/index.html b/zh-cn/blog/index.html
index b2d21ed..d9cd477 100644
--- a/zh-cn/blog/index.html
+++ b/zh-cn/blog/index.html
@@ -16,7 +16,7 @@
   <script src="//cdn.jsdelivr.net/npm/react-dom@15.6.2/dist/react-dom.min.js"></script>
   <script>window.rootPath = '';</script>
   <script src="/build/vendor.d44685f.js"></script>
-  <script src="/build/blog.0d6b9bb.js"></script>
+  <script src="/build/blog.4bb08c5.js"></script>
   <script>
     var _hmt = _hmt || [];
     (function() {
diff --git a/zh-cn/blog/json_split.html b/zh-cn/blog/json_split.html
index 75d6048..f38e055 100644
--- a/zh-cn/blog/json_split.html
+++ b/zh-cn/blog/json_split.html
@@ -80,7 +80,7 @@
 <p><strong>雪花算法(snowflake):</strong> 是一种生成分布式全剧唯一 ID 的算法,生成的 ID 称为 <strong>snowflake</strong>,这种算法是由 Twitter 创建,并用于推文的 ID。</p>
 <p>一个 Snowflake ID 有 64 bit。前 41 bit 是时间戳,表示了自选定的时期以来的毫秒数。 接下来的 10 bit 代表计算机 ID,防止冲突。 其余 12 bit 代表每台机器上生成 ID 的序列号,这允许在同一毫秒内创建多个 Snowflake ID。SnowflakeID 基于时间生成,故可以按时间排序。此外,一个 ID 的生成时间可以由其自身推断出来,反之亦然。该特性可以用于按时间筛选 ID,以及与之联系的对象。</p>
 <p><strong>雪花算法的结构:</strong></p>
-<p><img src="https://github.com/QuakeWang/incubator-dolphinscheduler-website/blob/add-blog/img/JsonSplit/snowflake.png?raw=true" alt="snowflake"></p>
+<p><img src="https://github.com/apache/dolphinscheduler-website/blob/master/img/JsonSplit/snowflake.png?raw=true" alt="snowflake"></p>
 <p>主要分为 5 个部分:</p>
 <ol>
 <li>是 1 个 bit:0,这个是无意义的;</li>
@@ -102,7 +102,7 @@
 <ol>
 <li>毫秒数在高位,自增序列在低位,整个 ID 都是趋势递增的。</li>
 <li>不依赖数据库等第三方系统,以服务的方式部署,稳定性更高,生成 ID 的性能也是非常高的。</li>
-<li>可以根据自身业务特性分配 bi t位,非常灵活。</li>
+<li>可以根据自身业务特性分配 bit 位,非常灵活。</li>
 </ol>
 </section><footer class="footer-container"><div class="footer-body"><div><h3>联系我们</h3><h4>有问题需要反馈?请通过以下方式联系我们。</h4></div><div class="contact-container"><ul><li><img class="img-base" src="/img/emailgray.png"/><img class="img-change" src="/img/emailblue.png"/><a href="/zh-cn/community/development/subscribe.html"><p>邮件列表</p></a></li><li><img class="img-base" src="/img/twittergray.png"/><img class="img-change" src="/img/twitterblue.png"/><a href="https://twitter.com/dolphinschedule"><p>Twitt [...]
   <script src="//cdn.jsdelivr.net/npm/react@15.6.2/dist/react-with-addons.min.js"></script>
diff --git a/zh-cn/blog/json_split.json b/zh-cn/blog/json_split.json
index 1db2814..1051d04 100644
--- a/zh-cn/blog/json_split.json
+++ b/zh-cn/blog/json_split.json
@@ -1,6 +1,6 @@
 {
   "filename": "json_split.md",
-  "__html": "<h2>为什么要把 DolphinScheduler 工作流定义中保存任务及关系的大 json 给拆了?</h2>\n<h3>背景</h3>\n<p>当前 DolphinScheduler 的工作流中的任务及关系保存时是以大 json 的方式保存到数据库中 process_definiton 表的 process_definition_json 字段,如果某个工作流很大比如有 1000 个任务,这个 json 字段也就随之变得非常大,在使用时需要解析 json,非常耗费性能,且任务没法重用,故社区计划启动 json 拆分项目。可喜的是目前我们已经完成了这个工作的大部分,因此总结一下,供大家参考学习。</p>\n<h3>总结</h3>\n<p>json split 项目从 2021-01-12 开始启动,到 2021-04-25 初步完成主要开发。代码已合入 dev 分支。感谢 lenboo、JinyLeeChina、simon824、wen-hemin 四位伙伴参与 coding。</p>\n<p>主要变化以及贡献如下:</p>\n<ul>\n [...]
+  "__html": "<h2>为什么要把 DolphinScheduler 工作流定义中保存任务及关系的大 json 给拆了?</h2>\n<h3>背景</h3>\n<p>当前 DolphinScheduler 的工作流中的任务及关系保存时是以大 json 的方式保存到数据库中 process_definiton 表的 process_definition_json 字段,如果某个工作流很大比如有 1000 个任务,这个 json 字段也就随之变得非常大,在使用时需要解析 json,非常耗费性能,且任务没法重用,故社区计划启动 json 拆分项目。可喜的是目前我们已经完成了这个工作的大部分,因此总结一下,供大家参考学习。</p>\n<h3>总结</h3>\n<p>json split 项目从 2021-01-12 开始启动,到 2021-04-25 初步完成主要开发。代码已合入 dev 分支。感谢 lenboo、JinyLeeChina、simon824、wen-hemin 四位伙伴参与 coding。</p>\n<p>主要变化以及贡献如下:</p>\n<ul>\n [...]
   "link": "/dist/zh-cn/blog/json_split.html",
   "meta": {}
 }
\ No newline at end of file