Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/07/08 15:14:18 UTC
[GitHub] [dolphinscheduler] mgsky1 commented on issue #5752: [Feature][Module Name] Add Blocking Task

mgsky1 commented on issue #5752:
URL: https://github.com/apache/dolphinscheduler/issues/5752#issuecomment-876523723


   [English](#en) | [中文](#zh-ch)
   
   ## <a id='en'>Overview</a>
   
   First of all, I would like to thank @simsicon, @dailidong, @Jave-Chen and @shaojwu for your interest in this feature and for your advice!
   
   So far, the feedback received leans toward Solution I (modifying existing nodes), but **I think Solution II (adding a new node type) would be more suitable for DS**. I will explain why from the following two aspects.
   
   - Functionality (basic functions and long-term plans)
   - Cost (for users and for developers)
   
   ## Functionality
   
   ### Basic Functions
   
   To choose a solution properly, the baseline in my opinion is that it delivers the basic function. The goal of a blocking task is this: during the execution of a workflow, **if a blocking task is encountered and its blocking condition is met, the whole workflow is paused; otherwise, it continues.**
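
The rule above can be sketched in a few lines. This is only an illustration, not DolphinScheduler code: `WorkflowAction` and `on_blocking_task` are hypothetical names, and the blocking condition is assumed to have already been evaluated to a boolean.

```python
from enum import Enum


class WorkflowAction(Enum):
    """Hypothetical outcomes once a blocking task has been evaluated."""
    PAUSE = "pause"
    CONTINUE = "continue"


def on_blocking_task(blocking_condition_met: bool) -> WorkflowAction:
    """Pause the whole workflow if the blocking condition holds, else continue."""
    if blocking_condition_met:
        return WorkflowAction.PAUSE
    return WorkflowAction.CONTINUE
```

Both solutions would have to implement this same decision; they differ only in where it lives.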
   
   Obviously, **both solutions reach that goal.** However, in some situations Solution II does better than Solution I. Consider the following scenario.
   
   >  Suppose your boss has a demand. There is a blocking task in a workflow, and upstream of it are three independent sub_processes. Your boss tells you that if any two of the three sub_processes run successfully, the workflow continues; otherwise, it pauses.
   
   If I choose Solution I, I will use a condition node. In the current version of the condition node, I must configure two branches, one for success and one for failure, and the two branches must be different. To implement the blocking task, **I must add a useless branch** (for example, a shell task that echoes hello world) to meet my boss's requirement. You might consider modifying the current condition node, but I do not think that is suitable, because it would **break the condition node's function**, which is to route the workflow in the direction the user expects.
   
   If I choose Solution II, everything becomes easy. This new type of node (let me call it a blocking node) has no such limitation. It has a logical judgment function, and it can support either a single successor branch or multiple successor branches. In this way, **it does not break the functionality of existing nodes and still accomplishes our blocking goal.**
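
As a rough sketch of the scenario under Solution II, the blocking node could count upstream successes and either release its successors or pause. The function names, the `required_successes` parameter and the task name `load_report` are assumptions made for illustration; none of them come from the DolphinScheduler code base.

```python
from typing import List, Sequence


def blocking_condition_met(upstream_succeeded: Sequence[bool],
                           required_successes: int = 2) -> bool:
    """Block when fewer than `required_successes` upstream sub-processes succeeded."""
    success_count = sum(1 for ok in upstream_succeeded if ok)
    return success_count < required_successes


def successors_to_run(upstream_succeeded: Sequence[bool],
                      successors: Sequence[str],
                      required_successes: int = 2) -> List[str]:
    """Return the successor tasks to start; an empty list means the workflow pauses."""
    if blocking_condition_met(upstream_succeeded, required_successes):
        return []
    return list(successors)
```

With two of three sub-processes successful, `successors_to_run([True, True, False], ["load_report"])` releases the single successor, so no dummy failure branch is needed. The same function accepts several successors unchanged.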
   
   ### Long-term plans
   
   In Solution I, the blocking task is **implemented as a special node attribute**. If one day Apache releases a new big data component and DS wants to support it, we would still need to add the blocking option when coding the related web pages. Solution II settles this once and for all.
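
The difference can be pictured with a small data model. All class and field names here are hypothetical and only illustrate the argument; they do not reflect how DS actually defines its task types.

```python
from dataclasses import dataclass, fields


# Solution I (sketch): blocking is an attribute, so each task type repeats it.
@dataclass
class ShellTaskWithAttribute:
    script: str
    blocking: bool = False  # option repeated on every task type


@dataclass
class NewComponentTaskWithAttribute:
    job: str
    blocking: bool = False  # must be added again for each newly supported component


# Solution II (sketch): task types stay unchanged; blocking is its own node type.
@dataclass
class NewComponentTask:
    job: str


@dataclass
class BlockingNode:
    upstream_task_names: tuple


def has_blocking_option(task_cls) -> bool:
    """Report whether a task type carries its own blocking attribute."""
    return any(f.name == "blocking" for f in fields(task_cls))
```

Under this model, supporting a new component in Solution II means adding only `NewComponentTask`; the existing `BlockingNode` is reused as is.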
   
   ## Cost
   
   ### User
   
   To be honest, the user cost is similar in both solutions. Either way, users need to learn what a blocking task is and how to set it up. @simsicon says that Solution II will require modifying the DAG. Actually, both Solution I and Solution II need to modify the DAG to some extent; only the degree differs.
   
   ### Developer
   
   Coding Solution II may be harder, but I believe it is worth it, because Solution II is more flexible and more capable than Solution I, as in the scenario I mentioned above. As an analogy, Apple released the iPad Air 4 and the iPad Pro 2021. The iPad Air 4 (256GB) sells for 749 dollars and the iPad Pro (128GB, 11 inch) sells for 799 dollars. They differ by $50. With that extra money you can enjoy the ultimate iPad experience while still meeting your daily needs, so why not pay the extra price? The blocking task is the same.
   
   In conclusion, I think Solution II is better.
   
   Feel free to share your opinion!
   
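   ## <a id='zh-ch'>Overview</a>
   
   *First of all, thanks to community members [@simsicon](https://github.com/simsicon), [@dailidong](https://github.com/dailidong), [@Jave-Chen](https://github.com/Jave-Chen) and @shaojwu for your interest in this feature and for your advice on this proposal!*
   
   Although the feedback received so far leans toward Solution I (modifying existing nodes), on balance **I think Solution II (adding a new node type) is more suitable for DS**. I will explain why I think the second solution fits better from the following two aspects.
   
   - Functionality (basic functions, long-term plans)
   - Cost (user cost, developer cost)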
   
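   ## Functionality
   
   ### Basic Functions
   
   When evaluating a solution, the most basic criterion, in my view, is that its functionality meets our expectations. The goal of the blocking feature is this: during workflow execution, if a blocking task is encountered and the user-defined blocking condition is met, then **the workflow pauses** and the user is notified to intervene; if the blocking condition is not met, execution continues.
   
   Clearly, **both solutions can reach this basic goal**. However, Solution II can do everything Solution I can, and do it better! Consider the following scenario:
   
   >  One day my boss raises a requirement: set up a blocking task in a workflow. Upstream of it are 3 branches, corresponding to 3 independent sub-processes. The blocking task must check how the 3 upstream tasks ran. If any two of them pass the check, execution continues (with a single downstream branch); otherwise, the workflow must be paused.
   
   With Solution I, I inevitably have to use a condition node. However, for the condition node currently in DS, both the success branch and the failure branch must be set, and they must be different, before the DAG can be saved. To implement this blocking task, **I must create a useless branch** (for example, a Shell task that prints hello world) to satisfy the requirement. You might consider modifying the condition node so that it can also be saved with only one branch, but I think doing so **breaks the condition node's function of routing between branches**.
   
   With Solution II, everything becomes simple. This new node (let us call it a blocking node for now) has no such restriction. It has a logical judgment function, and I can let the blocking node support either multiple successor branches or a single branch (to be investigated). This **both leaves the functionality of existing nodes intact and achieves our blocking goal**.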
   
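   ### Long-term plans
   
   In the long run, Solution I effectively **binds** the blocking task to nodes as an attribute. If one day Apache releases another big data component and DS wants to support it, the blocking option would still have to be added when writing the front end for the new node. With Solution II there is no such binding problem: it is settled once and for all, and each node does its own job.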
   
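   ## Cost
   
   ### User Cost
   
   As for user cost, honestly, it is about the same for both solutions. Either way, users need to learn what a blocking task is and how to set one up. [@simsicon](https://github.com/simsicon) mentioned in the issue the DAG modification problem caused by Solution II. In fact, whichever solution is chosen, the DAG needs to be modified to some extent; the only difference is the degree of modification.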
   
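   ### Developer Cost
   
   Honestly, I estimate that Solution II will take somewhat more code than Solution I, **but I think it is worth it**, because Solution II is more flexible than Solution I and accomplishes the blocking task better. It is like Apple releasing the iPad Air 4 and the iPad Pro 2021: the iPad Air 4 256GB sells for 749 dollars and the iPad Pro 2021 11-inch 128GB sells for 799 dollars, a difference of $50. **But by paying that extra money you can enjoy the ultimate iPad experience while still meeting your daily needs, so why not pay that extra price**? The blocking task is the same.
   
   In summary, I recommend Solution II.
   
   Feel free to share your opinion!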

