Posted to notifications@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2019/11/21 09:03:30 UTC

[GitHub] [incubator-dolphinscheduler] wangsvip opened a new issue #1307: Inter-project task dependencies

URL: https://github.com/apache/incubator-dolphinscheduler/issues/1307
 
 
   
   
   Example: for our data warehouse we create a single account, but within that account the work is split among different teams, and each team sets up its own project. When team A's tasks finish, they should trigger team B's tasks, and B's should in turn trigger team C's. This avoids the current situation, where the A, B and C teams' tasks are all scheduled by time: A runs at 1:00, B at 2:00 and C at 3:00. If A's data did not arrive today, B still runs at 2:00 and C still runs at 3:00. Running jobs with no data like this is simply a waste of cluster resources, isn't it?
   
   Another example: the data warehouse is divided into three layers, ods, dw and dm, and each layer is its own project. I create these three projects under one account, and each layer is developed by a different group of colleagues. I need the ods layer to receive today's data, then trigger the cleansing in the dw layer, which in turn triggers the computation in the dm layer. The layers are tightly chained and depend on one another, and these are cross-project dependencies. What you recommend instead is to tangle everything together and force it into a single workflow, odsTask ----> dwTask ----> dmTask. Wouldn't that be a mess? The ods colleagues write the workflow, then the dw colleagues append their layer, and then the dm colleagues append another one? This is clearly wrong. A company has many teams working in parallel, so how could cross-team tasks like these be squeezed into one workflow?
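The dependency-driven triggering requested above can be sketched roughly as follows, assuming each downstream workflow declares the upstream (project, workflow) pairs it waits on and is triggered only once all of them have succeeded for the same business date. The dependency map, project and workflow names and both functions are hypothetical illustrations, not DolphinScheduler APIs.

```python
# Hypothetical dependency map for the ods -> dw -> dm example. The names are
# illustrative assumptions, not DolphinScheduler APIs. Each downstream
# (project, workflow) pair lists the upstream pairs it waits on.
DEPENDENCIES = {
    ("dw", "clean"): [("ods", "ingest")],
    ("dm", "compute"): [("dw", "clean")],
}


def ready_to_run(target, business_date, succeeded):
    """True if every upstream of `target` has succeeded for `business_date`.

    `succeeded` is a set of (project, workflow, business_date) tuples.
    """
    return all(
        (project, workflow, business_date) in succeeded
        for project, workflow in DEPENDENCIES.get(target, [])
    )


def next_to_trigger(business_date, succeeded):
    """Downstream workflows whose upstreams are done but which have not run yet."""
    return sorted(
        target
        for target in DEPENDENCIES
        if (*target, business_date) not in succeeded
        and ready_to_run(target, business_date, succeeded)
    )


if __name__ == "__main__":
    day = "2019-11-21"
    # ods data has not arrived: nothing downstream is triggered.
    print(next_to_trigger(day, set()))
    # ods has finished: only the dw workflow becomes eligible.
    print(next_to_trigger(day, {("ods", "ingest", day)}))
```

Unlike fixed 1:00/2:00/3:00 schedules, nothing downstream runs when the upstream data is missing, and each team still owns its own project and workflow.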
   
   Current workaround: because DolphinScheduler lacks this kind of task dependency, when the ods-layer data comes in we create a status file in a directory, or a status table in MySQL, to record a flag marking the data as processed. The next layer's computation first checks the status flag written by the previous layer and only then decides whether to run the current task.
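The status-table workaround described above might look roughly like this. It is a minimal sketch that assumes one completion row per (layer, business date); `sqlite3` stands in for MySQL so the example is self-contained, and the table, column and function names are hypothetical. Note that `INSERT OR IGNORE` is SQLite syntax; the MySQL equivalent is `INSERT IGNORE`.

```python
import sqlite3

# Status table: one row per layer per business date means "this layer is done".
# Table and column names are hypothetical; sqlite3 stands in for MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS layer_status (
    layer         TEXT NOT NULL,
    business_date TEXT NOT NULL,
    PRIMARY KEY (layer, business_date)
)
"""


def mark_done(conn, layer, business_date):
    """Record that `layer` finished processing data for `business_date`."""
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT OR IGNORE INTO layer_status (layer, business_date) VALUES (?, ?)",
            (layer, business_date),
        )


def is_done(conn, layer, business_date):
    """Check the completion flag written by a layer."""
    row = conn.execute(
        "SELECT 1 FROM layer_status WHERE layer = ? AND business_date = ?",
        (layer, business_date),
    ).fetchone()
    return row is not None


def run_if_upstream_done(conn, layer, upstream, business_date, task):
    """Run `task` only if `upstream` is flagged done, then flag `layer` done."""
    if not is_done(conn, upstream, business_date):
        return False  # upstream data missing: skip instead of wasting resources
    task()
    mark_done(conn, layer, business_date)
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    day = "2019-11-21"
    print(run_if_upstream_done(conn, "dw", "ods", day, lambda: None))  # False
    mark_done(conn, "ods", day)
    print(run_if_upstream_done(conn, "dw", "ods", day, lambda: None))  # True
```

Each downstream job still has to be scheduled (for example polled by time) and perform this check itself, which is the bookkeeping the issue asks the scheduler to take over.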
   
   Conclusion: companies have rules and work is organized by team, so it is impossible to tangle everyone's tasks together into one workflow!
   
