Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/07/30 16:23:09 UTC

[GitHub] [dolphinscheduler] reele opened a new issue #5925: [Improvement][Generic Template API | Standalone import tool] About migrating to DolphinScheduler

reele opened a new issue #5925:
URL: https://github.com/apache/dolphinscheduler/issues/5925


   **About migrating to DolphinScheduler**
   
   Many users are attracted by the features of DolphinScheduler (DS), but not everyone can easily migrate an existing scheduling configuration to it, especially from a platform with thousands of jobs, such as a data warehouse.
   
   Most users need a tool that makes data migration easy. I am one of them: I need to migrate more than 6000 jobs.
   
   So, after reading most of the code, I wrote a simple script that writes batch-configured job information directly into the database. I also plan to go on and migrate my flows from Azkaban to DS:
   
   - https://github.com/reele/process_definition_import_tool
   
   **So, should DS add a common template import interface, so that data from other scheduling platforms can be exported in a standard format and then imported directly into DS? Or should a common standalone tool be implemented for data conversion?**
   
   **Which version of DolphinScheduler:**
   - [1.3.6-release]


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [dolphinscheduler] jon-qj commented on issue #5925: [Improvement][Common template API | Standalone import tool] About migrating to DolphinScheduler

Posted by GitBox <gi...@apache.org>.
jon-qj commented on issue #5925:
URL: https://github.com/apache/dolphinscheduler/issues/5925#issuecomment-891684920


   > **About migrating to DolphinScheduler**
   > 
   > Many users are attracted by the features of DolphinScheduler (DS), but not everyone can easily migrate an existing scheduling configuration to it, especially from a platform with thousands of jobs, such as a data warehouse.
   > 
   > Most users need a tool that makes data migration easy. I am one of them: I need to migrate more than 6000 jobs.
   > 
   > So, after reading most of the code, I wrote a simple script that writes batch-configured job information directly into the database. I also plan to go on and migrate my flows from Azkaban to DS:
   > 
   > * https://github.com/reele/process_definition_import_tool
   > 
   > **So, should DS add a common template import interface, so that data from other scheduling platforms can be exported in a standard format and then imported directly into DS? Or should a common standalone tool be implemented for data conversion?**
   > 
   > **Which version of DolphinScheduler:**
   > - [1.3.6-release]
   
   In my opinion, this kind of function is generally implemented by users themselves. There are too many scheduling platforms (including in-house ones), with complex structures and many uncertainties, which makes it difficult to parse their data and initialize it into a general DS format. On the front end in particular, how would the location that defines the coordinates of each task be initialized? Instead, we could collect users' own scripts and gather them into a tool set on the DS official website, where others can download and modify them for themselves.
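
On the coordinates question above, one possible approach is a simple layered layout: put each task in a column according to its depth in the dependency graph, and in a row according to its order within that column. The following is a minimal sketch under stated assumptions, not DolphinScheduler code. The function name `layered_layout`, the `x_gap` and `y_gap` parameters, and the `{"x": ..., "y": ...}` output shape are all hypothetical; the format in which DS actually stores task locations is not described in this thread.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def layered_layout(deps, x_gap=200, y_gap=100):
    """Assign hypothetical canvas coordinates to each task.

    deps maps a task name to the names of the tasks it depends on.
    A task's column is its depth (longest chain of upstream tasks);
    its row is its alphabetical position among tasks of the same depth.
    """
    depth = {}
    # static_order() yields every task after all of its upstream tasks,
    # so depth[p] is always known by the time it is needed.
    for name in TopologicalSorter(deps).static_order():
        parents = deps.get(name, ())
        depth[name] = 1 + max((depth[p] for p in parents), default=-1)

    rows_used = {}
    positions = {}
    for name in sorted(depth, key=lambda n: (depth[n], n)):
        row = rows_used.get(depth[name], 0)
        rows_used[depth[name]] = row + 1
        positions[name] = {"x": depth[name] * x_gap, "y": row * y_gap}
    return positions


# Illustrative task names only.
deps = {
    "extract_a": [],
    "extract_b": [],
    "join": ["extract_a", "extract_b"],
    "report": ["join"],
}
positions = layered_layout(deps)
for task, pos in positions.items():
    print(task, pos)
```

A layout like this makes no attempt to minimize edge crossings; it only guarantees that every task is drawn to the right of the tasks it depends on, which may be enough for an automatically imported workflow.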





[GitHub] [dolphinscheduler] CalvinKirs commented on issue #5925: [Improvement][Common template API | Standalone import tool] About migrating to DolphinScheduler

Posted by GitBox <gi...@apache.org>.
CalvinKirs commented on issue #5925:
URL: https://github.com/apache/dolphinscheduler/issues/5925#issuecomment-891658721


   A general-purpose tool may be better. We could create a new repository for this purpose. Does anyone have suggestions?





[GitHub] [dolphinscheduler] reele edited a comment on issue #5925: [Improvement][Common template API | Standalone import tool] About migrating to DolphinScheduler

Posted by GitBox <gi...@apache.org>.
reele edited a comment on issue #5925:
URL: https://github.com/apache/dolphinscheduler/issues/5925#issuecomment-891811825


   > In my opinion, this kind of function is generally implemented by users themselves. There are too many scheduling platforms (including in-house ones), with complex structures and many uncertainties, which makes it difficult to parse their data and initialize it into a general DS format. On the front end in particular, how would the location that defines the coordinates of each task be initialized? Instead, we could collect users' own scripts and gather them into a tool set on the DS official website, where others can download and modify them for themselves.
   
   As I understand it, migration is difficult mainly because:
   1. The job data structure is complex and cannot be handled by a simple mapping.
   2. It is difficult for non-developer users.
   
   Typically, a migration of a large number of jobs (thousands) is done with little regard for the DAG graphics, and the dependencies are likely to be too complex for a data warehouse user to build in the DAG graphical interface. Here is one of mine (one of the simpler ones):
   
   ![auto_generated_dag](https://user-images.githubusercontent.com/38578667/128014335-ff35b5c0-424b-4d72-912b-efa688ee68ba.png)
   
   In general, a scheduler requires the following basic data:
   1. Task definitions
   2. Task dependencies
   3. Task schedules
   4. DAG drawing definition
   5. Task group definitions
   
   I think 1-3 are required, and 4 and 5 are optional.
   
   If a common interface for 1-3 can be implemented, migrating data from other platforms will be much easier.
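
To make items 1-3 concrete, here is a minimal sketch of what a platform-neutral template could look like, together with a check that the dependencies form a valid DAG before anything is imported. This is only an illustration under stated assumptions: the class names `TaskTemplate` and `WorkflowTemplate`, their fields, the `validate` function, and the sample cron string are hypothetical and are not part of DolphinScheduler or of the import tool linked above.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+


@dataclass
class TaskTemplate:
    """Items 1 and 2: a task definition plus its dependencies (hypothetical fields)."""
    name: str
    task_type: str  # e.g. "SHELL"; the allowed values are an assumption
    command: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class WorkflowTemplate:
    """A workflow: its tasks and its schedule (item 3)."""
    name: str
    cron: str  # schedule expression, passed through unchanged
    tasks: list[TaskTemplate]


def validate(workflow: WorkflowTemplate) -> list[str]:
    """Return task names in a valid execution order, or raise ValueError."""
    names = [t.name for t in workflow.tasks]
    if len(names) != len(set(names)):
        raise ValueError("duplicate task names")
    known = set(names)
    graph = {}
    for task in workflow.tasks:
        missing = [d for d in task.depends_on if d not in known]
        if missing:
            raise ValueError(f"{task.name} depends on unknown tasks: {missing}")
        graph[task.name] = set(task.depends_on)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


# Illustrative data only.
workflow = WorkflowTemplate(
    name="daily_load",
    cron="0 0 2 * * ?",
    tasks=[
        TaskTemplate("extract", "SHELL", "sh extract.sh"),
        TaskTemplate("transform", "SHELL", "sh transform.sh", ["extract"]),
        TaskTemplate("load", "SHELL", "sh load.sh", ["transform"]),
    ],
)
order = validate(workflow)
print(order)
```

An exporter for another platform would only need to produce this structure; items 4 and 5 (drawing and grouping) can be left out or generated afterwards.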
   





[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #5925: [Improvement][Generic Template API | Standalone import tool] About migrating to DolphinScheduler

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #5925:
URL: https://github.com/apache/dolphinscheduler/issues/5925#issuecomment-890002574










