Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/11/26 09:59:44 UTC

[GitHub] [dolphinscheduler] zhongjiajie opened a new issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

zhongjiajie opened a new issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
   
   
   ### Description
   
   Extend the existing import process shown below:
   
   ![image](https://user-images.githubusercontent.com/15820530/143562382-7d49c649-3c43-4280-856e-612c090717e3.png)
   
   Automatically create a workflow when importing SQL scripts that carry the hints `name` and `upstream`. The import would create one task of the `sql task type` for each script and set the dependencies between tasks according to the hints.
   
   ### Use case
   
   When a user imports SQL scripts with the specific hints in their headers, we create a SQL task for each script and then parse the hints to set each task's upstream. After parsing all the scripts, the number of tasks created equals the number of SQL scripts (files).
   Besides that, we connect the tasks according to the hints given in the scripts and set the task relations. If a script names an upstream task that does not exist, we should pop up a dialog asking whether to ignore the dependency. If the user chooses "yes", we should import the scripts and ignore the error; if the user chooses "no", we should terminate the import without creating any task or workflow.
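
The missing-upstream rule above can be sketched as follows. This is only an illustration under assumptions, not DolphinScheduler code: the function name `resolve_upstreams`, the `ignore_missing` flag (standing in for the user's answer in the dialog), and the dict-based task model are all hypothetical.

```python
from typing import Dict, List, Optional

ROOT = "root"  # hint value meaning "this task has no upstream"


def resolve_upstreams(
    tasks: Dict[str, List[str]], ignore_missing: bool
) -> Optional[Dict[str, List[str]]]:
    """Return task -> validated upstreams, or None if the import is aborted.

    `tasks` maps each task name to the upstream names taken from its hints.
    `ignore_missing` is a stand-in for the answer given in the dialog.
    """
    resolved: Dict[str, List[str]] = {}
    for name, upstreams in tasks.items():
        kept: List[str] = []
        for upstream in upstreams:
            if upstream == ROOT:
                continue  # a root task gets no upstream edge
            if upstream in tasks:
                kept.append(upstream)
            elif not ignore_missing:
                return None  # user chose "no": create nothing at all
            # user chose "yes": drop the unknown upstream and carry on
        resolved[name] = kept
    return resolved
```

Returning `None` before anything is created mirrors the requirement that a "no" answer must leave no partial tasks or workflow behind.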
   
   The flow chart is shown below:
   
   > The source file is at https://drive.google.com/file/d/1aV4nHH9_xf8z9WiyT6_-rDlWv2fpXzEj/view?usp=sharing
   
   ![DS-AutoDAG-flow-chat drawio](https://user-images.githubusercontent.com/15820530/143552961-267ee1cf-4c9b-498e-9e9f-9a0ea4de355b.png)
   
   ## SQL scripts example
   
   Here is an example set of SQL scripts. Each SQL script should have two hints: `name` to specify the SQL task name, and `upstream` to set the upstream task(s) of this task.
   
   * `start.sql`: If both the `name` and `upstream` hints are provided, we simply use them to set the task name and the upstream task. If `upstream` is set to `root`, the task is a root task of the workflow.
     ```sql
     -- name: start_auto_dag
     -- upstream: root
     select 'I am the start task of this workflow'
     ```
   
   * `child1.sql`: When a task has an upstream task, just set that task's name as the value of `upstream`. The task relation is created after the AutoDAG parser finishes.
     ```sql
     -- name: branch_one
     -- upstream: start_auto_dag
     select 'I am the first branch task of this workflow'
     ```
   
   * `branch_two.sql`: If the `name` hint is not provided, we use the SQL script filename as the task name. In this case, we use `branch_two` as the task name and set `start_auto_dag` as the upstream task.
     ```sql
     -- upstream: start_auto_dag
     select 'I am the second branch task of this workflow'
     ```
   
   * `end.sql`: If a task has two upstreams, list both task names separated by a specific delimiter. In this example we use `,` as the delimiter and set the tasks `branch_one` and `branch_two` as upstreams.
     ```sql
     -- name: end_auto_dag
     -- upstream: branch_one, branch_two
     select 'I am the ending of this workflow'
     ```
   
   * `independence.sql`: If the `upstream` hint is not set, we use `root` as the default, so the task becomes an independent task in the workflow.
     ```sql
     select 'I am the independence of this workflow'
     ```
   
   After we submit the scripts and DS parses them, we would get the workflow below:
   
   ```
                      -> branch_one ->
                    /                  \
   start_auto_dag ->                     -> end_auto_dag
                    \                  /
                      -> branch_two ->
   
   independence
   ```
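
As a rough sketch of how the hints in the five example scripts could be read, the snippet below parses the `-- name:` and `-- upstream:` header lines and applies the defaults described above. The regular expression, the helper names `parse_hints` and `build_dag`, and the rule that hints stop at the first SQL line are assumptions made for illustration; the issue does not prescribe an implementation.

```python
import re
from pathlib import PurePath
from typing import Dict, List, Tuple

ROOT = "root"  # hint value meaning "no upstream"
# Matches header lines such as "-- name: start_auto_dag".
HINT_RE = re.compile(r"^\s*--\s*(name|upstream)\s*:\s*(.+?)\s*$")


def parse_hints(filename: str, sql: str) -> Tuple[str, List[str]]:
    """Return (task_name, upstream_names) for one SQL script."""
    name = PurePath(filename).stem  # default: filename without ".sql"
    upstreams: List[str] = []       # default: no upstream, i.e. root
    for line in sql.splitlines():
        match = HINT_RE.match(line)
        if not match:
            if line.strip():
                break  # assumed: hints live in the header only
            continue   # skip blank lines before the first statement
        key, value = match.groups()
        if key == "name":
            name = value
        else:
            parts = [part.strip() for part in value.split(",")]
            upstreams = [part for part in parts if part and part != ROOT]
    return name, upstreams


def build_dag(scripts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map every task name to the list of its upstream task names."""
    return dict(parse_hints(filename, sql) for filename, sql in scripts.items())
```

Calling `build_dag` with a mapping of filename to script text for the five files above yields one entry per script, with `end_auto_dag` depending on both branches and `independence` having no upstream.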
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [dolphinscheduler] zhongjiajie commented on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-985362293


   @ououtt  Great, feel free to take it. Looking forward to your contribution.





[GitHub] [dolphinscheduler] ououtt commented on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
ououtt commented on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-985282176


   I want to try it.





[GitHub] [dolphinscheduler] zhongjiajie commented on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-985360692


   > I want to try it.
   
   @ououtt  Great! But as discussed in our weekly meeting, I think you would prefer to try the SQL parser approach, in https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-979849314 , rather than the hint approach, in https://github.com/apache/dolphinscheduler/issues/7016#issue-1064286923. Am I right?





[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-979840223


   Hi:
   * Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * So that we can understand your request as soon as possible, please provide detailed information, the version, or pictures.
   * If you haven't received a reply for a long time, you can subscribe to the developer mailing list (subscription steps: https://dolphinscheduler.apache.org/en-us/community/development/subscribe.html ), then write the issue URL in the email and send your question to dev@dolphinscheduler.apache.org.





[GitHub] [dolphinscheduler] zhongjiajie edited a comment on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
zhongjiajie edited a comment on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-979849314


   This is only the first step of this feature.
   
   ## Some ideas for the future
   
   After this patch is merged, we could batch load SQL scripts into DS and create a workflow for them. Of course, the workflow can only be created automatically after we add some hints to the SQL script headers.
   
   Let me think about which part we could improve, and maybe you could find it too. It is called AutoDAG, yet it relies on adding some additional information to the original files. It could be simplified by using the SQL script name as the task name and using a `sql parser` to analyze the dependencies. Maybe that is the true `AutoDAG`.
   
   But this is outside the scope of this issue, because a `sql parser` is a little difficult and not urgent at the moment. We should hold this until our community has finished its more important work.
   
   ## SQL scripts example
   
   Here is an example set of SQL scripts. They contain only the original SQL; our lovely `sql parser` would parse the dependencies and set the task relations.
   
   * `start.sql`:
   
     ```sql
     insert into table start_auto_dag select 1;
     ```
   
   * `child1.sql`:
   
     ```sql
     insert into table branch_one select * from start_auto_dag;
     ```
   
   * `branch_two.sql`:
   
     ```sql
     insert into table branch_two select * from start_auto_dag;
     ```
   
   * `end.sql`:
   
     ```sql
     insert into table end_auto_dag
     select * from branch_one
     union all
     select * from branch_two;
     ```
   
   * `independence.sql`: This script reads from no table written by another script, so the parser finds no upstream for it and it becomes an independent task in the workflow.
   
     ```sql
     select 'I am the independence of this workflow'
     ```
   
   After we submit the scripts and DS parses them, we would get the workflow below:
   
   ```
                      -> branch_one ->
                    /                  \
   start_auto_dag ->                     -> end_auto_dag
                    \                  /
                      -> branch_two ->
   
   independence
   ```
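
To make the parser idea concrete, here is a deliberately naive sketch that infers the same dependencies from the statements alone. It uses two regular expressions instead of a real SQL parser, so it only handles plain `insert into [table] <t>` and `from`/`join <t>` patterns and would be fooled by subqueries, CTEs, comments, or string literals. The helper name `infer_dag` is hypothetical, and tasks are named after the script files, as the comment suggests, so the names differ from the table names drawn in the diagram.

```python
import re
from pathlib import PurePath
from typing import Dict, List

# Naive patterns; a real implementation would need a proper SQL parser.
TARGET_RE = re.compile(r"\binsert\s+into\s+(?:table\s+)?(\w+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+(\w+)", re.IGNORECASE)


def infer_dag(scripts: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each task (script name) to the tasks whose tables it reads."""
    # First pass: remember which task writes which table.
    producer: Dict[str, str] = {}
    for filename, sql in scripts.items():
        target = TARGET_RE.search(sql)
        if target:
            producer[target.group(1).lower()] = PurePath(filename).stem

    # Second pass: every table read from becomes an upstream edge.
    dag: Dict[str, List[str]] = {}
    for filename, sql in scripts.items():
        task = PurePath(filename).stem
        upstreams: List[str] = []
        for table in SOURCE_RE.findall(sql):
            upstream = producer.get(table.lower())
            if upstream and upstream != task and upstream not in upstreams:
                upstreams.append(upstream)
        dag[task] = upstreams
    return dag
```

Two passes are used so that a script can depend on a table written by a script that appears later in the import batch.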
   





[GitHub] [dolphinscheduler] zhongjiajie commented on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-986384009


   > I don't think it's necessary. Keywords are conventions, not configurations
   
   After thinking it over, I agree with you: we should make the keywords constants, not configuration. If we made the keywords configurable, users would get confused because the keywords would differ whenever the configuration changed. And DS clusters with different keyword configurations would behave differently.
   
   I need another opinion here, cc @lenboo @CalvinKirs 





[GitHub] [dolphinscheduler] caishunfeng closed issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
caishunfeng closed issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016


   





[GitHub] [dolphinscheduler] zhongjiajie removed a comment on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
zhongjiajie removed a comment on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-985360692







[GitHub] [dolphinscheduler] ououtt commented on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
ououtt commented on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-986376019


   I don't think it's necessary. Keywords are conventions, not configurations





[GitHub] [dolphinscheduler] zhongjiajie commented on issue #7016: [Feature][Auto DAG] Auto create workflow while import sql script with specific hint

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #7016:
URL: https://github.com/apache/dolphinscheduler/issues/7016#issuecomment-985442313


   Hi @ououtt, keep in mind that we should make our SQL hints configurable so that users can set the keywords. WDYT?

