Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/27 07:02:12 UTC

[GitHub] [airflow] bhavaniravi opened a new issue #12647: Dynamic Workflows by spinning multiple task Instance during Dag Run

bhavaniravi opened a new issue #12647:
URL: https://github.com/apache/airflow/issues/12647


   
   **Description**
   
   Support runtime-dynamic workflows by allowing multiple task instances of the same task during a DAG run.
   
   **Use case / motivation**
   
   A requirement I see come up repeatedly across forums is to create instances of task B dynamically, based on the output of task A.
   
   ```
                      |---> Task B.1 -- |
                      |---> Task B.2 -- |
           Task A --- |---> Task B.3 -- |-----> Task C
                      |       ....     |
                      |---> Task B.N --|
   ```
   
   The specific requirement in the above case is to parallelize data processing while the specification of the task remains the same. Let's say the requirement is to:
   1. Fetch `n` records from the data lake
   2. Process the `n` records in task B, `preprocess_records`
   3. Spin up multiple instances of task B so that the processing of the `n` records is parallelized
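
The three steps above can be sketched in plain Python. This is only an illustration of the fan-out/fan-in shape, not Airflow code: `fetch_records`, `split_into_chunks`, `combine` and `run_pipeline` are hypothetical stand-ins, a thread pool plays the role of the parallel task B instances, and doubling each record is a placeholder for real preprocessing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fetch_records(n: int) -> List[int]:
    """Task A stand-in: pretend to fetch n records from the data lake."""
    return list(range(n))


def split_into_chunks(records: List[int], num_chunks: int) -> List[List[int]]:
    """Split records into at most num_chunks contiguous, non-empty chunks."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    size, remainder = divmod(len(records), num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        # The first `remainder` chunks get one extra record each.
        end = start + size + (1 if i < remainder else 0)
        if end > start:
            chunks.append(records[start:end])
        start = end
    return chunks


def preprocess_records(chunk: List[int]) -> List[int]:
    """Task B stand-in: the same specification applied to every chunk."""
    return [record * 2 for record in chunk]


def combine(results: List[List[int]]) -> List[int]:
    """Task C stand-in: fan the per-chunk results back into one list."""
    return [item for part in results for item in part]


def run_pipeline(n: int, parallelism: int) -> List[int]:
    """Task A -> N copies of task B -> Task C."""
    chunks = split_into_chunks(fetch_records(n), parallelism)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map returns results in the same order as the input chunks.
        results = list(pool.map(preprocess_records, chunks))
    return combine(results)
```

The number of chunks is only known after `fetch_records` has run, which is exactly the part a static DAG definition cannot express today.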
   
   *Inspiration from Argo* :: https://argoproj.github.io/argo/examples/#loops
   
   **Idea**
   
   In current Airflow, there is only one task instance per task during a DAG run.
   
   How about we provide an API where `Task A` can inject the number of task instances to spin up for the downstream task?
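
To make the idea concrete, here is a toy model in plain Python. It is not an existing Airflow API: `PlannedInstance`, `task_a` and `expand_downstream` are hypothetical names, and the batch-size rule for choosing the count is an assumption made only for this sketch. The upstream task returns a number, and a scheduler-like helper turns it into that many instances of the downstream task.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PlannedInstance:
    """Hypothetical record of one runtime-created instance of a task."""
    task_id: str
    index: int


def task_a(records: Sequence[int], batch_size: int) -> int:
    """Upstream task: decide how many downstream instances are needed."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Ceiling division: one instance per started batch.
    return -(-len(records) // batch_size)


def expand_downstream(task_id: str, count: int) -> List[PlannedInstance]:
    """Scheduler stand-in: create `count` instances of the downstream task."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [
        PlannedInstance(task_id=f"{task_id}.{i + 1}", index=i)
        for i in range(count)
    ]
```

For example, `expand_downstream("task_b", task_a(records, 4))` with ten records would plan three instances, `task_b.1` to `task_b.3`.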
   
   Should this be a separate operator?
   
   **Related Issues**
   
   https://stackoverflow.com/questions/41517798/proper-way-to-create-dynamic-workflows-in-airflow
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #12647: Dynamic Workflows by spinning multiple task Instance during Dag Run

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #12647:
URL: https://github.com/apache/airflow/issues/12647#issuecomment-765432546


   We have already discussed this and yes, it is planned. Feel free to start a discussion about it on the devlist.





[GitHub] [airflow] vidhithakrar commented on issue #12647: Dynamic Workflows by spinning multiple task Instance during Dag Run

Posted by GitBox <gi...@apache.org>.
vidhithakrar commented on issue #12647:
URL: https://github.com/apache/airflow/issues/12647#issuecomment-755138998


   I think this is a very important and powerful feature to have in Airflow.





[GitHub] [airflow] hedrickw commented on issue #12647: Dynamic Workflows by spinning multiple task Instance during Dag Run

Posted by GitBox <gi...@apache.org>.
hedrickw commented on issue #12647:
URL: https://github.com/apache/airflow/issues/12647#issuecomment-765429204


   I was able to do something kind of similar: Task A creates a YAML file, a Task Group reads the YAML file and generates N tasks, and Task B takes all inputs from the Task Group. Having a type of Operator to do this would be really nice, though. You also lose the UI with my method, since tasks are removed depending on N.
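
A rough sketch of that file-based pattern, with the Airflow parts left out so it runs on its own. It uses JSON from the standard library instead of YAML, and `write_manifest` and `build_task_ids` are hypothetical helpers, not the code from this comment. It shows why tasks vanish when N shrinks: the task list is rebuilt from whatever the file currently contains.

```python
import json
from pathlib import Path
from typing import Iterable, List


def write_manifest(path: Path, items: Iterable[str]) -> None:
    """Task A stand-in: record the work items discovered at run time."""
    path.write_text(json.dumps({"items": list(items)}))


def build_task_ids(path: Path, prefix: str = "process") -> List[str]:
    """DAG-file stand-in: derive one task id per item in the manifest."""
    if not path.exists():
        # No manifest yet, so no generated tasks.
        return []
    items = json.loads(path.read_text())["items"]
    return [f"{prefix}_{i}" for i, _ in enumerate(items, start=1)]


def removed_task_ids(before: List[str], after: List[str]) -> List[str]:
    """Task ids that existed for the previous manifest but not the new one."""
    return sorted(set(before) - set(after))
```

If one run writes three items and the next writes one, `removed_task_ids` reports `process_2` and `process_3`, which are the tasks that would no longer appear in the generated DAG.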

