Posted to commits@airflow.apache.org by "Gaurav Sehgal (Jira)" <ji...@apache.org> on 2019/12/21 09:52:00 UTC

[jira] [Comment Edited] (AIRFLOW-6227) Ability to assign multiple pool names to a single task

    [ https://issues.apache.org/jira/browse/AIRFLOW-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001660#comment-17001660 ] 

Gaurav Sehgal edited comment on AIRFLOW-6227 at 12/21/19 9:51 AM:
------------------------------------------------------------------

[~ash] As I see it, right now we store the pool information in the task instance itself. But what if someone changes the task's pool? In that case, all the old task instances will run in the old pool (while retrying), whereas the new task instances will run in the new pool. Is that the right behavior? I'm not sure, but shouldn't all task instances map to the new pool instead?



> Ability to assign multiple pool names to a single task
> ------------------------------------------------------
>
>                 Key: AIRFLOW-6227
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6227
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Assignee: Gaurav Sehgal
>            Priority: Major
>
> Right now only a single pool name can be assigned to each task instance.
> Ideally 2 different pool names can be assigned to a task_instance.
> Use case:
> I have 300 Spark tasks writing to 60 different tables (i.e. multiple tasks write to the same table).
> I want both:
>  # Maximum of 30 Spark tasks running in parallel
>  # Never more than 1 Spark task writing to the same table in parallel
> If I have a 'spark' pool of 30 and assign the 'spark' pool to those tasks, then I risk having 2 tasks writing to the same table.
> But if instead I have a 'tableA' pool of 1, a 'tableB' pool of 1, a 'tableC' pool of 1, etc., and assign the relevant table-name pool to each task, then I risk having more than 30 Spark tasks running in parallel.
> I can't use 'parallelism' or other settings because I have other non-Spark tasks that I don't want to limit.
>  
>  
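The semantics requested above can be sketched in plain Python: a task may start only when *every* pool it is assigned to has a free slot, and it consumes one slot from each. This is a hypothetical illustration of the proposed behavior, not Airflow's actual API (Airflow's `BaseOperator` accepts a single `pool` argument as of 1.10.x); the `Pool` class and `try_start` helper here are invented for the sketch.

```python
# Hypothetical sketch of multi-pool gating (not Airflow's real API):
# a task runs only if ALL of its pools have an open slot.

class Pool:
    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.running = 0  # slots currently occupied

    def has_slot(self):
        return self.running < self.slots


def try_start(task_pools):
    """Start a task only if every one of its pools has a free slot."""
    if all(p.has_slot() for p in task_pools):
        for p in task_pools:
            p.running += 1
        return True
    return False


spark = Pool("spark", slots=30)   # global cap on concurrent Spark tasks
table_a = Pool("tableA", slots=1)  # at most one writer per table

# First writer to tableA acquires a slot in both 'spark' and 'tableA'.
assert try_start([spark, table_a])

# A second writer to tableA is blocked by the 'tableA' pool even though
# the 'spark' pool still has 29 free slots.
assert not try_start([spark, table_a])
```

With one `slots=1` pool per table plus a shared `spark` pool, both constraints from the use case hold at once, which a single pool per task cannot express.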



--
This message was sent by Atlassian Jira
(v8.3.4#803005)