Posted to commits@airflow.apache.org by "t oo (Jira)" <ji...@apache.org> on 2019/12/29 11:27:00 UTC

[jira] [Updated] (AIRFLOW-6388) SparkSubmitOperator polling should not 'consume' a slot

     [ https://issues.apache.org/jira/browse/AIRFLOW-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

t oo updated AIRFLOW-6388:
--------------------------
    Description: 
Spark jobs can often take many minutes (or even hours) to complete. The SparkSubmitOperator submits a job to a Spark cluster and then polls its status. This means it can occupy a 'slot' (i.e. one counted against the {{parallelism}}, {{dag_concurrency}}, {{max_active_dag_runs_per_dag}} and {{non_pooled_task_slot_count}} limits) for hours while doing nothing but polling for status. https://github.com/apache/airflow/pull/6909#discussion_r361838225 suggested moving to a poke/reschedule model.

"This actually means occupy worker and do nothing for n seconds is it not?
It was OK when it was 1 second but users may set it to even 5 min without realising that it occupys the worker.

My comment here is more of a concern rather than an action to do.
Should this work by occupying the worker "indefinitely" or can it be something like the sensors with (poke/reschedule)?"
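A minimal sketch of what that poke/reschedule model could look like, assuming the submit and wait steps are split into separate tasks. SparkJobSensor and _spark_driver_finished() are hypothetical names (not existing Airflow classes); import paths follow the 1.10.x layout, where reschedule mode is available on sensors since 1.10.2:

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


def _spark_driver_finished(driver_id):
    # Placeholder: in practice this would query the Spark master's REST API
    # or `yarn application -status` for the driver's state.
    raise NotImplementedError


class SparkJobSensor(BaseSensorOperator):
    @apply_defaults
    def __init__(self, driver_id, *args, **kwargs):
        super(SparkJobSensor, self).__init__(*args, **kwargs)
        self.driver_id = driver_id

    def poke(self, context):
        # In reschedule mode, returning False ends the task instance and
        # frees its slot until the next poke_interval, instead of sleeping
        # inside a worker.
        return _spark_driver_finished(self.driver_id)


dag = DAG("spark_reschedule_sketch", start_date=datetime(2019, 1, 1),
          schedule_interval=None)

wait_for_spark = SparkJobSensor(
    task_id="wait_for_spark",
    driver_id="driver-...",  # would come from the submit task, e.g. via XCom
    mode="reschedule",       # the key difference from the current blocking poll
    poke_interval=300,       # check every 5 minutes without holding a worker
    dag=dag,
)
{code}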

  was:
My DAG has tasks from 12 different types of operators. One of them is the DummyOperator (which is meant to do 'nothing'), but it cannot run during busy times because the {{parallelism}}, {{dag_concurrency}}, {{max_active_dag_runs_per_dag}} and {{non_pooled_task_slot_count}} limits have been met (so it is stuck in the scheduled state). I would like a new config flag (dont_block_dummy=True) so that DummyOperator tasks always get run even when the parallelism etc. limits are met. Without this feature, the only workaround is to set a huge parallelism limit and then assign pools to all the other operators in my DAG, as sketched below. My point is that DummyOperator should not count against these limits, as it is not a resource hog.
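A minimal sketch of that pools workaround, with hypothetical task and pool names; the 'heavy_jobs' pool must be created beforehand (Admin -> Pools in the UI, or the airflow CLI):

{code:python}
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

dag = DAG("pool_workaround_sketch", start_date=datetime(2019, 1, 1),
          schedule_interval=None)

# Heavyweight work is capped by the 'heavy_jobs' pool's slot count...
heavy = BashOperator(
    task_id="heavy_job",
    bash_command="sleep 3600",
    pool="heavy_jobs",
    dag=dag,
)

# ...while the DummyOperator draws only on the (deliberately huge)
# non-pooled slot count, so it is far less likely to be starved.
marker = DummyOperator(task_id="marker", dag=dag)

heavy >> marker
{code}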

 
h4. Task Instance Details
h5. Dependencies Blocking Task From Getting Scheduled
||Dependency||Reason||
|Unknown|All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless:
- The scheduler is down or under heavy load
- The following configuration values may be limiting the number of queueable processes: {{parallelism}}, {{dag_concurrency}}, {{max_active_dag_runs_per_dag}}, {{non_pooled_task_slot_count}}
 
If this task instance does not start soon please contact your Airflow administrator for assistance.|
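For reference, those limits live under [core] in airflow.cfg on 1.10.x; a quick way to check the effective values (note: the UI string "max_active_dag_runs_per_dag" appears to correspond to the cfg key {{max_active_runs_per_dag}}):

{code:python}
# Sketch: print the effective scheduler limits on a 1.10.x install.
from airflow.configuration import conf

for key in ("parallelism", "dag_concurrency",
            "max_active_runs_per_dag", "non_pooled_task_slot_count"):
    print(key, conf.getint("core", key))
{code}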


> SparkSubmitOperator polling should not 'consume' a slot
> -------------------------------------------------------
>
>                 Key: AIRFLOW-6388
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6388
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: dependencies, scheduler
>    Affects Versions: 1.10.3
>            Reporter: t oo
>            Priority: Minor
>
> Spark jobs can often take many minutes (or even hours) to complete. The spark submit operator submits a job to a spark cluster, then polls its status. This means it could be consuming a 'slot' (ie parallelism, dag_concurrency, max_active_dag_runs_per_dag, non_pooled_task_slot_count) for hours when it is not 'doing' anything but polling for status. https://github.com/apache/airflow/pull/6909#discussion_r361838225 suggested it should move to a poke/reschedule model.
> "This actually means occupy worker and do nothing for n seconds is it not?
> It was OK when it was 1 second but users may set it to even 5 min without realising that it occupys the worker.
> My comment here is more of a concern rather than an action to do.
> Should this work by occupying the worker "indefinitely" or can it be something like the sensors with (poke/reschedule)?"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)