Posted to commits@airflow.apache.org by "t oo (Jira)" <ji...@apache.org> on 2019/12/29 11:59:00 UTC

[jira] [Updated] (AIRFLOW-6389) add config for 'allow_multi_scheduler_instances' default True

     [ https://issues.apache.org/jira/browse/AIRFLOW-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

t oo updated AIRFLOW-6389:
--------------------------
    Description: 
The common deployment pattern right now with a blue/green build is:
1. On EC2 1, start the scheduler
2. Assign 'final' DNS to EC2 1
3. Create EC2 2
4. Start the scheduler on EC2 2
5. Assign 'final' DNS to EC2 2
6. Tear down EC2 1

The issue is that since the metastore DB (i.e. MySQL) is shared by both EC2s, there is a period of time between steps 4 and 6 above when multiple schedulers are running. To avoid this, proposing:



  was:
Airflow is often shared between users of varying experience levels. To prevent a novice user from causing an accidental denial of service to the scheduler (by uploading a huge DAG which hogs all scheduler resources), I propose 2 new server-side configs:

 # max number of tasks in a single DAG: default, say, 250
 # max number of DAGs: default, say, 500

In addition, Airflow can't handle 18k tasks in a single DAG (dag_processor_manager times out after 300 seconds), but if I split it up into 6 DAGs of 3k tasks each, then it works.


> add config for 'allow_multi_scheduler_instances' default True
> -------------------------------------------------------------
>
>                 Key: AIRFLOW-6389
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6389
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Priority: Major
>
> The common deployment pattern right now with a blue/green build is:
> 1. On EC2 1, start the scheduler
> 2. Assign 'final' DNS to EC2 1
> 3. Create EC2 2
> 4. Start the scheduler on EC2 2
> 5. Assign 'final' DNS to EC2 2
> 6. Tear down EC2 1
> The issue is that since the metastore DB (i.e. MySQL) is shared by both EC2s, there is a period of time between steps 4 and 6 above when multiple schedulers are running. To avoid this, proposing:
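
For illustration only, here is a rough sketch of how a startup guard driven by a flag like 'allow_multi_scheduler_instances' might behave. This is not Airflow code: the scheduler_heartbeat table, the record_heartbeat and can_start_scheduler helpers, and the stale_after threshold are all hypothetical, and an in-memory SQLite database stands in for the shared MySQL DB. The idea is that a new scheduler refuses to start while another host has a recent heartbeat, unless multiple instances are allowed.

```python
import sqlite3

# Hypothetical table standing in for the shared DB; this is NOT Airflow's schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scheduler_heartbeat (
    hostname  TEXT PRIMARY KEY,
    last_beat REAL NOT NULL
)
"""


def record_heartbeat(conn, hostname, now):
    """Insert or refresh the heartbeat timestamp for a scheduler host."""
    conn.execute(
        "INSERT OR REPLACE INTO scheduler_heartbeat (hostname, last_beat) "
        "VALUES (?, ?)",
        (hostname, now),
    )
    conn.commit()


def can_start_scheduler(conn, hostname, now, allow_multi=True, stale_after=30.0):
    """Return True if a scheduler on `hostname` may start.

    With allow_multi=True (the proposed default) startup is never blocked.
    Otherwise startup is blocked while any *other* host has a heartbeat
    newer than `stale_after` seconds (an assumed threshold).
    """
    if allow_multi:
        return True
    row = conn.execute(
        "SELECT COUNT(*) FROM scheduler_heartbeat "
        "WHERE hostname != ? AND last_beat >= ?",
        (hostname, now - stale_after),
    ).fetchone()
    return row[0] == 0


def demo():
    """Replay the blue/green sequence: EC2 1 is running, EC2 2 tries to start."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_heartbeat(conn, "ec2-1", now=100.0)
    results = {
        # Flag left at its default: the second scheduler starts regardless.
        "multi_allowed": can_start_scheduler(
            conn, "ec2-2", now=110.0, allow_multi=True
        ),
        # Flag disabled and EC2 1 still beating: the second scheduler is blocked.
        "blocked_while_ec2_1_alive": can_start_scheduler(
            conn, "ec2-2", now=110.0, allow_multi=False
        ),
        # Flag disabled but EC2 1's heartbeat has gone stale: start is allowed.
        "allowed_after_ec2_1_stale": can_start_scheduler(
            conn, "ec2-2", now=200.0, allow_multi=False
        ),
    }
    conn.close()
    return results


if __name__ == "__main__":
    print(demo())
```

Note that a check-then-start guard like this is not atomic; a real implementation would probably need a DB-level lock or similar to close the window between the check and the start.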



--
This message was sent by Atlassian Jira
(v8.3.4#803005)