Posted to commits@airflow.apache.org by "t oo (Jira)" <ji...@apache.org> on 2019/12/29 11:59:00 UTC
[jira] [Updated] (AIRFLOW-6389) add config for 'allow_multi_scheduler_instances' default True
[ https://issues.apache.org/jira/browse/AIRFLOW-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
t oo updated AIRFLOW-6389:
--------------------------
Description:
Right now a common deployment pattern with a blue/green build is:
1. On EC2 1, start the scheduler
2. Assign the 'final' DNS to EC2 1
3. Create EC2 2
4. Start the scheduler on EC2 2
5. Assign the 'final' DNS to EC2 2
6. Tear down EC2 1
The issue is that since the metastore DB (i.e. MySQL) is shared by both EC2s, there is a period of time between steps 4 and 6 above where multiple schedulers are running. To avoid this, proposing:
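As a rough illustration of what such a guard might look like: the sketch below is not Airflow code. Only the option name allow_multi_scheduler_instances comes from the issue title; the function name, the list of heartbeat timestamps read from the shared DB, and the 60-second staleness window are assumptions made for the example.

```python
from datetime import datetime, timedelta


def may_start_scheduler(allow_multi_scheduler_instances, last_heartbeats, now,
                        stale_after=timedelta(seconds=60)):
    """Decide whether a new scheduler instance may start (hypothetical helper).

    last_heartbeats: most recent heartbeat timestamps of scheduler instances
    recorded in the shared DB (assumed input, not an Airflow API).
    """
    # Default behaviour (True): any number of schedulers may run at once.
    if allow_multi_scheduler_instances:
        return True
    # Otherwise refuse to start while another scheduler has a fresh heartbeat.
    live = [hb for hb in last_heartbeats if now - hb <= stale_after]
    return not live


if __name__ == "__main__":
    now = datetime(2019, 12, 29, 12, 0, 0)
    # The scheduler on EC2 1 sent a heartbeat 10 seconds ago.
    print(may_start_scheduler(False, [now - timedelta(seconds=10)], now))
```

Under these assumptions, leaving the option at True keeps the overlap between steps 4 and 6 as it is today, while False would make the scheduler on EC2 2 wait until the one on EC2 1 has stopped sending heartbeats.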
was:
Airflow is often shared between users of varying experience levels. To prevent a novice user from causing an accidental denial of service to the scheduler (by uploading a huge DAG which hogs all scheduler resources), I propose 2 new server-side configs:
1. Max number of tasks in a single DAG: default, say, 250
2. Max number of DAGs: default, say, 500
To add to this, Airflow can't handle 18k tasks in a single DAG (dag_processor_manager times out after 300 seconds), but if I split it up into 6 DAGs of 3k tasks each, then it works.
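The two proposed limits could be checked along these lines. This is a hypothetical sketch, not an existing Airflow setting or API: the function name and the shape of its input (a mapping of DAG id to task count) are assumed, and the defaults are the 250 and 500 suggested above.

```python
def check_dag_limits(task_counts, max_tasks_per_dag=250, max_dags=500):
    """Return a list of human-readable limit violations (hypothetical helper).

    task_counts: dict mapping a DAG id to its number of tasks (assumed input).
    """
    violations = []
    # Limit 2 from the proposal: total number of DAGs.
    if len(task_counts) > max_dags:
        violations.append(
            "too many DAGs: %d > %d" % (len(task_counts), max_dags)
        )
    # Limit 1 from the proposal: tasks within any single DAG.
    for dag_id, n_tasks in sorted(task_counts.items()):
        if n_tasks > max_tasks_per_dag:
            violations.append(
                "DAG %s has too many tasks: %d > %d"
                % (dag_id, n_tasks, max_tasks_per_dag)
            )
    return violations


if __name__ == "__main__":
    # A single 18k-task DAG, as in the report, exceeds the per-DAG limit.
    for message in check_dag_limits({"huge_dag": 18000}):
        print(message)
```

Note that with the suggested default of 250, the 6 DAGs of 3k tasks each would also be rejected, so an operator running such DAGs would need to raise the per-DAG limit.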
> add config for 'allow_multi_scheduler_instances' default True
> -------------------------------------------------------------
>
> Key: AIRFLOW-6389
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6389
> Project: Apache Airflow
> Issue Type: New Feature
> Components: scheduler
> Affects Versions: 1.10.6
> Reporter: t oo
> Priority: Major