Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2018/09/07 00:04:03 UTC

[GitHub] ubermen edited a comment on issue #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'

URL: https://github.com/apache/incubator-airflow/pull/3840#issuecomment-419279096
 
 
   There was no index composed of dag_id and execution_date. So when the scheduler looks up all task instances (TIs) of a DAG run with a query like "select * from task_instance where dag_id = 'some_id' and execution_date = '2018-09-01 ...'", the query uses the ti_dag_state index (I tested this in MySQL Workbench). This is probably not a problem when the DAG has few execution dates (under 1000 DAG runs), but I experienced slow allocation of TIs once a DAG had accumulated 1000+ DAG runs. So I am now running Airflow with a new index (dag_id, execution_date) added to the task_instance table. I have attached the results of my test:
   ![image](https://user-images.githubusercontent.com/6738941/45191171-bc525000-b27c-11e8-9762-bfd18cf99011.png)
   ![image](https://user-images.githubusercontent.com/6738941/45191184-d2f8a700-b27c-11e8-8739-fda9742985ff.png)
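
For readers who cannot view the screenshots, the effect can be roughly reproduced with the sketch below. It is not the MySQL test from this comment: it uses Python's built-in sqlite3 as a stand-in, a simplified task_instance schema (only four columns, assumed for illustration), and a hypothetical helper named plan_for_dagrun_lookup. It asks the query planner which index it would pick for the DAG run lookup, with and without the proposed ti_dag_date index.

```python
import sqlite3


def plan_for_dagrun_lookup(with_new_index: bool) -> str:
    """Return SQLite's query plan for the scheduler-style TI lookup.

    Assumption: a cut-down task_instance table; the real table has more columns.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT,"
        " PRIMARY KEY (task_id, dag_id, execution_date))"
    )
    # Existing index that the comment says the query falls back to.
    conn.execute("CREATE INDEX ti_dag_state ON task_instance (dag_id, state)")
    if with_new_index:
        # Index proposed in this pull request.
        conn.execute(
            "CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)"
        )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM task_instance WHERE dag_id = ? AND execution_date = ?",
        ("some_id", "2018-09-01 00:00:00"),
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print("without ti_dag_date:", plan_for_dagrun_lookup(False))
    print("with ti_dag_date:   ", plan_for_dagrun_lookup(True))
```

Without the new index the planner can only narrow by dag_id through ti_dag_state and must then filter every row of that DAG by execution_date; with ti_dag_date both conditions are resolved by the index. MySQL's optimizer may behave differently in detail, so treat this only as an illustration of the idea.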
   
