You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Or Sher <or...@gmail.com> on 2016/10/26 08:58:14 UTC

UI load time of "recent statuses" is increasing

Hi,

I'm using Airflow 1.7  for some time now (~half a year in prod) and lately
I stated to notice that the load time of the "recent statuses" part on the
UI main page is starting to take a lot of time. Got to 10-13 seconds.

From looking in Mysql I found that this is the query that takes the whole
time:

SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state AS
task_instance_state, count(task_instance.task_id) AS count_1
FROM task_instance LEFT OUTER JOIN dag_run ON dag_run.dag_id =
task_instance.dag_id AND dag_run.execution_date =
task_instance.execution_date AND dag_run.state = 'running' LEFT OUTER JOIN
(SELECT dag_run.dag_id AS dag_id, max(dag_run.execution_date) AS
execution_date
FROM dag_run GROUP BY dag_run.dag_id) AS last_dag_run ON
last_dag_run.dag_id = task_instance.dag_id AND last_dag_run.execution_date
= task_instance.execution_date
WHERE task_instance.task_id IN ( <big list of all tasks> ) AND
(dag_run.dag_id IS NOT NULL OR last_dag_run.dag_id IS NOT NULL) GROUP BY
task_instance.dag_id, task_instance.state

First, I think the second left outer join might be a bit redundant and can
be replaced with a simple join. When I changed that, I got the same results
in less than 2 secs. (down from 13 seconds)

1. Does that change make sense?
2. Is that something that was resolved in recent version? (Couldn't find
any issue)
3. Is that something I can change by simply changing some sql resource
files or is it generated by some kind of an ORM?

Thanks!

Re: UI load time of "recent statuses" is increasing

Posted by Li Xuan Ji <xu...@gmail.com>.
> 3. Is that something I can change by simply changing some sql resource
files or is it generated by some kind of an ORM?

I think this code is generated by sqlalchemy here

https://github.com/apache/incubator-airflow/blob/6f4704a447756d6a17c617afe1a9b54d629c79ac/airflow/www/views.py#L467-L497

On 30 October 2016 at 08:01, Or Sher <or...@gmail.com> wrote:
> Anyone?
>
> On Wed, Oct 26, 2016 at 11:58 AM Or Sher <or...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm using Airflow 1.7  for some time now (~half a year in prod) and lately
>> I stated to notice that the load time of the "recent statuses" part on the
>> UI main page is starting to take a lot of time. Got to 10-13 seconds.
>>
>> From looking in Mysql I found that this is the query that takes the whole
>> time:
>>
>> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state
>> AS task_instance_state, count(task_instance.task_id) AS count_1
>> FROM task_instance LEFT OUTER JOIN dag_run ON dag_run.dag_id =
>> task_instance.dag_id AND dag_run.execution_date =
>> task_instance.execution_date AND dag_run.state = 'running' LEFT OUTER JOIN
>> (SELECT dag_run.dag_id AS dag_id, max(dag_run.execution_date) AS
>> execution_date
>> FROM dag_run GROUP BY dag_run.dag_id) AS last_dag_run ON
>> last_dag_run.dag_id = task_instance.dag_id AND last_dag_run.execution_date
>> = task_instance.execution_date
>> WHERE task_instance.task_id IN ( <big list of all tasks> ) AND
>> (dag_run.dag_id IS NOT NULL OR last_dag_run.dag_id IS NOT NULL) GROUP BY
>> task_instance.dag_id, task_instance.state
>>
>> First, I think the second left outer join might be a bit redundant and can
>> be replaced with a simple join. When I changed that, I got the same results
>> in less than 2 secs. (down from 13 seconds)
>>
>> 1. Does that change make sense?
>> 2. Is that something that was resolved in recent version? (Couldn't find
>> any issue)
>> 3. Is that something I can change by simply changing some sql resource
>> files or is it generated by some kind of an ORM?
>>
>> Thanks!
>>



-- 
Im Xuan Ji!

Re: UI load time of "recent statuses" is increasing

Posted by Or Sher <or...@gmail.com>.
Anyone?

On Wed, Oct 26, 2016 at 11:58 AM Or Sher <or...@gmail.com> wrote:

> Hi,
>
> I'm using Airflow 1.7  for some time now (~half a year in prod) and lately
> I stated to notice that the load time of the "recent statuses" part on the
> UI main page is starting to take a lot of time. Got to 10-13 seconds.
>
> From looking in Mysql I found that this is the query that takes the whole
> time:
>
> SELECT task_instance.dag_id AS task_instance_dag_id, task_instance.state
> AS task_instance_state, count(task_instance.task_id) AS count_1
> FROM task_instance LEFT OUTER JOIN dag_run ON dag_run.dag_id =
> task_instance.dag_id AND dag_run.execution_date =
> task_instance.execution_date AND dag_run.state = 'running' LEFT OUTER JOIN
> (SELECT dag_run.dag_id AS dag_id, max(dag_run.execution_date) AS
> execution_date
> FROM dag_run GROUP BY dag_run.dag_id) AS last_dag_run ON
> last_dag_run.dag_id = task_instance.dag_id AND last_dag_run.execution_date
> = task_instance.execution_date
> WHERE task_instance.task_id IN ( <big list of all tasks> ) AND
> (dag_run.dag_id IS NOT NULL OR last_dag_run.dag_id IS NOT NULL) GROUP BY
> task_instance.dag_id, task_instance.state
>
> First, I think the second left outer join might be a bit redundant and can
> be replaced with a simple join. When I changed that, I got the same results
> in less than 2 secs. (down from 13 seconds)
>
> 1. Does that change make sense?
> 2. Is that something that was resolved in recent version? (Couldn't find
> any issue)
> 3. Is that something I can change by simply changing some sql resource
> files or is it generated by some kind of an ORM?
>
> Thanks!
>