Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/01/04 05:06:00 UTC

[jira] [Commented] (AIRFLOW-6360) config option to skip task_stats from getting completed dagruns/tis

    [ https://issues.apache.org/jira/browse/AIRFLOW-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007912#comment-17007912 ] 

ASF GitHub Bot commented on AIRFLOW-6360:
-----------------------------------------

tooptoop4 commented on pull request #7037: [AIRFLOW-6360] 'Recent tasks' stats only show non-completed dagruns option
URL: https://github.com/apache/airflow/pull/7037
 
 
   ---
   Link to JIRA issue: https://issues.apache.org/jira/browse/AIRFLOW-6360
   
   - [X] Description above provides context of the change
   - [X] Commit message starts with `[AIRFLOW-6360]`, where AIRFLOW-NNNN = JIRA ID*
   - [X] Unit tests coverage for changes (not needed for documentation changes)
   - [X] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [X] Relevant documentation is updated including usage instructions.
   - [X] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   (*) For document-only changes, no JIRA issue is needed. Commit message starts `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> config option to skip task_stats from getting completed dagruns/tis
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-6360
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6360
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: ui
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Assignee: t oo
>            Priority: Major
>
> The task_stats endpoint, which populates the 'recent tasks' display, is very slow when there are many dagruns or many tasks in a DAG.
> BEFORE
>         LastDagRun = (
>             session.query(DagRun.dag_id, sqla.func.max(DagRun.execution_date).label('execution_date'))
>                 .join(Dag, Dag.dag_id == DagRun.dag_id)
>                 .filter(DagRun.state != State.RUNNING)
>                 .filter(Dag.is_active == True)  # noqa: E712
>                 .filter(Dag.is_subdag == False)  # noqa: E712
>                 .group_by(DagRun.dag_id)
>                 .subquery('last_dag_run')
>         )
>         RunningDagRun = (
>             session.query(DagRun.dag_id, DagRun.execution_date)
>                 .join(Dag, Dag.dag_id == DagRun.dag_id)
>                 .filter(DagRun.state == State.RUNNING)
>                 .filter(Dag.is_active == True)  # noqa: E712
>                 .filter(Dag.is_subdag == False)  # noqa: E712
>                 .subquery('running_dag_run')
>         )
>         # Select all task_instances from active dag_runs.
>         # If no dag_run is active, return task instances from most recent dag_run.
>         LastTI = (
>             session.query(TI.dag_id.label('dag_id'), TI.state.label('state'))
>             .join(LastDagRun, and_(
>                 LastDagRun.c.dag_id == TI.dag_id,
>                 LastDagRun.c.execution_date == TI.execution_date))
>         )
>         RunningTI = (
>             session.query(TI.dag_id.label('dag_id'), TI.state.label('state'))
>             .join(RunningDagRun, and_(
>                 RunningDagRun.c.dag_id == TI.dag_id,
>                 RunningDagRun.c.execution_date == TI.execution_date))
>         )
>         UnionTI = union_all(LastTI, RunningTI).alias('union_ti')
>         qry = (
>             session.query(UnionTI.c.dag_id, UnionTI.c.state, sqla.func.count())
>             .group_by(UnionTI.c.dag_id, UnionTI.c.state)
>         )
> AFTER
>         # We are not interested in stats for dagruns that have already completed; we only want active ones.
>         RunningDagRun = (
>             session.query(DagRun.dag_id, DagRun.execution_date)
>                 .join(Dag, Dag.dag_id == DagRun.dag_id)
>                 .filter(
>                     DagRun.state == State.RUNNING,
>                     Dag.is_active,
>                     Dag.is_subdag == False,  # noqa: E712
>                 )
>                 .subquery('running_dag_run')
>         )
>         # Select all task_instances from active dag_runs.
>         qry = (
>             session.query(TI.dag_id.label('dag_id'), TI.state.label('state'), sqla.func.count())
>             .join(RunningDagRun, and_(
>                 RunningDagRun.c.dag_id == TI.dag_id,
>                 RunningDagRun.c.execution_date == TI.execution_date))
>             .group_by(TI.dag_id, TI.state)
>         )
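
To make the effect of the AFTER query concrete, here is a minimal, self-contained sketch that mirrors its logic using plain SQL on an in-memory SQLite database. It is not the Airflow implementation: the table layouts are heavily simplified, the helper names `running_task_stats` and `build_demo_db` and the sample rows are hypothetical, and an `ORDER BY` has been added only so the output is deterministic.

```python
import sqlite3


def running_task_stats(conn):
    """Count task instances per (dag_id, state), restricted to running dag runs.

    Mirrors the AFTER query: join task instances to running dag runs of
    active, non-subdag DAGs, then group by dag_id and state.
    """
    sql = """
        SELECT ti.dag_id, ti.state, COUNT(*)
        FROM task_instance AS ti
        JOIN dag_run AS dr
          ON dr.dag_id = ti.dag_id
         AND dr.execution_date = ti.execution_date
        JOIN dag AS d
          ON d.dag_id = dr.dag_id
        WHERE dr.state = 'running'
          AND d.is_active = 1
          AND d.is_subdag = 0
        GROUP BY ti.dag_id, ti.state
        ORDER BY ti.dag_id, ti.state  -- added for deterministic output only
    """
    return conn.execute(sql).fetchall()


def build_demo_db():
    """Create a tiny in-memory database with simplified, assumed schemas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_active INTEGER, is_subdag INTEGER);
        CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT);
        CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT);
    """)
    conn.executemany("INSERT INTO dag VALUES (?, ?, ?)",
                     [("etl", 1, 0), ("report", 1, 0)])
    conn.executemany("INSERT INTO dag_run VALUES (?, ?, ?)", [
        ("etl", "2020-01-01", "success"),
        ("etl", "2020-01-02", "running"),
        ("report", "2020-01-01", "success"),
    ])
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", [
        ("etl", "extract", "2020-01-01", "success"),
        ("etl", "load", "2020-01-01", "success"),
        ("etl", "extract", "2020-01-02", "success"),
        ("etl", "load", "2020-01-02", "running"),
        ("report", "render", "2020-01-01", "success"),
    ])
    return conn


if __name__ == "__main__":
    for row in running_task_stats(build_demo_db()):
        print(row)
```

In this sample data only the `etl` DAG has a running dag run, so only its two task instances from that run are counted. The `report` DAG, whose only run has completed, drops out of the result entirely, which is the trade-off the AFTER query makes in exchange for scanning fewer rows.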



--
This message was sent by Atlassian Jira
(v8.3.4#803005)