Posted to commits@airflow.apache.org by "Tony Brookes (Jira)" <ji...@apache.org> on 2020/03/18 23:32:00 UTC

[jira] [Commented] (AIRFLOW-4540) Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure.

    [ https://issues.apache.org/jira/browse/AIRFLOW-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062142#comment-17062142 ] 

Tony Brookes commented on AIRFLOW-4540:
---------------------------------------

I was quite excited to see serialized dags in the latest version of Airflow when I downloaded it.  Alas, it does not appear to address the main issue of this JIRA, namely that different INSTANCES of a dag produce different DAGs.  Irrespective of whether a DAG is for customer A or customer B, DAGs will look different whenever they are based on external influences.  For instance, some tasks will run on a Sunday but not on any other day of the week.  But the primary key of the serialized_dag table is dag_id, not dag_run_id (which should map back to the id column in the dag_run table).

Unfortunately, this means an opportunity has been missed to make Airflow far more usable in larger enterprises, where all schedulers have, for a very long time, been able to show you what tasks a schedule ran when it ran, irrespective of what it looks like now, even if the code which generated that dag run has long since changed.

Unless I am missing something we would still be rendering the Dag based on the last time it got stored in the table, irrespective of what tasks were actually run...
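
To make the difference concrete, here is a minimal sketch using an in-memory SQLite database.  The table names (dag_snapshot_by_dag, dag_snapshot_by_run), the helper functions and the task names are all hypothetical and chosen only for illustration; this is not Airflow's actual schema or API.  It only shows that a snapshot keyed by dag_id alone is overwritten by each run, whereas one keyed by (dag_id, dag_run_id) keeps the structure each run actually had.

```python
import json
import sqlite3

# Hypothetical, simplified tables for illustration only; NOT Airflow's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag_snapshot_by_dag (
        dag_id TEXT PRIMARY KEY,
        data   TEXT NOT NULL
    );
    CREATE TABLE dag_snapshot_by_run (
        dag_id     TEXT    NOT NULL,
        dag_run_id INTEGER NOT NULL,
        data       TEXT    NOT NULL,
        PRIMARY KEY (dag_id, dag_run_id)
    );
""")


def record_run(dag_id, dag_run_id, task_ids):
    """Store the task list that a given run was generated with (hypothetical helper)."""
    data = json.dumps(sorted(task_ids))
    # Keyed by dag_id only: every run overwrites the previous structure.
    conn.execute(
        "INSERT OR REPLACE INTO dag_snapshot_by_dag VALUES (?, ?)",
        (dag_id, data),
    )
    # Keyed by (dag_id, dag_run_id): every run keeps its own structure.
    conn.execute(
        "INSERT INTO dag_snapshot_by_run VALUES (?, ?, ?)",
        (dag_id, dag_run_id, data),
    )


def tasks_latest(dag_id):
    """What a UI would render if only the latest stored structure exists."""
    row = conn.execute(
        "SELECT data FROM dag_snapshot_by_dag WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    return json.loads(row[0])


def tasks_for_run(dag_id, dag_run_id):
    """What a UI could render if the structure were stored per run."""
    row = conn.execute(
        "SELECT data FROM dag_snapshot_by_run WHERE dag_id = ? AND dag_run_id = ?",
        (dag_id, dag_run_id),
    ).fetchone()
    return json.loads(row[0])


# Run 1 happens on a Sunday and gets an extra task; run 2 does not.
record_run("example_dag", 1, ["extract", "load", "sunday_report"])
record_run("example_dag", 2, ["extract", "load"])
```

With only the dag_id key, looking back at run 1 via tasks_latest("example_dag") shows no trace of sunday_report, while tasks_for_run("example_dag", 1) still returns it.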

Any thoughts on further enhancing this so it renders the dag RUN instead?

 

> Allow historic DAG runs to be rendered in the UI based on what the database says they did, not the current DAG structure.
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4540
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4540
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: DAG, ui
>            Reporter: Tony Brookes
>            Priority: Major
>
> Dags evolve over time.  Their structure changes.  Indeed, because they are dynamically created in code, they can change even if the code remains the same, based on external factors which the code itself reacts to (generating different tasks).  All part of the wonderful advantages of the Airflow approach to generating Dags.
> However, when you look at prior runs of the Dags in the UI, the rendered graph is always based on evaluating the Dag code "right now."  So if the Dags have changed, or external factors have changed, then the graph can look nothing like it did when it was actually run.
> For example, we have evolved our Dags based on experience, changing names to be more meaningful, adding and removing operators, moving from hard-coded generation to templates and dynamic generation of parallel tasks via code etc.  When you open older Dag runs in the UI, you see all sorts of strangeness where tasks which you no longer have simply vanish and tasks you have added show up (with their predecessor and successor links if they're there), which can make it look like a downstream task triggered even though its upstream parent never ran.  Quite confusing, especially when trying to debug problems.
> I would love the ability to see what a complete Dag _*actually did*_.  Meaning, based on the data in the database, generate the graph based on those entries, completely irrespective of what the current Dag code says it should look like.  To fully support this might require some additional columns, such as storing what the name of the operator class was etc., and perhaps the predecessor and successor task IDs.
> But from a production support standpoint, it would be incredibly valuable to see the "as was" view of historic Dag execution rather than the current "as is" view.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)