Posted to commits@airflow.apache.org by "Matt Blaha (Jira)" <ji...@apache.org> on 2019/11/02 15:16:00 UTC

[jira] [Commented] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)

    [ https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965398#comment-16965398 ] 

Matt Blaha commented on AIRFLOW-2319:
-------------------------------------

This is a big issue for me. As [~TrevorEdwards] mentioned above, my use case is a series of tasks in a single DAG that gets kicked off repeatedly with different parameters via external trigger. As things stand, all of the external sources have to ensure their execution times are at least one second apart, but in my case they don't communicate with each other at all. Nothing I've read implies that this should be necessary when externally triggering jobs.

I manually removed the constraint, and several thousand runs have completed just fine since.
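
As a minimal sketch of the collision, the snippet below uses an in-memory SQLite table modeled on a trimmed-down version of the dag_run schema quoted in the issue description. It is not Airflow code: the helper name insert_runs, the DAG id "example_dag", and the run ids are made up for illustration. It only shows that two rows with the same (dag_id, execution_date) are rejected while the unique constraint is present and accepted once it is dropped.

```python
import sqlite3

# Trimmed-down dag_run schema, based on the one quoted in the issue description.
SCHEMA_WITH_CONSTRAINT = """
CREATE TABLE dag_run (
    id INTEGER NOT NULL,
    dag_id VARCHAR(250),
    execution_date DATETIME,
    run_id VARCHAR(250),
    PRIMARY KEY (id),
    UNIQUE (dag_id, execution_date),
    UNIQUE (dag_id, run_id)
)
"""

# Same table without the (dag_id, execution_date) unique constraint.
SCHEMA_WITHOUT_CONSTRAINT = SCHEMA_WITH_CONSTRAINT.replace(
    "    UNIQUE (dag_id, execution_date),\n", ""
)


def insert_runs(schema, runs):
    """Insert runs into a fresh in-memory table.

    Returns how many rows were inserted before the first IntegrityError
    (or len(runs) if every insert succeeded).
    """
    conn = sqlite3.connect(":memory:")
    inserted = 0
    try:
        conn.execute(schema)
        for dag_id, execution_date, run_id in runs:
            try:
                conn.execute(
                    "INSERT INTO dag_run (dag_id, execution_date, run_id) "
                    "VALUES (?, ?, ?)",
                    (dag_id, execution_date, run_id),
                )
            except sqlite3.IntegrityError:
                break
            inserted += 1
    finally:
        conn.close()
    return inserted


# Two hypothetical external triggers for the same DAG landing in the same second.
RUNS = [
    ("example_dag", "2019-11-02 15:16:00", "manual__a"),
    ("example_dag", "2019-11-02 15:16:00", "manual__b"),
]

if __name__ == "__main__":
    print("with constraint:", insert_runs(SCHEMA_WITH_CONSTRAINT, RUNS))
    print("without constraint:", insert_runs(SCHEMA_WITHOUT_CONSTRAINT, RUNS))
```

The (dag_id, run_id) constraint is left in place in both variants, so runs are still told apart by run_id.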

If this is by design, as [~ash] suggested above, could someone please elaborate on why, and suggest an alternative way to prevent a high volume of DAG runs with different parameters from failing? If they should fail, then I agree with the comments above: what would the appropriate error message be?

> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> ------------------------------------------------------------------
>
>                 Key: AIRFLOW-2319
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>    Affects Versions: 1.9.0
>            Reporter: Andreas Költringer
>            Priority: Major
>
> Inserting DagRuns via {{airflow.api.common.experimental.trigger_dag}} (multiple rows with the same {{(dag_id, execution_date)}}) raised the following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird, as the {{session.add()}} and {{session.commit()}} calls come right before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
>         id INTEGER NOT NULL, 
>         dag_id VARCHAR(250), 
>         execution_date DATETIME, 
>         state VARCHAR(50), 
>         run_id VARCHAR(250), 
>         external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date DATETIME, 
>         PRIMARY KEY (id), 
>         UNIQUE (dag_id, execution_date), 
>         UNIQUE (dag_id, run_id), 
>         CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint; on MariaDB it's also an index.)
> The {{DagRun}} class in {{models.py}} does not reflect this; however, it is in [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this is not reflected in the model, I guess this is a bug?


