Posted to commits@airflow.apache.org by "Andreas Költringer (JIRA)" <ji...@apache.org> on 2019/01/18 19:08:00 UTC
[jira] [Comment Edited] (AIRFLOW-2319) Table "dag_run" has (bad)
second index on (dag_id, execution_date)
[ https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746574#comment-16746574 ]
Andreas Költringer edited comment on AIRFLOW-2319 at 1/18/19 7:07 PM:
----------------------------------------------------------------------
Well, as of now, on the master branch, the DagRun class contains the following code:
{code:java}
__table_args__ = (
    Index('dag_id_state', dag_id, _state),
    UniqueConstraint('dag_id', 'execution_date'),
    UniqueConstraint('dag_id', 'run_id'),
)
{code}
See here: [https://github.com/apache/airflow/blob/master/airflow/models/__init__.py#L4694|https://github.com/apache/airflow/commit/b63b42429ffea52ec9c9a0ead7dbc258ea4c2900]
However, for version {{1.9}} (branch {{v1-9-stable}}), this is not there: [https://github.com/apache/airflow/blob/v1-9-stable/airflow/models.py#L4390]
Why would a migration added in 2015 (see [relevant commit|https://github.com/apache/airflow/commit/b63b42429ffea52ec9c9a0ead7dbc258ea4c2900]) add a unique constraint that was only added to the {{models.py}} file in June 2018 ([commit|https://github.com/apache/airflow/commit/680651f0ae2a314f8e9882a6bc38f4fa3795cdbe])?
Anyway: if this is indeed there by design, could somebody please point out the rationale behind it? We actually have a use case for multiple DagRuns per (dag_id, execution_date). Thanks.
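To make the effect of the constraint concrete, here is a minimal sketch using Python's built-in {{sqlite3}} module and a trimmed-down {{dag_run}} table. It is an illustration only, not Airflow's actual code path: the helper {{try_insert}}, the DAG name and the run ids are all hypothetical.

```python
import sqlite3

# Trimmed-down dag_run table carrying the two unique constraints discussed above.
SCHEMA = """
CREATE TABLE dag_run (
    id INTEGER NOT NULL,
    dag_id VARCHAR(250),
    execution_date DATETIME,
    state VARCHAR(50),
    run_id VARCHAR(250),
    PRIMARY KEY (id),
    UNIQUE (dag_id, execution_date),
    UNIQUE (dag_id, run_id)
);
"""


def try_insert(conn, dag_id, execution_date, run_id):
    """Hypothetical helper: insert one row, return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dag_run (dag_id, execution_date, state, run_id) "
                "VALUES (?, ?, 'running', ?)",
                (dag_id, execution_date, run_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

first = try_insert(conn, "example_dag", "2019-01-18 00:00:00", "manual_1")
# Same (dag_id, execution_date) but a different run_id: rejected by the first UNIQUE.
second = try_insert(conn, "example_dag", "2019-01-18 00:00:00", "manual_2")
# A different execution_date is accepted.
third = try_insert(conn, "example_dag", "2019-01-19 00:00:00", "manual_3")
row_count = conn.execute("SELECT COUNT(*) FROM dag_run").fetchone()[0]
print(first, second, third, row_count)
```

With the constraint in place, a second run for the same (dag_id, execution_date) cannot be stored even when its run_id differs, which is exactly what gets in the way of the use case mentioned above.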
> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> ------------------------------------------------------------------
>
> Key: AIRFLOW-2319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
> Project: Apache Airflow
> Issue Type: Bug
> Components: DagRun
> Affects Versions: 1.9.0
> Reporter: Andreas Költringer
> Priority: Major
>
> Inserting DagRuns via {{airflow.api.common.experimental.trigger_dag}} (multiple rows with the same {{(dag_id, execution_date)}}) raised the following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird, as {{session.add()}} and {{session.commit()}} are called right before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with a {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
> id INTEGER NOT NULL,
> dag_id VARCHAR(250),
> execution_date DATETIME,
> state VARCHAR(50),
> run_id VARCHAR(250),
> external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date DATETIME,
> PRIMARY KEY (id),
> UNIQUE (dag_id, execution_date),
> UNIQUE (dag_id, run_id),
> CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint, on MariaDB it's also an index)
> The {{DagRun}} class in {{models.py}} does not reflect this; however, it is in [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42].
> I looked for other migrations correcting this, but could not find any. As this is not reflected in the model, I guess this is a bug?
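The manual inspection described above can also be scripted. The following is a rough sketch, assuming an SQLite backend and using only the standard {{sqlite3}} module; the function name {{unique_column_sets}} is made up for illustration, and the schema is the one pasted in the issue, recreated in memory.

```python
import sqlite3

# Schema as shown in the issue description.
SCHEMA = """
CREATE TABLE dag_run (
    id INTEGER NOT NULL,
    dag_id VARCHAR(250),
    execution_date DATETIME,
    state VARCHAR(50),
    run_id VARCHAR(250),
    external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date DATETIME,
    PRIMARY KEY (id),
    UNIQUE (dag_id, execution_date),
    UNIQUE (dag_id, run_id),
    CHECK (external_trigger IN (0, 1))
);
CREATE INDEX dag_id_state ON dag_run (dag_id, state);
"""


def unique_column_sets(conn, table):
    """Return the sorted column tuples covered by unique indexes on `table`.

    The table name is interpolated into the PRAGMA statement, so only pass
    trusted names.
    """
    result = []
    # PRAGMA index_list rows start with: seq, index name, unique flag.
    for row in conn.execute(f"PRAGMA index_list('{table}')").fetchall():
        index_name, is_unique = row[1], row[2]
        if not is_unique:
            continue
        # PRAGMA index_info rows are: seqno, cid, column name.
        cols = conn.execute(f"PRAGMA index_info('{index_name}')").fetchall()
        result.append(tuple(col[2] for col in sorted(cols)))
    return sorted(result)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
unique_sets = unique_column_sets(conn, "dag_run")
print(unique_sets)
```

Comparing such a list against the constraints declared on the model is one way to spot a difference between what the migrations created and what {{models.py}} declares.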