Posted to commits@airflow.apache.org by "George Leslie-Waksman (JIRA)" <ji...@apache.org> on 2018/08/08 07:36:00 UTC

[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance

    [ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572816#comment-16572816 ] 

George Leslie-Waksman commented on AIRFLOW-2870:
------------------------------------------------

The process to reproduce is as follows:
 # Start with an Airflow deployment that predates {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}} (e.g. 1.8.1)
 # Run Airflow enough to populate task_instances in the metadata database (run one of the sample dags)
 # Install an Airflow version after {{27c6a30d7c24_add_executor_config_to_task_instance.py}} (e.g. 1.10rc3)
 # {{airflow upgradedb}}

This will fail with a message about the column "task_instance.executor_config" not existing.

My current understanding of what is happening:
 * When constructing a SQLAlchemy ORM query using a declarative model (i.e. {{TaskInstance}}), the database table must be consistent with the structure of that model.
 ** SQLAlchemy's mapper will select every column the model defines (code side) and assume each one exists in the database
 * When running a migration, the database table is in a transitional state
 * The code in {{airflow/models.py}} reflects the state of the database after running ALL migrations through the present
 * When we are using the 1.10rc3 code to run migrations and we reach {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}}, we [import TaskInstance|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L36] with all future columns defined and then [query the old schema|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L64], which does not have those columns yet
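
The mismatch described above can be reproduced in miniature. This is a minimal sketch, not Airflow code: the {{TaskInstance}} class below is a hypothetical three-column stand-in for the real model, and an in-memory SQLite database plays the role of the pre-migration schema, so the error text differs from the Postgres message in the report.

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.exc import DBAPIError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TaskInstance(Base):
    """Hypothetical stand-in for the model as the NEWEST code defines it."""

    __tablename__ = "task_instance"
    task_id = Column(String, primary_key=True)
    max_tries = Column(Integer)
    executor_config = Column(String)  # only added by a LATER migration


# The database is still at the older schema: executor_config is missing.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE task_instance "
        "(task_id VARCHAR PRIMARY KEY, max_tries INTEGER)"
    ))
    conn.execute(text(
        "INSERT INTO task_instance (task_id, max_tries) VALUES ('t1', 0)"
    ))


def query_with_full_model():
    """Run an ORM query and return the driver's error message, if any."""
    with Session(engine) as session:
        try:
            # The mapper selects every mapped column, including the one
            # that does not exist in the table yet.
            session.query(TaskInstance).all()
        except DBAPIError as exc:
            return str(exc.orig)
    return None


error = query_with_full_model()
print(error)
```

The query fails before any row is returned, and the driver's message names the missing {{executor_config}} column, which mirrors the failure seen during {{airflow upgradedb}}.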

Under typical circumstances, one can avoid this issue by performing migrations using Alembic + SQLAlchemy Core (no ORM) and directly manipulating the tables. However, in this case, we need to populate information from a {{Task}} object that does not have a representation in the database.
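
For reference, the typical Core-only approach looks roughly like the sketch below. It is illustrative only: the column set, the {{backfill_max_tries}} helper and the backfill rule (copying {{try_number}}) are assumptions made for the example, not what the actual migration does. In a real Alembic migration the connection would come from {{op.get_bind()}}; an in-memory SQLite engine stands in for it here.

```python
import sqlalchemy as sa

# Migration-local description of the table: it lists only the columns this
# step touches, so later additions to the model cannot leak into the SQL.
task_instance = sa.table(
    "task_instance",
    sa.column("task_id", sa.String),
    sa.column("try_number", sa.Integer),
    sa.column("max_tries", sa.Integer),
)


def backfill_max_tries(connection):
    """Hypothetical backfill: fill unset max_tries using Core only."""
    connection.execute(
        sa.update(task_instance)
        .where(task_instance.c.max_tries == -1)
        .values(max_tries=task_instance.c.try_number)
    )


# Stand-in for the migration connection (op.get_bind() in Alembic).
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE task_instance (task_id VARCHAR PRIMARY KEY, "
        "try_number INTEGER, max_tries INTEGER DEFAULT -1)"
    ))
    conn.execute(sa.text(
        "INSERT INTO task_instance (task_id, try_number) VALUES ('t1', 2)"
    ))
    backfill_max_tries(conn)
    rows = conn.execute(
        sa.text("SELECT task_id, max_tries FROM task_instance")
    ).fetchall()
    result = [tuple(row) for row in rows]

print(result)
```

Because the table description lives inside the migration, the generated SQL never mentions columns added by later revisions. What this approach cannot do by itself is supply values that exist only on in-memory {{Task}} objects, which is the complication noted above.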

We may be able to work around the database issues by manipulating SQLAlchemy's [column loading|http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#load-only-cols] but that may be tricky given the intertwined nature of Airflow's model code.
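
A minimal sketch of the column-loading idea, assuming a hypothetical three-column {{TaskInstance}} stand-in and an in-memory SQLite table that lacks {{executor_config}}. Whether this is workable against Airflow's real model is exactly the open question, so treat it as an illustration of {{load_only}} rather than a proposed fix.

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import Session, declarative_base, load_only

Base = declarative_base()


class TaskInstance(Base):
    """Hypothetical stand-in for the model as the newest code defines it."""

    __tablename__ = "task_instance"
    task_id = Column(String, primary_key=True)
    try_number = Column(Integer)
    executor_config = Column(String)  # not present in the old schema


# Old schema: no executor_config column.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE task_instance "
        "(task_id VARCHAR PRIMARY KEY, try_number INTEGER)"
    ))
    conn.execute(text(
        "INSERT INTO task_instance (task_id, try_number) VALUES ('t1', 3)"
    ))

with Session(engine) as session:
    # load_only restricts the SELECT to the named columns; every other
    # mapped column is deferred and left out of the statement.
    tis = (
        session.query(TaskInstance)
        .options(load_only(TaskInstance.task_id, TaskInstance.try_number))
        .all()
    )
    loaded = [(ti.task_id, ti.try_number) for ti in tis]

print(loaded)
```

The query succeeds because the deferred column never appears in the emitted SQL. Reading {{ti.executor_config}} on one of these objects would trigger a lazy load of that column and fail against the old schema, so any code path that touches the deferred attribute would also need to be avoided.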

> Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
> --------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2870
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: George Leslie-Waksman
>            Priority: Blocker
>
> Running migrations from below cc1e65623dc7_add_max_tries_column_to_task_instance.py fails with:
> {noformat}
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
>     context)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from the current code version, which has changes to the task_instance table that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an executor_config column that does not exist as of when cc1e65623dc7_add_max_tries_column_to_task_instance.py is run.
> It is worth noting that this will not be observed on new installs, because the migration branches on whether the table exists at a point that hides the issue.


