Posted to commits@airflow.apache.org by "George Leslie-Waksman (JIRA)" <ji...@apache.org> on 2018/08/08 07:36:00 UTC
[jira] [Commented] (AIRFLOW-2870) Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572816#comment-16572816 ]
George Leslie-Waksman commented on AIRFLOW-2870:
------------------------------------------------
The process to reproduce is as follows:
# Start with an Airflow deployment that predates {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}} (e.g. 1.8.1)
# Run Airflow enough to populate task_instances in the metadata database (run one of the sample dags)
# Install an Airflow version after {{27c6a30d7c24_add_executor_config_to_task_instance.py}} (e.g. 1.10rc3)
# {{airflow upgradedb}}
This will fail with a message about the column "task_instance.executor_config" not existing.
My current understanding of what is happening:
* When constructing a sqlalchemy orm query using a declarative model (i.e. {{TaskInstance}}), the database table must be consistent with the structure of that model.
** SQLAlchemy's mapper selects every column declared on the model (code side) and assumes each one exists in the database
* When running a migration, the database table is in a transitional state: some of the columns the current model declares have not been added yet
* The code in {{airflow/models.py}} reflects the state of the database after running ALL migrations through the present
* When we are using the 1.10rc3 code to run migrations and we reach {{cc1e65623dc7_add_max_tries_column_to_task_instance.py}}, we [import TaskInstance|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L36] as if it has all future columns and then [query the old schema|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py#L64]
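The mismatch described above can be reproduced without Airflow or SQLAlchemy. The following is a minimal sketch using Python's built-in {{sqlite3}}, not the actual migration code: the table has only a few of the real columns, and {{MODEL_COLUMNS}} and {{select_all_model_columns}} are hypothetical names standing in for what the ORM mapper does when it selects every mapped column by name.

```python
import sqlite3

# Schema as it exists mid-migration: executor_config has not been added yet.
# (Abbreviated, hypothetical subset of the real task_instance columns.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, try_number INTEGER)"
)
conn.execute("INSERT INTO task_instance VALUES ('t1', 'example_dag', 1)")

# Columns the *current* model code declares, including one that a later
# migration is supposed to add.
MODEL_COLUMNS = ["task_id", "dag_id", "try_number", "executor_config"]


def select_all_model_columns(connection):
    """Mimic an ORM entity query: name every mapped column explicitly."""
    cols = ", ".join(f"task_instance.{c}" for c in MODEL_COLUMNS)
    return connection.execute(f"SELECT {cols} FROM task_instance").fetchall()


try:
    select_all_model_columns(conn)
    error = None
except sqlite3.OperationalError as exc:
    # The database rejects the query because one named column is missing.
    error = str(exc)

print(error)
```

The query fails on the first column that the database does not have yet, which is the same class of failure as the {{executor_config}} error reported in this issue.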
Under typical circumstances, one can avoid this issue by performing migrations using alembic + SQLAlchemy core (no orm) and directly manipulating the tables. However, in this case, we need to populate information from a {{Task}} object that does not have a representation in the database.
We may be able to work around the database issues by manipulating SQLAlchemy's [column loading|http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#load-only-cols] but that may be tricky given the intertwined nature of Airflow's model code.
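The idea behind restricting column loading can be sketched independently of SQLAlchemy's API. The example below uses plain {{sqlite3}} and is an assumption-laden illustration, not a proposed patch: {{TASK_RETRIES}} is a hypothetical stand-in for values that live only in code (the {{Task}} objects), {{backfill_max_tries}} is a made-up helper, and the fallback to {{try_number}} is chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table state right after max_tries is added; executor_config is still absent.
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, try_number INTEGER, max_tries INTEGER DEFAULT -1)"
)
conn.executemany(
    "INSERT INTO task_instance (task_id, dag_id, try_number) VALUES (?, ?, ?)",
    [("t1", "example_dag", 1), ("t2", "example_dag", 3)],
)

# Hypothetical stand-in for per-task settings that exist only in code.
TASK_RETRIES = {("example_dag", "t1"): 2, ("example_dag", "t2"): 0}


def backfill_max_tries(connection, retries_by_task):
    """Populate max_tries while naming only columns that exist at this revision."""
    rows = connection.execute(
        "SELECT dag_id, task_id, try_number FROM task_instance WHERE max_tries = -1"
    ).fetchall()
    for dag_id, task_id, try_number in rows:
        retries = retries_by_task.get((dag_id, task_id))
        # Assumed fallback for tasks no longer defined in code.
        max_tries = try_number if retries is None else retries
        connection.execute(
            "UPDATE task_instance SET max_tries = ? WHERE dag_id = ? AND task_id = ?",
            (max_tries, dag_id, task_id),
        )
    connection.commit()


backfill_max_tries(conn, TASK_RETRIES)
rows = conn.execute(
    "SELECT task_id, max_tries FROM task_instance ORDER BY task_id"
).fetchall()
print(rows)
```

Because the queries never mention {{executor_config}}, they succeed regardless of which later migrations have run. Doing the equivalent through the ORM would mean limiting the columns loaded for {{TaskInstance}}, which is where the intertwined model code makes things harder.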
> Migrations fail when upgrading from below cc1e65623dc7_add_max_tries_column_to_task_instance
> --------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-2870
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2870
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: George Leslie-Waksman
> Priority: Blocker
>
> Running migrations from below cc1e65623dc7_add_max_tries_column_to_task_instance.py fails with:
> {noformat}
> INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO [alembic.runtime.migration] Will assume transactional DDL.
> INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
> Traceback (most recent call last):
> File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
> context)
> File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute
> cursor.execute(statement, parameters)
> psycopg2.ProgrammingError: column task_instance.executor_config does not exist
> LINE 1: ...ued_dttm, task_instance.pid AS task_instance_pid, task_insta...
> {noformat}
> The failure is occurring because cc1e65623dc7_add_max_tries_column_to_task_instance.py imports TaskInstance from the current code version, which has changes to the task_instance table that are not expected by the migration.
> Specifically, 27c6a30d7c24_add_executor_config_to_task_instance.py adds an executor_config column that does not yet exist at the point when cc1e65623dc7_add_max_tries_column_to_task_instance.py runs.
> It is worth noting that new installs will not hit this issue, because the migration branches on whether the table exists and the failing query is skipped when it does not.