Posted to commits@airflow.apache.org by "Chris Schmautz (Jira)" <ji...@apache.org> on 2020/01/22 18:16:00 UTC

[jira] [Commented] (AIRFLOW-6609) Airflow upgradedb fails serialized_dag table add on revision id d38e04c12aa2

    [ https://issues.apache.org/jira/browse/AIRFLOW-6609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021345#comment-17021345 ] 

Chris Schmautz commented on AIRFLOW-6609:
-----------------------------------------

Either checking for the table's existence before the addition, or dropping the table ahead of the addition, mitigates the error.

Not sure why the error came up - I ran resetdb numerous times at the lower revision with little effect. I was able to step incrementally through revisions until specifically the 1.10.6 to 1.10.7 migration.
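
For reference, a minimal sketch of the drop-ahead workaround, assuming direct SQLAlchemy access to the metadata db (the connection string is a placeholder for your sql_alchemy_conn):

{code:python}
# Sketch only: drop the stale table so the d38e04c12aa2 migration can
# recreate it, then re-run 'airflow upgradedb'.
from sqlalchemy import create_engine, inspect, text

# Placeholder DSN; substitute the sql_alchemy_conn from airflow.cfg.
engine = create_engine("postgresql+psycopg2://airflow:airflow@postgres/airflow")

if "serialized_dag" in inspect(engine).get_table_names():
    with engine.begin() as conn:
        conn.execute(text("DROP TABLE serialized_dag"))
{code}

Dropping is only safe here because serialized_dag is introduced by this very migration, so any existing copy is a leftover from a failed attempt.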

> Airflow upgradedb fails serialized_dag table add on revision id d38e04c12aa2
> ----------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6609
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6609
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: database
>    Affects Versions: 1.10.7
>            Reporter: Chris Schmautz
>            Priority: Major
>              Labels: database, postgres
>
> We're attempting an upgrade from 1.10.3 to 1.10.7 to use some of the great features available in later revisions; however, the upgrade from 1.10.6 to 1.10.7 is causing some heartburn.
> +Runtime environment:+
>  - Docker containers for each runtime segment (webserver, scheduler, flower, postgres, redis, worker)
>  - Using CeleryExecutor queued with Redis
>  - Using Postgres backend
>  
> +Steps to reproduce:+
>  1. Build a base image for each version of Airflow between 1.10.3 and 1.10.7 (if you want the full regression we have done)
>  2. 'airflow initdb' on revision 1.10.3
>  3. Start up the containers, run some dags, produce metadata
>  4. Swap the base image out for the next revision, e.g. from the 1.10.3 base to the 1.10.4 base image
>  5. Run 'airflow upgradedb'
>  6. Validate success
>  n. Repeat steps 4-6 for each successive version; the step from 1.10.6 to 1.10.7 produces the error below
>  
> {code:java}
> INFO  [alembic.runtime.migration] Running upgrade 6e96a59344a4 -> d38e04c12aa2, add serialized_dag table
> Revision ID: d38e04c12aa2
> Revises: 6e96a59344a4
> Create Date: 2019-08-01 14:39:35.616417
> Traceback (most recent call last):
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
>     cursor, statement, parameters, context
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 581, in do_execute
>     cursor.execute(statement, parameters)
> psycopg2.errors.DuplicateTable: relation "serialized_dag" already exists
> The above exception was the direct cause of the following exception:
> 
> Traceback (most recent call last):
>   File "/opt/anaconda/miniconda3/envs/airflow/bin/airflow", line 37, in <module>
>     args.func(args)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/cli.py", line 75, in wrapper
>     return f(*args, **kwargs)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/bin/cli.py", line 1193, in upgradedb
>     db.upgradedb()
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/utils/db.py", line 376, in upgradedb
>     command.upgrade(config, 'heads')
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/command.py", line 298, in upgrade
>     script.run_env()
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/script/base.py", line 489, in run_env
>     util.load_python_file(self.dir, "env.py")
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file
>     module = load_module_py(module_id, path)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/util/compat.py", line 173, in load_module_py
>     spec.loader.exec_module(module)
>   File "<frozen importlib._bootstrap_external>", line 678, in exec_module
>   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/migrations/env.py", line 96, in <module>
>     run_migrations_online()
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/migrations/env.py", line 90, in run_migrations_online
>     context.run_migrations()
>   File "<string>", line 8, in run_migrations
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/runtime/environment.py", line 846, in run_migrations
>     self.get_context().run_migrations(**kw)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/runtime/migration.py", line 518, in run_migrations
>     step.migration_fn(**kw)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/airflow/migrations/versions/d38e04c12aa2_add_serialized_dag_table.py", line 54, in upgrade
>     sa.PrimaryKeyConstraint('dag_id'))
>   File "<string>", line 8, in create_table
>   File "<string>", line 3, in create_table
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/operations/ops.py", line 1250, in create_table
>     return operations.invoke(op)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/operations/base.py", line 345, in invoke
>     return fn(self, operation)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/operations/toimpl.py", line 101, in create_table
>     operations.impl.create_table(table)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/ddl/impl.py", line 252, in create_table
>     self._exec(schema.CreateTable(table))
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/alembic/ddl/impl.py", line 134, in _exec
>     return conn.execute(construct, *multiparams, **params)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 982, in execute
>     return meth(self, multiparams, params)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py", line 72, in _execute_on_connection
>     return connection._execute_ddl(self, multiparams, params)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1044, in _execute_ddl
>     compiled,
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1250, in _execute_context
>     e, statement, parameters, cursor, context
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1476, in _handle_dbapi_exception
>     util.raise_from_cause(sqlalchemy_exception, exc_info)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
>     reraise(type(exception), exception, tb=exc_tb, cause=cause)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
>     raise value.with_traceback(tb)
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
>     cursor, statement, parameters, context
>   File "/opt/anaconda/miniconda3/envs/airflow/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 581, in do_execute
>     cursor.execute(statement, parameters)
> sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "serialized_dag" already exists
> [SQL:
> CREATE TABLE serialized_dag (
> 	dag_id VARCHAR(250) NOT NULL,
> 	fileloc VARCHAR(2000) NOT NULL,
> 	fileloc_hash INTEGER NOT NULL,
> 	data JSON NOT NULL,
> 	last_updated TIMESTAMP WITHOUT TIME ZONE NOT NULL,
> 	PRIMARY KEY (dag_id)
> )]
> (Background on this error at: http://sqlalche.me/e/f405)
> {code}
>  
>  
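> One way to confirm where the failed step left the metadata db is to read Alembic's bookkeeping table directly - a minimal sketch (placeholder DSN again):
> {code:python}
> # Alembic records the current revision in the alembic_version table.
> from sqlalchemy import create_engine, text
> 
> engine = create_engine("postgresql+psycopg2://airflow:airflow@postgres/airflow")
> with engine.connect() as conn:
>     revision = conn.execute(text("SELECT version_num FROM alembic_version")).scalar()
> 
> # If the d38e04c12aa2 step never committed, this should still show 6e96a59344a4.
> print(revision)
> {code}
>  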
> It doesn't make much sense: there is [only one reference|https://github.com/apache/airflow/blob/1.10.7/airflow/migrations/versions/d38e04c12aa2_add_serialized_dag_table.py#L48] to this table addition in the codebase, so it's unclear why this migration is going awry.
> +Possible solutions:+
>  - Instead of bailing out, it may be more productive to issue a warning when a step like this fails. The intent of the migration process is to say 'you can't run on version x', but here I'm left confused about the migration outcome.
>  - Migrations could check ahead for objects that a revision would create already being present (we did this for a bug found in later revisions, for a different backend, MSSQL); this adds some overhead, but metadata upgrades would at least be self-aware - see the sketch after this list
>  - Something else I'm missing in the broader picture
>  
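> A minimal sketch of that check-ahead idea, mirroring the CREATE TABLE from the traceback above (illustrative only, not the actual Airflow migration):
> {code:python}
> # Sketch of an idempotent upgrade(): skip create_table when the
> # relation already exists instead of failing with DuplicateTable.
> import sqlalchemy as sa
> from alembic import op
> 
> 
> def upgrade():
>     inspector = sa.inspect(op.get_bind())
>     if "serialized_dag" in inspector.get_table_names():
>         return  # leftover from an earlier partial upgrade; nothing to create
>     op.create_table(
>         'serialized_dag',
>         sa.Column('dag_id', sa.String(250), nullable=False),
>         sa.Column('fileloc', sa.String(2000), nullable=False),
>         sa.Column('fileloc_hash', sa.Integer(), nullable=False),
>         sa.Column('data', sa.JSON(), nullable=False),
>         sa.Column('last_updated', sa.DateTime(), nullable=False),
>         sa.PrimaryKeyConstraint('dag_id'))
> {code}
>  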
> If the db truly already has the table, end users would still be able to upgrade their version, so it's odd to get an error when changing revisions if things are already in place for the future revision.


