You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/20 23:14:26 UTC
[GitHub] [airflow] vitaly-krugl opened a new issue, #25209: Airflow database migration version scripts are unsafe, guaranteed to fail in the future
vitaly-krugl opened a new issue, #25209:
URL: https://github.com/apache/airflow/issues/25209
### Apache Airflow version
2.3.2
### What happened
Airflow uses unsafe practice in the implementation of multiple migration version scripts, including some recent ones that are guaranteed to fail in the future. Multiple migration version scripts import from Airflow itself, including imports of models.
The problem with this approach is that while the logic implemented by those imports at the time when the version script is created works fine, this may no longer be the case some number of releases later, because the implementation of those Airflow imports will change, so that when those functions/methods/classes are executed in the future (when migrating from an older version of airflow to a newer one), the new implementation will no longer match the state of the user's system which is being upgraded.
For example, if an older migration version script attempts to load a Model from database using the code of the target - newer - version of Airflow code, the fetching of the Model will if the new model class contains columns that didn't yet exist in the version of the system that is being upgraded by the user.
This unsafe practice is used in many version scripts, including some of the most recent ones:
```
$ git grep import | grep airflow | grep model
db_types.py: from airflow.models.base import StringID
env.py:from airflow import models, settings
versions/0023_1_8_2_add_max_tries_column_to_task_instance.py:from airflow.models import DagBag
versions/0054_1_10_10_add_dag_code_table.py:from airflow.models.dagcode import DagCode
versions/0061_2_0_0_increase_length_of_pool_name.py:from airflow.models.base import COLLATION_ARGS
versions/0064_2_0_0_add_unique_constraint_to_conn_id.py:from airflow.models.base import COLLATION_ARGS
versions/0073_2_0_0_prefix_dag_permissions.py:from airflow.www.fab_security.sqla.models import Action, Permission, Resource
versions/0092_2_2_0_add_data_interval_start_end_to_dagmodel_and_dagrun.py:from airflow.migrations.db_types import TIMESTAMP
versions/0099_2_3_0_add_task_log_filename_template_model.py:from airflow.migrations.utils import disable_sqlite_fkeys
versions/0099_2_3_0_add_task_log_filename_template_model.py:from airflow.utils.sqlalchemy import UtcDateTime
versions/0100_2_3_0_add_taskmap_and_map_id_on_taskinstance.py:from airflow.models.base import StringID
```
Please note that this issue is not purely theoretical! Here is an actual example from a production system: #25075. Please don't invalidate the current ticket due to the example being from an older version of Airflow because the same error-prone practice is found in more recent migration version scripts, so this type of failure is guaranteed to happen again.
### What you think should happen instead
The implementations of migration version scripts MUST NOT make use Airflow functions/methods/classes - such as Models - whose implementations could change in the future (as it did in the reported issue #25075) - such as columns having been renamed/added/removed in more recent versions - resulting in guaranteed database migration failures in the future.
Also, to ensure robust database migration, Airflow maintainers - at a minimum - need to automate testing of migrations from various versions starting with a database where all tables are populated. Such test code needs to test migrations/upgrades starting from tables populated by different versions of airflow to ensure that migrations are safe and reliable.
### How to reproduce
See issue #25075 as an example of reproducing. Please don't invalidate the current ticket due to the example being from an older version of Airflow because the same error-prone practice is found in more recent migration version scripts, so this type of failure is guaranteed to happen again.
### Operating System
OSX and Linux
### Versions of Apache Airflow Providers
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-sqlite==2.1.3
### Deployment
Virtualenv installation
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #25209:
URL: https://github.com/apache/airflow/issues/25209#issuecomment-1190916431
Of the examples you provided, only migrations 23, 54 and 73 a problematic (and you can see they are all fairly old). The rest are constants and helper functions that are intended to be used in migrations and guaranteed to not change. We should probably move them to somewhere else and add dev documentation to make this clear, but using them is not a problem on itself.
As for the actual problematic ones, feel free to fix them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] vitaly-krugl commented on issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future
Posted by GitBox <gi...@apache.org>.
vitaly-krugl commented on issue #25209:
URL: https://github.com/apache/airflow/issues/25209#issuecomment-1190981965
One of the outcomes of this issue, besides fixing them and having adequate
tests, is having guidelines in a migration/versions/README, or something
like that, which discusses the unsafe practices to avoid. This would help
maintainers and reviewers alike.
On Wed, Jul 20, 2022, 8:55 PM Tzu-ping Chung ***@***.***>
wrote:
> Of the examples you provided, only migrations 23, 54 and 73 a problematic
> (and you can see they are all fairly old). The rest are constants and
> helper functions that are intended to be used in migrations and guaranteed
> to not change. We should probably move them to somewhere else and add dev
> documentation to make this clear, but using them is not a problem on itself.
>
> As for the actual problematic ones, feel free to fix them.
>
> —
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/airflow/issues/25209#issuecomment-1190916431>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAK72KSEPXX7QPJK23U22PLVVCNXJANCNFSM54FPL3FA>
> .
> You are receiving this because you authored the thread.Message ID:
> ***@***.***>
>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25209:
URL: https://github.com/apache/airflow/issues/25209#issuecomment-1191915218
@vitaly-krugl - if you would like to propose some guidelines - feel free to make PR. Like everything else in Airflow.,
Airflow is created by > 2100 contributors and if you would lilke to contribute such README and guidelines. That's cool. Seems your experience and ideas and cool and it's the best you ca do to help the community to contribute a PR with such quidelines.
Also I would lilke to draw your attention, to the fact that we tend not only to raise problems when we notice them, and document them in READMES but also automate most of such checks. So if you are able to come up with a proposal on how to automate such checks and fail PRs that introduce bad practices, adding a pre-commit that performs such checks is ideal. We have already ~ 100 pre-commits like this, so you are most welcome to devlop and submit pre-commit that performs such checks. That would be fantastic contribution.
BTW. Converting this into discussion - this is not a bug "per se" in airflow. It's the usual (we had probably hundreds of them) discussions on how we can improve our dev environment - and usually each such discussion ended up as a PR from those who raised it, rather than "bug" issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future
URL: https://github.com/apache/airflow/issues/25209
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org