You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/20 23:14:26 UTC

[GitHub] [airflow] vitaly-krugl opened a new issue, #25209: Airflow database migration version scripts are unsafe, guaranteed to fail in the future

vitaly-krugl opened a new issue, #25209:
URL: https://github.com/apache/airflow/issues/25209

   ### Apache Airflow version
   
   2.3.2
   
   ### What happened
   
   Airflow uses unsafe practice in the implementation of multiple migration version scripts, including some recent ones that are guaranteed to fail in the future. Multiple migration version scripts import from Airflow itself, including imports of models.
   
   The problem with this approach is that while the logic implemented by those imports at the time when the version script is created works fine, this may no longer be the case some number of releases later, because the implementation of those Airflow imports will change, so that when those functions/methods/classes are executed in the future (when migrating from an older version of airflow to a newer one), the new implementation will no longer match the state of the  user's system which is being upgraded.
   
   For example, if an older migration version script attempts to load a Model from database using the code of the target - newer - version of Airflow code, the fetching of the Model will if the new model class contains columns that didn't yet exist in the version of the system that is being upgraded by the user.
   
   This unsafe practice is used in many version scripts, including some of the most recent ones:
   
   ```
   $ git grep  import | grep airflow | grep model
   db_types.py:    from airflow.models.base import StringID
   env.py:from airflow import models, settings
   versions/0023_1_8_2_add_max_tries_column_to_task_instance.py:from airflow.models import DagBag
   versions/0054_1_10_10_add_dag_code_table.py:from airflow.models.dagcode import DagCode
   versions/0061_2_0_0_increase_length_of_pool_name.py:from airflow.models.base import COLLATION_ARGS
   versions/0064_2_0_0_add_unique_constraint_to_conn_id.py:from airflow.models.base import COLLATION_ARGS
   versions/0073_2_0_0_prefix_dag_permissions.py:from airflow.www.fab_security.sqla.models import Action, Permission, Resource
   versions/0092_2_2_0_add_data_interval_start_end_to_dagmodel_and_dagrun.py:from airflow.migrations.db_types import TIMESTAMP
   versions/0099_2_3_0_add_task_log_filename_template_model.py:from airflow.migrations.utils import disable_sqlite_fkeys
   versions/0099_2_3_0_add_task_log_filename_template_model.py:from airflow.utils.sqlalchemy import UtcDateTime
   versions/0100_2_3_0_add_taskmap_and_map_id_on_taskinstance.py:from airflow.models.base import StringID
   ```
   
   Please note that this issue is not purely theoretical! Here is an actual example from a production system: #25075. Please don't invalidate the current ticket due to the example being from an older version of Airflow because the same error-prone practice is found in more recent migration version scripts, so this type of failure is guaranteed to happen again.
   
   ### What you think should happen instead
   
   The implementations of migration version scripts MUST NOT make use Airflow functions/methods/classes - such as Models - whose implementations could change in the future (as it did in the reported issue #25075) - such as columns having been renamed/added/removed in more recent versions - resulting in guaranteed database migration failures in the future.
   
   Also, to ensure robust database migration, Airflow maintainers - at a minimum - need to automate testing of migrations from various versions starting with a database where all tables are populated. Such test code needs to test migrations/upgrades starting from tables populated by different versions of airflow to ensure that migrations are safe and reliable.
   
   
   
   ### How to reproduce
   
   See issue #25075 as an example of reproducing. Please don't invalidate the current ticket due to the example being from an older version of Airflow because the same error-prone practice is found in more recent migration version scripts, so this type of failure is guaranteed to happen again.
   
   ### Operating System
   
   OSX and Linux
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-ftp==2.1.2
   apache-airflow-providers-http==2.1.2
   apache-airflow-providers-imap==2.2.3
   apache-airflow-providers-sqlite==2.1.3
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #25209:
URL: https://github.com/apache/airflow/issues/25209#issuecomment-1190916431

   Of the examples you provided, only migrations 23, 54 and 73 a problematic (and you can see they are all fairly old). The rest are constants and helper functions that are intended to be used in migrations and guaranteed to not change. We should probably move them to somewhere else and add dev documentation to make this clear, but using them is not a problem on itself.
   
   As for the actual problematic ones, feel free to fix them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vitaly-krugl commented on issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future

Posted by GitBox <gi...@apache.org>.
vitaly-krugl commented on issue #25209:
URL: https://github.com/apache/airflow/issues/25209#issuecomment-1190981965

   One of the outcomes of this issue, besides fixing them and having adequate
   tests, is having guidelines in a migration/versions/README, or something
   like that, which discusses the unsafe practices to avoid. This would help
   maintainers and reviewers alike.
   
   On Wed, Jul 20, 2022, 8:55 PM Tzu-ping Chung ***@***.***>
   wrote:
   
   > Of the examples you provided, only migrations 23, 54 and 73 a problematic
   > (and you can see they are all fairly old). The rest are constants and
   > helper functions that are intended to be used in migrations and guaranteed
   > to not change. We should probably move them to somewhere else and add dev
   > documentation to make this clear, but using them is not a problem on itself.
   >
   > As for the actual problematic ones, feel free to fix them.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/airflow/issues/25209#issuecomment-1190916431>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAK72KSEPXX7QPJK23U22PLVVCNXJANCNFSM54FPL3FA>
   > .
   > You are receiving this because you authored the thread.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25209:
URL: https://github.com/apache/airflow/issues/25209#issuecomment-1191915218

   @vitaly-krugl - if you would like to propose some guidelines - feel free to make PR. Like everything else in Airflow.,
   
   Airflow is created by > 2100 contributors and if you would lilke to contribute such README and guidelines. That's cool. Seems your experience and ideas and cool and it's the best you ca do to help the community to contribute a PR with such quidelines.
   
   Also I would lilke to draw your attention, to the fact that we tend not only to raise problems when we notice them, and document them in READMES but also automate most of such checks. So if you are able to come up with a proposal on how to automate such checks and fail PRs that introduce bad practices, adding a pre-commit that performs such checks is ideal. We have already ~ 100 pre-commits like this, so you are most welcome to devlop and submit pre-commit that performs such checks. That would be fantastic contribution.
   
   BTW. Converting this into discussion - this is not a bug "per se" in airflow. It's the usual (we had probably hundreds of them) discussions on how we can improve our dev environment - and usually each such discussion ended up as a PR from those who raised it, rather than "bug" issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25209: Airflow database migration version scripts are unsafe, have failed, and are guaranteed to fail in the future
URL: https://github.com/apache/airflow/issues/25209


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org