You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/07 10:45:36 UTC

[GitHub] [airflow] ashb opened a new issue #8765: Create an `airflow upgrade-check` command in 1.10 to eas upgrade path to 2.0.

ashb opened a new issue #8765:
URL: https://github.com/apache/airflow/issues/8765


   To make it easier for users to upgrade from 1.10 to 2.0 (when it eventually comes out) we should create a single `upgrade-check` command in 1.10 that checks the following things:
   
   - [ ] Config values are in the right place.
   
     For this to work 1.10 needs to support both old and new style. 2.0 has deprecation warnings for all the old config names (hostname_callbable `:` to `.`, moving logging to it's own section etc) but if we backport those, we can have 2.0 just look at the new style if we wanted.
   - [ ] Check for dags using old imports
   
     This command should check that the `apache-airflow-backport-providers-$x` module is installed, and that the DAGs are using them.
   
   (Non exhaustive list, should be added to as we think of more.
   
   We could also have a mode that makes some of these changes in place (with confirmation from user) to automate it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-656725316


   I think some of the changes we can apply easily and safely. I think for example all the deprecation notices can be fixed rather easily - we already have information about it (we test all the depracations - I already used it in backport packages) - so we can super-easily and safely apply those in automated way.  Similarly some of the changes above (rename parameter names etc.). I would really like to provide such tool to the users where it is no-brainer and can be automated. 
   
   For most other changes I think we should simply detect potential problems and flag them - providing helpful hint how the problem could be fixed. 
   
   And I agree GDOc about it might be better than issues/splitting them now. We can even split the work among ourselves and make recommendations individually for each of those - happy to share this with the rest. Then we can review them and turn into issues.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-656721340


   @turbaszek Can you create Google Docs about it? Without a broader context, the titles of changes are of little help.
   
   I think we should now prepare a plan that will describe how we can detect breaking change and determine what to do when we detect them.
   ```
   # Move operators to providers package (Rule 1)
   
   We should load all DAG using DAGBag and check whether any deprecated operators are used. When it is detected, we can make changes automatically using Bowler.
   
   This rule resolves the following entries from UPDATING.md
   -....
   ```
   A lot of changes are backward compatible, and if not, we also need to detect and notify them.
   ```
   # Detectt provide_context in Python Operaator
   
   We should check ....when this happens, we should inform the user that in Airflow this code may cause problems.
   
   This rule resolves the following entries from UPDATING.md
   - Remove provide_context
   
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-662921022


   > * Complete the documentation to make the information for the user clearer, e.g. prepare a [migration guide for CLI](https://github.com/apache/airflow/issues/9952) that will compare new and old commands
   
   Yes I think the upgrade guide for CLI is a much better option than scrpits checking.
   
   > * Many problems will only happen during the execution of the task. We can [catch these exceptions](https://docs.python.org/3/library/warnings.html#warnings.catch_warnings) and then save them to the database to make them more accessible to the user. The infinite log file is not the best way to communicate with the end-user. Especially when we do not have one fie, but an infinite number of them.
   
   Interesting idea, but do you have any idea how to catch "these exceptions" (related to Airflow 2.0) not all?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-662929017


   So your idea is to collect all exceptions to db, not only those related to migration / Airflow2?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-691977374


   Base for `airflow upgrade-check` command is merged (#9276 ) so we can decide what rules we would like to implement. Once we have an initial list I will create issues so multiple people can work on this.
   
   Based on this gdoc:
   > https://docs.google.com/document/d/17tB9KZrH871q3AEafqR_i2I7Nrn-OT7le_P49G65VzM/edit?usp=sharing
   
   We may consider the following rules as a good start (those are major breaking changes):
   - `ConnTypeIsNotNullableRule` - Not-nullable conn_type column in connection table (done)
   - `UniqueConnIdRule` - Unique conn_id in connection table
   - `CutomOperatorUsesMetaclassRule` - BaseOperator uses metaclass
   - `UsingSQLFromBaseHookRule` - Remove SQL support in base_hook
   - `ChainBetwenDAGAndOperatorNotAllowedRule` - Assigning task to a DAG using bitwise shift (bit-shift) operators are no longer supported
   - `AirflowMacroPluginRemovedRule` - Removal of airflow.AirflowMacroPlugin class
   - `NoAdditionalArgsInOperatorsRule` - Additional arguments passed to BaseOperator cause an exception
   - `MesosExecutorRemovedRule` - Removal of Mesos Executor
   
   and config related changes:
   - `HostnameCallableRule` - Unify `hostname_callable` option in `core` section
   - `StatNameHandlerNotSupportedRule` - Drop plugin support for stat_name_handler
   - `LoggingConfigurationRule` - Logging configuration has been moved to new section
   - `NoGCPServiceAccountKeyInConfigRule` - Remove gcp_service_account_keys option in airflow.cfg file
   - `FernetEnabledRule` - Fernet is enabled by default
   - `KubernetesWorkerAnnotationsRule` - Changes to propagating Kubernetes worker annotations
   - `LegacyUIDeprecatedRule` - Deprecate legacy UI in favor of FAB RBAC UI
   - `TaskHandlersMovedRule` - GCSTaskHandler has been moved, WasbTaskHandler has been moved, StackdriverTaskHandler has been moved , S3TaskHandler has been moved, ElasticsearchTaskHandler has been moved, CloudwatchTaskHandler has been  moved
   - `SendGridMovedRule` - SendGrid emailer has been moved
   - `CustomExecutorsRequireFullPathRule` - Custom executors is loaded using full import path
   
   In case of operator related changes I would like to suggest a single rule `OperatorChangesRule`. It will use a map `old_operator_name -> list of possible problems` so we can create a single DagBag and scan all used operators and raise information about changes. It should also suggest what providers packages users should use. I think this is much better approach than having a rule for each operator...
   
   I think we should first **focus on the "check and warn" approach so we have any tool**. Once we have it we may consider **"check and apply changes"** - even, in this case, we should probably first focus on major problems (config, new imports, etc). The second step can use "report" generated by `airflow upgrade-check` to apply changes (still can be single command but run wit additional flag).
   
   Summoning all elders: @mik-laj @vikramkoka @ashb @potiuk @kaxil @feluelle @dimberman @houqp @ryw @Fokko 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-660077522


   > so we shouldn't change their files without confirmation (perhaps show a diff then ask y/n?)
   
   Bowler has this option 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-662912812


   @turbaszek I looked at this document. Great job!
   
   I have looked at what changes we have and I think one tool may not solve all our problems. I think that apart from this command, we should take additional action.
   - Complete the documentation to make the information for the user clearer, e.g. prepare a [migration guide for CLI](https://github.com/apache/airflow/issues/9952) that will compare new and old commands
   - Many problems will only happen during the execution of the task. We can [catch these exceptions](https://docs.python.org/3/library/warnings.html#warnings.catch_warnings) and then save them to the database to make them more accessible to the user. The infinite log file is not the best way to communicate with the end-user. Especially when we do not have one fie, but an infinite number of them.
   
   I also added other ideas to new section - "Other ideas" in your docs. 
   
   PS. I created "area:upgrade" label on Github to track the related issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-661885967


   I've drafted this doc:
   https://docs.google.com/document/d/17tB9KZrH871q3AEafqR_i2I7Nrn-OT7le_P49G65VzM/edit?usp=sharing
   
   The notes from UPDATING.md are split into few general groups (some are present in few places). I suppose that not all entries are important to users / will impact their Airflows. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-657146405


   We should definitely create the rules similar to what Kamil did in his PR and additionally have a flag that automatically fixes (where possible).
   
   Google Doc, Confluence (https://cwiki.apache.org/confluence/display/AIRFLOW/) is fine if we want to review and add comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-656712322


   @ash @mik-laj @potiuk @kaxil WDYT?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-691994272


   Sounds great to start with. I am sure we will have more rules added by the users, but those seem like great to create issues for.
   And yeah. One rule to rule them all ("naming changes") sound great. But I'd name it "ImportChangesRule". We renamed Hooks, and Secret Backends and Sensors - not only operators :).
   
   Elder Jarek.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-662924492


   > Interesting idea, but do you have any idea how to catch "these exceptions" (related to Airflow 2.0) not all?
   
   I think we can detect all problems and display them to the user. Why do you want to hide some problems?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-662912812


   @turbaszek I looked at this document. Great job!
   
   I have looked at what changes we have and I think one tool may not solve all our problems. I think that apart from this command, we should take additional action.
   - Complete the documentation to make the information for the user clearer, e.g. prepare a migration guide for CLI that will compare new and old commands
   - Many problems will only happen during the execution of the task. We can [catch these exceptions](https://docs.python.org/3/library/warnings.html#warnings.catch_warnings) and then save them to the database to make them more accessible to the user. The infinite log file is not the best way to communicate with the end-user. Especially when we do not have one fie, but an infinite number of them.
   
   I also added other ideas to new section - "Other ideas" in your docs. 
   
   PS. I created "area:upgrade" label on Github to track the related issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-656725316


   I think some of the changes we can apply easily and safely. I think for example all the deprecation notices can be fixed rather easily - we already have information about it (we test all the depracations - I already used it in backport packages) - so we can super-easily and safely apply those in automated way.  Similarly some of the changes above (rename parameter names etc.). I would really like to provide such tool to the users where it is no-brainer and can be automated. 
   
   For most other changes I think we should simply detect potential problems and flag them - providing helpful hint how the problem could be fixed. 
   
   And I agree GDOc about it might be better than issues/splitting them currently we can even split the work among ourselves and make recommendations individually for each of those - happy to share this with the rest. Then we can review them and turn into issues.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-663004615


   I got it, I was misled by the "exceptions" but we are talking about warnings


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb closed issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
ashb closed issue #8765:
URL: https://github.com/apache/airflow/issues/8765


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-656712140


   I screened the UPDATING.md for "breaking" changes that can be check / communicated. I selected those explicitly saying "breaking" or sounding like that. In general, we can split them into three categories:
   - python code changes (related to DAGs etc)
   - config changes
   - cli changes (related to scripts etc)
   
   I ended up with a list of 59 of changes:
   ```
    'BaseOperator uses metaclass',
    'Not-nullable conn_type column in connection table',
    'Change signature of BigQueryGetDatasetTablesOperator',
    'Unify `hostname_callable` option in `core` section',
    'Changes in BigQueryHook',
    'Use project_id argument consistently across GCP hooks and operators',
    'GCSUploadSessionCompleteSensor signature change',
    'Remove SQL support in base_hook',
    'Changes to SalesforceHook',
    'Rename parameter name in PinotAdminHook.create_segment',
    'Rename parameter name in HiveMetastoreHook.get_partitions',
    'Remove unnecessary parameter in FTPHook.list_directory',
    'Remove unnecessary parameter in PostgresHook function copy_expert',
    'Change parameter name in OpsgenieAlertOperator',
    'Assigning task to a DAG using bitwise shift (bit-shift) operators are no longer supported',
    'Custom executors is loaded using full import path',
    'Removed sub-package imports from `airflow/__init__.py`',
    'Drop plugin support for stat_name_handler',
    'Standardize handling http exception in BigQuery',
    'Remove airflow.utils.file.TemporaryDirectory',
    'Chain and cross_downstream moved from helpers to BaseOperator',
    'Change python3 as Dataflow Hooks/Operators default interpreter',
    'Logging configuration has been moved to new section',
    'Simplification of CLI commands',
    'Remove gcp_service_account_keys option in airflow.cfg file',
    'Removal of airflow.AirflowMacroPlugin class',
    ' Change default aws_conn_id in EMR operators',
    'Removal of redirect_stdout, redirect_stderr',
    'Changes to SQLSensor',
    'Idempotency in BigQuery operators',
    'AWS Batch Operator',
    'Additional arguments passed to BaseOperator cause an exception',
    'Simplification of the TriggerDagRunOperator',
    'Changes in Google Cloud Platform related hooks',
    'Fernet is enabled by default',
    'Changes to Google PubSub Operators, Hook and Sensor',
    'Removed Hipchat integration',
    'The gcp_conn_id parameter in GKEPodOperator is required',
    'Changes to propagating Kubernetes worker annotations',
    'Remove provide_context',
    'Variables removed from the task instance context',
    'Moved provide_gcp_credential_file decorator to GoogleBaseHook',
    'Changes to S3Hook',
    'Changes to Google Transfer Operator',
    'Changes in  Google Cloud Transfer Hook',
    'CLI reorganization',
    'Removal of Mesos Executor',
    'Changes to SalesforceHook',
    'HTTPHook verify default value changed from False to True.',
    'Changes to GoogleCloudStorageHook',
    'Changes to CloudantHook',
    'Unify default conn_id for Google Cloud Platform',
    'Changes to sensor imports',
    'Deprecate legacy UI in favor of FAB RBAC UI',
    'CLI Changes',
    'Unification of `do_xcom_push` flag',
    'Changes to Dataproc related Operators',
    'Changes to skipping behaviour of LatestOnlyOperator',
    'Change default snowflake_conn_id for Snowflake hook and operators'
   ```
   
   I like the idea proposed by @mik-laj in #9467 with defining `Rules` and that's something we should follow I think.
   
   Questions that should be addressed before we move on:
   - do we plan to only "warn" or should we consider also "auto-refactor"?
   - if there's backward compatibility do we want to include such check?
   
   If we are ok with the approach proposed by Kamil I can evaluate those changes and create issues so the work can be split.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-662973049


   We can catch all the warnings, but those related to migration fall into one of two categories: DeprecationWarning, FutureWarning.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-660037705


   Gdoc fine for building up, but anything we point end users at should be in our official docs.
   
   A mode to doc things automatically is desirable, but probably not on by default - as much as I wish everyone used git, I know some people out there don't, so we shouldn't change their files without confirmation (perhaps show a diff then ask y/n?)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #8765: Create an `airflow upgrade-check` command in 1.10 to ease upgrade path to 2.0.

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #8765:
URL: https://github.com/apache/airflow/issues/8765#issuecomment-695341113


   Yup I am happy with the suggested list


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org