You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/27 11:38:59 UTC

[GitHub] [airflow] Taragolis opened a new pull request, #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Taragolis opened a new pull request, #25336:
URL: https://github.com/apache/airflow/pull/25336

   Follow up #25256
   
   Right now `region_name` and `config` (botocore.config.Config) could be set in different places with this precedence rules (high to low):
   * Hooks methods arguments
   * Hooks attribute
   * In connection
   
   With this PR I tried to do in one place (AwsConnectionWrapper) by:
   1. Allow specify init values for `botocore_config` and `region_name` and keep them in wrapper
   2. Allow create AwsConnectionWrapper from another AwsConnectionWrapper
   
   Also this PR might introduce some breaking changes, previously if `config` specified in Connection than it always overrides config which explicit set in Hook arguments
   
   https://github.com/apache/airflow/blob/432977be0cd1e95f623fa5edda2a227798fa2939/airflow/providers/amazon/aws/hooks/base_aws.py#L399-L402
   
   https://github.com/apache/airflow/blob/28db8c10b2422d99217658a039cc6dc45a38ff51/airflow/providers/amazon/aws/hooks/base_aws.py#L454-L460


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199172909

   We just have to make sure that we describe it in the changelog (possibly with some instructions what to do) if we decide to classify it as bug fix - explaining why we think it is a bug-fix, not a breaking change. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199748032

   Sorry, just catching up on this convo.   I'm inclined to agree, I think it feels like a bug fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199833974

   List of bug fixes and improvements which could affect users.
   
   ---
   
   `config` - [botocore.config.Config](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html) after this PR will follow this precedence rules:
      1. `config` argument in hooks methods `get_client_type` and `get_resource_type` - Operators/Sensors in providers packages doesn't use this methods.
      2. `config` argument in hook initialisation - Operators/Sensors in providers packages do not use this argument in hook initialisation right now.
      3. Construct from `config_kwargs` in Connection if this key exists and not empty
   
   Since 1.0.0 version of amazon-provider use this rules
      1. `config` argument which set in methods `get_client_type` and `get_resource_type`
      2. If `config_kwargs` key exists in Connection then construct Config and overwrite `config` Hook attribute 
      3. If `config_kwargs` key not exists in Connection then use argument hook `config` argument
   
   In additional I personally think that `config` too broad name and better rename in next PR to `botocore_config` and deprecate `config`. Some Operators use their own `config`: 
   * [airflow.providers.amazon.aws.operators.glue_crawler.GlueCrawlerOperator](https://github.com/apache/airflow/blob/9bc0d880789ae6cccad61c8f3f899bda38688c30/airflow/providers/amazon/aws/operators/glue_crawler.py#L39)
   * [airflow.providers.amazon.aws.operators.sagemaker.SageMakerBaseOperator](https://github.com/apache/airflow/blob/9bc0d880789ae6cccad61c8f3f899bda38688c30/airflow/providers/amazon/aws/operators/sagemaker.py#L39)
   
   ---
   
   Fallback to boto3 default Credential strategy only if provided `aws_conn_id` not exists and inform the user. Personally I think that in some point of the future if provided `aws_conn_id` not exists, than better just raise an error, rather than fallback to default boto3 strategy. If user want do not want use connection Hook already support `aws_conn_id=None` and it is documented feature.
   
   Right now Hook catch too broad `AirflowException`, it could be a reason why some methods in Session Factory raises something like `RuntimeError`. `AirflowException` raises in:
      1. If user set not supported auth web identity method. Now only `google` supported. So i think it is legit to stop execution on this point.
      2. In legacy configurations files, this function migrated from S3Hook to AwsHook in Airflow 1.9 and never been documented.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
eladkal commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199140660

   >  if the next AWS provider release will get a major bump (and if so, we might also want to remove some deprecations).
   
   I can take care of it if we decide to release major version
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1196921652

   A loot of errors. Ah... just forget that AwsHook fallback to default strategy in case if Airflow Connection not exists.
   
   @potiuk Interesting why test on pg/Sqlite failed and on MySQL/MSSQL pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199094601

   > @potiuk Interesting why test on pg/Sqlite failed and on MySQL/MSSQL pass 
   
   We do have some intermittent errors more often we would like to have and continuously working on improving - but as in all large and complex system, this is a constant, almost daily struggle and uphill battle :) - but we have longer and longer periods of mostly-green PRs and I hope it will get only better over time as we add some optimisations and improvements :).
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1196829639

   errors :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1202420620

   @ferruzzi @vincbeck @o-nikolas WDYT about deprecate switch to default Boto3 Credential Strategy if user specified not existed `aws_conn_id`?
   Right now `aws_conn_id=None` and `aws_conn_id="conn_no_exists"` do the same.
   However `aws_conn_id=None` is documented and `aws_conn_id="conn_no_exists"` not documented.
   
   I thought it would be nice if we warn user that need to specify existed aws_conn_id or explicit set to None.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199197634

   > I would suggest might be create create major release. Even if I could prof that it just bug fixes I also have at least 4-5 different changes in session factory creation or aws base hook which could also unlock use connection properties in aws secrets backends (#25326).
   > 
   > Many of them introduce "small bug fix" or "small changing behaviour" which can in general classified as 'breaking changes"
   
   This is fine and if we do so, we will remove most "easy" deprecations (following our more aggressive deprecation removal policy). 
   
   I think this might also give stronger signal to @ferruzzi @vincbeck @o-nikolas that they might want to (similarly as Google team) want to think about cherry-picking some bugfixes to some past version of the provider following https://github.com/apache/airflow/blob/main/README.md#release-process-for-providers  - happy to assist with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vincbeck commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
vincbeck commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199324372

   I am fine with that


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1201712783

   <img width="1046" alt="Screenshot 2022-08-01 at 22 57 13" src="https://user-images.githubusercontent.com/595491/182245396-75ba82e1-adab-47a9-bb73-b0b77ce077ec.png">
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1201585337

   > We do have some intermittent errors more often we would like to have and continuously working on improving - but as in all large and complex systems, this is a constant, almost daily struggle and uphill battle
   
   I noticed your reply just now. Failed tests showed me that my initial changes was wrong so it wasn't false positive. However tests only failed on this checks:
   - `Postgres10,Py3.7: Always Providers[amazon,apache.hive,google,mysql,postgres]`
   - `Sqlite Py3.7: Always Providers[amazon,apache.hive,google,mysql,postgres]`
   
   But not on this checks:
   - `MSSQL2017-latest, Py3.7: Always Providers[amazon,apache.hive,google,mysql,postgres]`
   - `MySQL5.7, Py3.7: Always Providers[amazon,apache.hive,google,mysql,postgres]`
   
   Seems like providers tests only run for PostgreSQL and SQLite backends. In this case, the behavior is expected.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #25336:
URL: https://github.com/apache/airflow/pull/25336


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199097409

   @vincbeck @Taragolis - I have not look at very detail of that - the code looks good - but the question is is the "backwards-compatibility" problem mentioned above enough to be a "breaking change" or merely an inconvenience? WDYT? That will determine if the next AWS provider release will get a major bump (and if so, we might also want to remove some deprecations). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199171639

   > > @vincbeck @Taragolis - I have not look at very detail of that - the code looks good - but the question is is the "backwards-compatibility" problem mentioned above enough to be a "breaking change" or merely an inconvenience? WDYT?
   > 
   > As always it is depends on. Personally for me it might be "bug fixes" and "stop catch to broad exceptions" however someone code might not work anymore. In most cases user should get an error, so it not happen implicit... except changing behaviour with `config`.
   
   Yep. Agree "breaking change" and "bug fix" has a blurry border :) . For me the best criteria I have to decide is  will it make our users fix a problem they did not realise they had when the change is implemented. If we fix an actually wrong user behaviour and assumptions that worked "accidentally" before and were not really intentional - then this is not a breaking change.
   
   Which I hear from your description is the case right ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199194280

   I would suggest might be create create major release. Even if I could prof that it just bug fixes I also have at least 4-5 different changes in session factory creation or aws base hook which could also unlock use connection properties in aws secrets backends (#25326).
   
   Many of them introduce "small bug fix" or "small changing behaviour" which can in general classified as 'breaking changes"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] vincbeck commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
vincbeck commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1202680594

   I agree, I dont like too having a "hidden"mechanism which takes the default boto3 connection if the connection specified does not exist (or equal `None`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1196987195

   Add explicit check for `aws_conn_id` and fallback to `None` in case if it missing - almost the same as it worked before.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1197075298

   New possible backward incompatibility: Previously Hook switch to default boto3 strategy if AirflowException raised, after recent changes only if `aws_conn_id` not Found (AirflowNotFoundException).
   
   Places where it could happen 
   
   **Assume role by unsupported federation**
   https://github.com/apache/airflow/blob/5d4abbd58c33e7dfa8505e307d43420459d3df55/airflow/providers/amazon/aws/hooks/base_aws.py#L276-L293
   
   **Legacy config file parser**
   https://github.com/apache/airflow/blob/5d4abbd58c33e7dfa8505e307d43420459d3df55/airflow/providers/amazon/aws/utils/connection_wrapper.py#L248-L252


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1205537303

   Add information about deprecation fallback in case of conn_id not exists


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1199157165

   >@vincbeck @Taragolis - I have not look at very detail of that - the code looks good - but the question is is the "backwards-compatibility" problem mentioned above enough to be a "breaking change" or merely an inconvenience? WDYT?
   
   As always it is depends on. Personally for me it might be "bug fixes" and "stop catch to broad exceptions" however someone code might not work anymore. In most cases user should get an error, so it not happen implicit... except changing behaviour with `config`.
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25336: Resolve Amazon Hook's `region_name` and `config` in wrapper

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25336:
URL: https://github.com/apache/airflow/pull/25336#issuecomment-1201711974

   > Seems like providers tests only run for PostgreSQL and SQLite backends. In this case, the behavior is expected.
   
   Correct. If you unfold "Determine how tests are run" entry - it explains it. 
   MySQL and MsSQL use much more memory than PostgreSQL/Sqlite that's why we skip provider tests for those.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org