You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/07 22:00:25 UTC

[GitHub] [airflow] pankajastro opened a new pull request, #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

pankajastro opened a new pull request, #24306:
URL: https://github.com/apache/airflow/pull/24306

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragement file, named `{pr_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149984579

   I'm just worry that docstring told that this argument for EMR Connection, but we assume that it could be AWS Connection
   
   https://github.com/apache/airflow/blob/01a52ccb53bc805308bd085dc51ae4674dd5eb4a/airflow/providers/amazon/aws/operators/emr.py#L288-L289
   
   I think if we want keep it simpler, allow user to skip set `emr_conn_id` and do not allow use other connection type by mistake we could simply do this
   
   ```python
   config = {}
   if self.emr_conn_id:
       emr_conn = self.get_connection(self.emr_conn_id)
       if emr_conn.conn_type != 'emr':
           raise AirflowError(
               f"Expected 'emr' connection type for `emr_conn_id`={self.emr_conn_id} but got {emr_conn.conn_type!r}"
           )
       config = emr_conn.extra_dejson.copy()
   ```
   
   WDYT @potiuk @eladkal @pankajastro 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150009182

   > Basically, emr_conn_id should not mandatory.
   
   I mean if user set `emr_conn_id="this_connection_not_exists"` this connection not exists and should raise an error rather than set to empty extras. And `emr_conn_id=None` - is not mandatory.
   
   
   I think it also required make changes in operator
   https://github.com/apache/airflow/blob/287fc4bdeb107fc81653d3a6721f9f5b87b90529/airflow/providers/amazon/aws/operators/emr.py#L304-L305


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150222260

   > Then this will be a breaking change if someone has created emr_conn_id with the name emr_default and not passing emr_conn_id param in operator, isn't it?
   
   Yeah, you right, for example EMR example DAG doesn't set `emr_conn_id`, it probably won't hurt if it uses empty connection. But if user previously stored some values in `emr_conn_id` it will
   
   https://github.com/apache/airflow/blob/047a6162b0b4cbf07fe2fd978e335839a7d3900b/airflow/providers/amazon/aws/example_dags/example_emr.py#L77-L80
   
   So it would be nice update docstring about behaviour of  `emr_conn_id` as well as for `aws_conn_id` (it could be grabbed from AwsBaseHook)
   
   https://github.com/apache/airflow/blob/047a6162b0b4cbf07fe2fd978e335839a7d3900b/airflow/providers/amazon/aws/hooks/base_aws.py#L378-L383


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150075545

   > BTW, AwsBaseHook handle case if aws_conn_id=None and use default boto3 behaviour
   
   Yes, we can also set it here. but that too will be breaking change for someone not passing aws_conn_id in this operator param and have created a connection with the name `aws_default` https://github.com/apache/airflow/blob/287fc4bdeb107fc81653d3a6721f9f5b87b90529/airflow/providers/amazon/aws/operators/emr.py#L304


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149829153

   Is it actually an error? Possibly due to lack of documentation of this connection type 
   
   As far as I remember since Airflow 1.10 (and probably earlier) Amazon Elastic Map Reduce connection stored only kwargs for [run_job_flow](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#EMR.Client.run_job_flow) and `aws_conn_id` uses for actual `boto3` emr client connection
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150033369

   > I mean if user set `emr_conn_id="this_connection_not_exists"` this connection not exists and should raise an error rather than set to empty extras. And `emr_conn_id=None` - is not mandatory.
   > 
   > I think it also required make changes in operator
   > 
   > https://github.com/apache/airflow/blob/287fc4bdeb107fc81653d3a6721f9f5b87b90529/airflow/providers/amazon/aws/operators/emr.py#L304-L305
   
   Ok, So behaviour should be like
   - if aws_conn_id does not exist raise an error like earlier 
   - if emr_conn_id container auth param like `aws_access_key_id` etc raise error like earlier
   - if emr_conn_id does not exist then make `config` as empty that way emr_conn_id will not be mandatory
   
   We can't keep aws_conn_id and emr_conn_id default values None because if the user do pass aws_conn_id then we accuse that the value is `aws_default` across the operator/sensor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1152348168

   Hey guys, just wanted to check if I need to address anything more here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
eladkal commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149891659

   The hook states:
   https://github.com/apache/airflow/blob/e54ca47262579742fc3c297c7f8d4c48f2437f82/airflow/providers/amazon/aws/hooks/emr.py#L27-L33


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150147574

   > You could use `aws_conn_id: Optional[str] = 'aws_default'` and it won't break anything
   > 
   > >
   
   Then the default value won't be None it will be aws_default, right?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149934903

   > emr_conn_id is used in only EmrCreateJobFlowOperator and I feel it expecting that emr_conn_id extra filed should contain JSON request body for run_job_flow API. But keeping the boto3 API request body in DAG looks more convenient to me than storing it in connection and if I'm passing job_flow_overrides param in the task in that case then emr_conn_id should not be mandatory.
   
   If only case case make `emr_conn_id` non mandatory than probably better just change receiving config
   
   ```python
   config = {}
   if self.emr_conn_id:
       emr_conn = self.get_connection(self.emr_conn_id)
       config = emr_conn.extra_dejson.copy()
   ```
   
   Assume that user provide AWS Connection rather than EMR Connection personally for me it is not good point.
   I could pass `aws_conn_id=None` and assume that Airflow will use Instance Profile / Task Role
   
   > If the user will store aws_access_key_id in emr_conn_id extra then I'm assuming that intention to use this connection as authentication rather than using it in building request body.
   
   If user deploy Airflow in AWS Cloud and follow AWS Best Practices he/she won't use `aws_access_key_id` and most probably use Instance Profile / Task Role
   
   Examples:
    - User use `role_arn` in EMR Connection and expected that he/she switch to this role, however error raise
    - User use `region_name` in EMR Connection and expected that region changed, however error raise
   
   For me it is not clear why we only need to check `aws_access_key_id` ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149957040

   > For me it is not clear why we only need to check `aws_access_key_id` ?
   
   I'm checking this since I do not want a user to use emr_conn_id to authenticate and `aws_access_key_id` is required if you want to keep the AWS auth param in conn. In case they have kept the auth param in emr_conn_id then ignore it. But I'm ok to remove that if block check if you want. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149860154

   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149808824

   @potiuk static check failing in this PR as well in other with error 
   ```
   Run pydocstyle.........................................................................Failed
   - hook id: pydocstyle
   - exit code: 1
   airflow/www/security.py:627 in private method `_sync_dag_view_permissions`:
           D202: No blank lines allowed after function docstring (found 1)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149813896

   Ah indeed - it looks like some cross-merged commits without rebase ....


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149876650

   > Is it actually an error? Possibly due to lack of documentation of this connection type
   > 
   > As far as I remember since Airflow 1.10 (and probably earlier) Amazon Elastic Map Reduce connection stored only kwargs for [run_job_flow](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#EMR.Client.run_job_flow) and `aws_conn_id` uses for actual `boto3` emr client connection
   
   emr_conn_id is used in only EmrCreateJobFlowOperator and I feel it expecting that emr_conn_id extra filed should contain JSON request body for run_job_flow API. But keeping the boto3 API request body in DAG looks more convenient to me than storing it in connection and if I'm passing job_flow_overrides param in the task in that case then emr_conn_id should not be mandatory. Here, I'm doing 
   - check if aws_conn_id is available, use it for AWS credential
   - if aws_conn_id is not available assume that credential is in emr_conn_id
   - if emr_conn_id conn extra is containing request body override with job_flow_overrides request body
   - if emr_conn_id does not exist or contain AWS credentials then use job_flow_overrides as the request body
   After this change, if emr_conn_id is not available then it will use aws_conn_id for auth and job_flow_overrides for the request body earlier it was failing by error `conn_id` not found


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
eladkal commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150032766

   @vincbeck @ferruzzi @o-nikolas any thoughts on this one?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149857972

   > Is it actually an error? Possibly due to lack of documentation of this connection type
   
   I believe there were quite a lot of refactorings to AwsBaseHook since and I guess this is result of it - looks legit for me actually.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149722343

   Hey @pankajastro - I am about to release RC2 for providers for May and would like to include that one - would it be possibe to fix the failing tests quickly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149929777

   If I am judging well - I think @eladkal  also suggested that - I think it's not a "regression" - rather improvement so I am ok with merging it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150159140

   > > You could use `aws_conn_id: Optional[str] = 'aws_default'` and it won't break anything
   > > >
   > 
   > Then the default value won't be None it will be aws_default, right?
   
   yep


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149924398

   > * If user will store `aws_access_key_id ` the error not happen (breaking changes?)
   >   https://github.com/apache/airflow/blob/287fc4bdeb107fc81653d3a6721f9f5b87b90529/airflow/providers/amazon/aws/hooks/emr.py#L101
   
   
   
   > 
   
   If the user will store aws_access_key_id in emr_conn_id extra then I'm assuming that intention to use this connection as authentication rather than using it in building request body. If you want to throw an error like the earlier case I can do that but in current case you can store the AWS JSON in emr_conn_id extra then you do not need to pass aws_conn_id pram in task


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150053649

   > We can't keep aws_conn_id and emr_conn_id default values None because if the user do pass aws_conn_id then we accuse that the value is aws_default across the operator/sensor.
   
   BTW, `AwsBaseHook` handle case if `aws_conn_id=None` and use default boto3 behaviour 
   https://github.com/apache/airflow/blob/01a52ccb53bc805308bd085dc51ae4674dd5eb4a/airflow/providers/amazon/aws/hooks/base_aws.py#L418-L422
   
   So it doesn't raise any error now - only static checks warning in IDE
   ```
   AIRFLOW_CTX_DAG_ID=example_emr_job_flow_manual_steps
   AIRFLOW_CTX_TASK_ID=create_job_flow
   AIRFLOW_CTX_EXECUTION_DATE=2022-06-07T10:28:15.046932+00:00
   AIRFLOW_CTX_TRY_NUMBER=8
   AIRFLOW_CTX_DAG_RUN_ID=manual__2022-06-07T10:28:15.046932+00:00
   [2022-06-08, 15:10:55 UTC] {emr.py:300} INFO - Creating JobFlow using aws-conn-id: None, emr-conn-id: emr_default
   [2022-06-08, 15:10:55 UTC] {base.py:68} INFO - Using connection ID 'emr_default' for task execution.
   [2022-06-08, 15:10:55 UTC] {credentials.py:1180} INFO - Found credentials in environment variables.
   [2022-06-08, 15:10:56 UTC] {emr.py:314} INFO - JobFlow with id j-1Y1LXW4LWC1Y1 created
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149925868

   > If user store other extra arguments from AWS Connection such as region_name, profile_name, config_kwargs, session_kwargs, and etc the error will raised
   
   In this case the behaviour will depend on AWS API like earlier 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kaxil merged pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
kaxil merged PR #24306:
URL: https://github.com/apache/airflow/pull/24306


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dacort commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
dacort commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1160783845

   I was also OOO, but looks like this is already taken care of. Feel free to ping me in the future for any EMR-related questions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149754169

   > Hey @pankajastro - I am about to release RC2 for providers for May and would like to include that one - would it be possibe to fix the failing tests quickly?
   
   Fixed it, the test passing on local. let's see how it goes in CI. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
eladkal commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149822473

   Fix: https://github.com/apache/airflow/pull/24322


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149883919

   > I believe there were quite a lot of refactorings to AwsBaseHook since and I guess this is result of it - looks legit for me actually.
   
   I've just check in 1.10.4 (I used to work with this version for about 1 year) and looks like it have the same behaviour https://github.com/apache/airflow/blob/1.10.4/airflow/contrib/hooks/emr_hook.py
   
   If user stored some parameters in EMR Connection which not accepted by `run_job_flow` it would receive an error. 
   
   However after this changes:
   - If user will store `aws_access_key_id ` the error not happen (breaking changes?)
   https://github.com/apache/airflow/blob/287fc4bdeb107fc81653d3a6721f9f5b87b90529/airflow/providers/amazon/aws/hooks/emr.py#L101 
   - If user store other extra arguments from AWS Connection such as `region_name`, `profile_name`, `config_kwargs`, `session_kwargs`, and etc the error will raised
   
   I'm never been happy with behaviour that I need provide emr connection in cases when I do not required it. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: Unable to locate credentials and Unknown parameter in input for `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1149999972

   > * `emr_conn_id` not exists - raise an error (same as current implementation)
   
   If you want to keep this behaviour then some users will have to create emr connection having no data, which I don't want. Basically, emr_conn_id should not mandatory.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150171541

   > 
   > yep
   
   So, if I'll change the [__init__](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/emr.py#L301) method of the operator to 
   ```
   def __init__(
           self,
           *,
           aws_conn_id: str = 'aws_default',
           emr_conn_id: Optional[str] = 'emr_default',
           job_flow_overrides: Optional[Union[str, Dict[str, Any]]] = None,
           region_name: Optional[str] = None,
           **kwargs,
       ):
   ```
   Then from the code execution point of view, this is effectively no change 
   and if I'll do 
   ```
   def __init__(
           self,
           *,
           aws_conn_id: str = 'aws_default',
           emr_conn_id: Optional[str] = None,
           job_flow_overrides: Optional[Union[str, Dict[str, Any]]] = None,
           region_name: Optional[str] = None,
           **kwargs,
       ):
   ```
   Then this will be a breaking change if someone has created emr_conn_id with the name emr_default and not passing emr_conn_id param in operator, isn't it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pankajastro commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
pankajastro commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150285733

   Updated docstring here 140a26d224863ad4be9c0baa0cb4af7b88487e9b


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150286192

   I unfortunately don't have the time to catch up on this issue today, but dropping in to CC @dacort (who is the AWS Developer Advocate for EMR)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #24306: Fix: `emr_conn_id` should be optional in `EmrCreateJobFlowOperator`

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #24306:
URL: https://github.com/apache/airflow/pull/24306#issuecomment-1150085106

   > Yes, we can also set it here. but that too will be breaking change for someone not passing aws_conn_id in this operator param and have created a connection with the name aws_default
   
   Quite a lot of operators and sensors in amazon-provider uses different annotations and default value for `aws_conn_id`
   
   - `aws_conn_id: str = 'aws_default'`
   - `aws_conn_id: Optional[str] = 'aws_default'`
   - `aws_conn_id: Optional[str] = None`
   
   Due to the fact all of them (or most) uses hooks based on `AwsBaseHook` so provide `None` is fine but in case it all workers use the same credentials
   
   You could use `aws_conn_id: Optional[str] = 'aws_default'` and it won't break anything
   
   
   > if aws_conn_id does not exist raise an error like earlier
   if emr_conn_id contain auth param like aws_access_key_id etc raise error like earlier
   if emr_conn_id does not exist then make config as empty dict that way emr_conn_id will not be mandatory
   
   My suggestion still the same
   1. Make it allow provide `None` for `emr_conn_id`
   2. Do not touch  anything related to `aws_conn_id`
   3. Check type of `emr_conn_id` if it `not None`
     - Connection not `emr` - raise an error
     - Connection not exists - raise an error which is default for BaseHook.get_connection
   4. User provide in connection wrong arguments? boto3 will raise an error


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org