Posted to commits@airflow.apache.org by "Alexandre Blanchard (JIRA)" <ji...@apache.org> on 2019/08/06 13:21:00 UTC
[jira] [Created] (AIRFLOW-5126) Read aws_session_token in extra_config of the aws hook
Alexandre Blanchard created AIRFLOW-5126:
--------------------------------------------
Summary: Read aws_session_token in extra_config of the aws hook
Key: AIRFLOW-5126
URL: https://issues.apache.org/jira/browse/AIRFLOW-5126
Project: Apache Airflow
Issue Type: Improvement
Components: hooks
Affects Versions: 1.10.3
Reporter: Alexandre Blanchard
Hi,
Thanks for the great software.
At my company, we enforce security around our AWS accounts, and all accounts must have MFA activated. To use Airflow with my account, I generate a session token with an expiration date using the command
{code:bash}
aws sts assume-role --role-arn <the-role-i-want-to-use> \
    --role-session-name testing \
    --serial-number <my-personal-mfa-arn> \
    --token-code <code-on-my-mfa-device> \
    --duration-seconds 18000{code}
This way I retrieve everything I need to connect to AWS: an aws_access_key_id, an aws_secret_access_key and an aws_session_token.
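For illustration, the three values map directly onto boto3 session arguments. A minimal stdlib-only sketch of turning the command's JSON output into those arguments (the field names come from the STS API; the values here are placeholders):

```python
import json

# Hypothetical sample of the JSON printed by `aws sts assume-role`;
# real values are long opaque strings.
sts_output = json.loads("""
{
  "Credentials": {
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "secret-example",
    "SessionToken": "token-example",
    "Expiration": "2019-08-06T18:21:00Z"
  }
}
""")

creds = sts_output["Credentials"]
# These kwargs can be passed straight to boto3.session.Session(**session_kwargs).
session_kwargs = {
    "aws_access_key_id": creds["AccessKeyId"],
    "aws_secret_access_key": creds["SecretAccessKey"],
    "aws_session_token": creds["SessionToken"],
}
```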
Currently I'm using boto3 directly in my DAG and it's working great. I would like to use a connection managed by Airflow instead, but when I set the parameters this way:
{code:bash}
airflow connections --add \
--conn_id s3_log \
--conn_type s3 \
--conn_login "<aws_access_key_id>" \
--conn_password "<aws_secret_access_key>" \
--conn_extra "{ \
\"aws_session_token\": \"<aws_session_token>\" \
}"
{code}
With a hook using this connection, I get the error:
{code}
[2019-08-06 12:31:28,157] {__init__.py:1580} ERROR - An error occurred (403) when calling the HeadObject operation: Forbidden
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 112, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 117, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/root/airflow/dags/s3Dag.py", line 48, in download_raw_data
    dataObject = s3hook.get_key("poc/raw_data.csv.gz", s3_bucket)
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/S3_hook.py", line 217, in get_key
    obj.load()
  File "/usr/local/lib/python3.7/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
{code}
Reading the code of the hook (https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/hooks/aws_hook.py#L90), I understand that the session token is not read from the extra config. The only case where a session token is passed to the boto3 client is when the hook itself assumes a role. In my case, I want to use a role I have already assumed.
So my suggestion is to read the session token from the extra config and use it to connect to aws.
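To make the idea concrete, here is a minimal sketch of what the credential-building logic could look like. The function name and signature are hypothetical (the real logic lives in _get_credentials of aws_hook.py); only the aws_session_token handling is the proposed addition:

```python
import json

def credentials_from_connection(login, password, extra):
    """Hypothetical helper: build keyword arguments for
    boto3.session.Session from an Airflow connection, additionally
    honouring aws_session_token in the connection's extra field."""
    extra_config = json.loads(extra) if extra else {}
    session_kwargs = {
        "aws_access_key_id": login or extra_config.get("aws_access_key_id"),
        "aws_secret_access_key": password or extra_config.get("aws_secret_access_key"),
    }
    # Proposed change: forward a pre-obtained session token instead of
    # only setting one when the hook itself assumes a role.
    if "aws_session_token" in extra_config:
        session_kwargs["aws_session_token"] = extra_config["aws_session_token"]
    return session_kwargs

# With the connection defined above, the hook would then do something like
# boto3.session.Session(**credentials_from_connection(login, password, extra)).
kwargs = credentials_from_connection(
    "<aws_access_key_id>",
    "<aws_secret_access_key>",
    '{"aws_session_token": "<aws_session_token>"}',
)
```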
Do you think this is the right way to do it? Does this workflow make sense?
I am ready to contribute if my suggestion is accepted.
Regards
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)