Posted to commits@airflow.apache.org by "Alexandre Blanchard (JIRA)" <ji...@apache.org> on 2019/08/06 13:21:00 UTC

[jira] [Created] (AIRFLOW-5126) Read aws_session_token in extra_config of the aws hook

Alexandre Blanchard created AIRFLOW-5126:
--------------------------------------------

             Summary: Read aws_session_token in extra_config of the aws hook
                 Key: AIRFLOW-5126
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5126
             Project: Apache Airflow
          Issue Type: Improvement
          Components: hooks
    Affects Versions: 1.10.3
            Reporter: Alexandre Blanchard


Hi,

Thanks for the great software.

At my company, we enforce security on our AWS accounts, and every account must have MFA activated. To use Airflow with my account, I generate a session token with an expiration date using the following command:
{code:java}
aws sts assume-role --role-arn <the-role-i-want-use> --role-session-name testing --serial-number <my-personal-mfa-arn> --token-code <code-on-my-mfa-device> --duration-seconds 18000
{code}
This way I retrieve everything I need to connect to AWS: an aws_access_key_id, an aws_secret_access_key, and an aws_session_token.
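In my DAG, the wiring looks roughly like this (a sketch with placeholder values; the real credentials come from the command above):
{code:python}
import json

# `aws sts assume-role ...` prints JSON shaped like this (values are placeholders):
assume_role_output = """
{
  "Credentials": {
    "AccessKeyId": "<aws_access_key_id>",
    "SecretAccessKey": "<aws_secret_access_key>",
    "SessionToken": "<aws_session_token>",
    "Expiration": "2019-08-06T18:21:00Z"
  }
}
"""

creds = json.loads(assume_role_output)["Credentials"]

# All three values, including the session token, are passed to boto3:
#   import boto3
#   session = boto3.session.Session(
#       aws_access_key_id=creds["AccessKeyId"],
#       aws_secret_access_key=creds["SecretAccessKey"],
#       aws_session_token=creds["SessionToken"],
#   )
session_kwargs = {
    "aws_access_key_id": creds["AccessKeyId"],
    "aws_secret_access_key": creds["SecretAccessKey"],
    "aws_session_token": creds["SessionToken"],
}
{code}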

Currently I'm using boto3 directly in my DAG and it's working great. I would like to use a connection managed by Airflow instead, but when I set the parameters this way:
{code:java}
airflow connections --add \
 --conn_id s3_log \
 --conn_type s3 \
 --conn_login "<aws_access_key_id>" \
 --conn_password "<aws_secret_access_key>" \
 --conn_extra "{ \
   \"aws_session_token\": \"<aws_session_token>\" \
}"
{code}
then, with a hook using this connection, I get the following error:
{code:java}
[2019-08-06 12:31:28,157] {__init__.py:1580} ERROR - An error occurred (403) when calling the HeadObject operation: Forbidden
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/__init__.py", line 1441, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 112, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 117, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/root/airflow/dags/s3Dag.py", line 48, in download_raw_data
    dataObject = s3hook.get_key("poc/raw_data.csv.gz", s3_bucket)
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/S3_hook.py", line 217, in get_key
    obj.load()
  File "/usr/local/lib/python3.7/site-packages/boto3/resources/factory.py", line 505, in do_action
    response = action(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(**params)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
{code}
Reading the code of the hook (https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/hooks/aws_hook.py#L90), I understand that the session token is not read from the extra config. The only case in which a session token is passed to the boto3 client is when the hook itself assumes a role. In my case, I want to use a role I have already assumed.

So my suggestion is to read the session token from the extra config and use it to connect to AWS.
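Roughly what I have in mind (a sketch only; the helper name and structure are illustrative, not the actual AwsHook code):
{code:python}
import json

def build_session_kwargs(login, password, extra):
    """Illustrative helper: turn an Airflow connection's fields into
    boto3 session kwargs (not the real AwsHook implementation)."""
    extra_config = json.loads(extra) if extra else {}
    session_kwargs = {
        "aws_access_key_id": login,
        "aws_secret_access_key": password,
    }
    # Proposed behaviour: forward aws_session_token from the extra
    # config when it is present, instead of ignoring it.
    if "aws_session_token" in extra_config:
        session_kwargs["aws_session_token"] = extra_config["aws_session_token"]
    return session_kwargs

# With the connection defined above:
kwargs = build_session_kwargs(
    "<aws_access_key_id>",
    "<aws_secret_access_key>",
    '{"aws_session_token": "<aws_session_token>"}',
)
{code}
boto3 would then receive the token via something like {{boto3.session.Session(**kwargs)}}, just as it already does when the hook assumes a role itself.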

Do you think this is the right way to do it? Does this workflow make sense?

I am ready to contribute if my suggestion is accepted.

Regards



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)