Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/04 00:48:08 UTC

[GitHub] [airflow] ConstantinoSchillebeeckx opened a new issue #16148: Downloading files from S3 broken in 2.1.0

ConstantinoSchillebeeckx opened a new issue #16148:
URL: https://github.com/apache/airflow/issues/16148


   **Apache Airflow version**: 2.0.2 and 2.1.0
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: running locally
   - **OS** (e.g. from /etc/os-release):
   - **Kernel** (e.g. `uname -a`): Darwin CSchillebeeckx-0589.local 19.6.0 Darwin Kernel Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; root:xnu-6153.141.16~1/RELEASE_X86_64 x86_64
   - **Install tools**: pip
   - **Others**: Running everything in Docker including Redis and Celery
   
   **What happened**:
   
   I'm seeing issues with downloading files from S3 on 2.1.0: a file is created after the download, but its content is empty! The same DAG behaves as expected on 2.0.2 (see the logs below).
   
   **What you expected to happen**:
   
   Non-empty files :)
   
   **How to reproduce it**:
   
   The DAG I'm running:
   ```python
   # -*- coding: utf-8 -*-
   import os
   import logging
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.utils.dates import days_ago
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   
   def download_file_from_s3():
   
       # authed with ENVIRONMENT variables
       s3_hook = S3Hook()
   
       bucket = 'some-secret-bucket'
       key = 'tmp.txt'
   
       # note: the target path is opened for writing (which truncates it) before the
       # download writes the object to that same path
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3_hook.get_resource_type("s3").Bucket(bucket).Object(key).download_file(f.name)
           logging.info(f"File downloaded: {f.name}")
   
           with open(f.name, 'r') as f_in:
               logging.info(f"FILE CONTENT {f_in.read()}")
   
   
   dag = DAG(
       "tmp",
       catchup=False,
       default_args={
           "start_date": days_ago(1),
       },
       schedule_interval=None,
   )
   
   download_file_from_s3 = PythonOperator(
       task_id="download_file_from_s3", python_callable=download_file_from_s3, dag=dag
   )
   ```
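   
   For a quick comparison, a minimal boto3-only sketch (assuming the same bucket/key and environment-variable credentials as above) that performs the same download outside of Airflow:
   
   ```python
   # Minimal sketch, assuming the same bucket/key as the DAG above and
   # credentials provided via environment variables; run outside Airflow
   # to compare plain boto3 behaviour against the hook-based task.
   import boto3
   
   bucket = "some-secret-bucket"
   key = "tmp.txt"
   local_path = "/tmp/s3_boto3_only.txt"
   
   # Same resource-style call chain used in the DAG, just without S3Hook
   boto3.resource("s3").Bucket(bucket).Object(key).download_file(local_path)
   
   with open(local_path, "r") as f_in:
       print(f"FILE CONTENT {f_in.read()}")
   ```
   
   If this standalone script yields a non-empty file with the same boto3/botocore versions, the empty file is more likely related to how the download is wired up inside the task than to boto3 itself.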
   
   The logged output from 2.0.2:
   ```
   *** Fetching from: http://ba1b92003f54:8793/log/tmp/download_file_from_s3ile/2021-05-28T17:25:58.851532+00:00/1.log
   
   [2021-05-28 10:26:04,227] {executor_loader.py:82} DEBUG - Loading core executor: CeleryExecutor
   [2021-05-28 10:26:04,239] {__init__.py:51} DEBUG - Loading core task runner: StandardTaskRunner
   [2021-05-28 10:26:04,252] {base_task_runner.py:62} DEBUG - Planning to run as the  user
   [2021-05-28 10:26:04,255] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> from DB
   [2021-05-28 10:26:04,264] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]>
   [2021-05-28 10:26:04,264] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Trigger Rule' PASSED: True, The task instance did not have any upstream tasks.
   [2021-05-28 10:26:04,265] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Task Instance Not Running' PASSED: True, Task is not in running state.
   [2021-05-28 10:26:04,279] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
   [2021-05-28 10:26:04,280] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
   [2021-05-28 10:26:04,280] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Task Instance State' PASSED: True, Task state queued was valid.
   [2021-05-28 10:26:04,280] {taskinstance.py:877} INFO - Dependencies all met for <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]>
   [2021-05-28 10:26:04,281] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Trigger Rule' PASSED: True, The task instance did not have any upstream tasks.
   [2021-05-28 10:26:04,291] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Pool Slots Available' PASSED: True, ('There are enough open slots in %s to execute the task', 'default_pool')
   [2021-05-28 10:26:04,301] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
   [2021-05-28 10:26:04,301] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
   [2021-05-28 10:26:04,301] {taskinstance.py:892} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]> dependency 'Task Concurrency' PASSED: True, Task concurrency is not set.
   [2021-05-28 10:26:04,301] {taskinstance.py:877} INFO - Dependencies all met for <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [queued]>
   [2021-05-28 10:26:04,301] {taskinstance.py:1068} INFO -
   --------------------------------------------------------------------------------
   [2021-05-28 10:26:04,302] {taskinstance.py:1069} INFO - Starting attempt 1 of 1
   [2021-05-28 10:26:04,302] {taskinstance.py:1070} INFO -
   --------------------------------------------------------------------------------
   [2021-05-28 10:26:04,317] {taskinstance.py:1089} INFO - Executing <Task(PythonOperator): download_file_from_s3ile> on 2021-05-28T17:25:58.851532+00:00
   [2021-05-28 10:26:04,324] {standard_task_runner.py:52} INFO - Started process 118 to run task
   [2021-05-28 10:26:04,331] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'tmp', 'download_file_from_s3ile', '2021-05-28T17:25:58.851532+00:00', '--job-id', '6', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/tmp_dag.py', '--cfg-path', '/tmp/tmpuz8u2gva', '--error-file', '/tmp/tmpms02c24z']
   [2021-05-28 10:26:04,333] {standard_task_runner.py:77} INFO - Job 6: Subtask download_file_from_s3ile
   [2021-05-28 10:26:04,334] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f348514f0e0>]
   [2021-05-28 10:26:04,350] {settings.py:210} DEBUG - Setting up DB connection pool (PID 118)
   [2021-05-28 10:26:04,351] {settings.py:243} DEBUG - settings.prepare_engine_args(): Using NullPool
   [2021-05-28 10:26:04,357] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [None]> from DB
   [2021-05-28 10:26:04,377] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [running]>
   [2021-05-28 10:26:04,391] {logging_mixin.py:104} INFO - Running <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [running]> on host ba1b92003f54
   [2021-05-28 10:26:04,395] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [running]> from DB
   [2021-05-28 10:26:04,401] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [running]>
   [2021-05-28 10:26:04,406] {taskinstance.py:658} DEBUG - Clearing XCom data
   [2021-05-28 10:26:04,413] {taskinstance.py:665} DEBUG - XCom data cleared
   [2021-05-28 10:26:04,438] {taskinstance.py:1283} INFO - Exporting the following env vars:
   AIRFLOW_CTX_DAG_OWNER=airflow
   AIRFLOW_CTX_DAG_ID=tmp
   AIRFLOW_CTX_TASK_ID=download_file_from_s3ile
   AIRFLOW_CTX_EXECUTION_DATE=2021-05-28T17:25:58.851532+00:00
   AIRFLOW_CTX_DAG_RUN_ID=manual__2021-05-28T17:25:58.851532+00:00
   [2021-05-28 10:26:04,438] {__init__.py:146} DEBUG - Preparing lineage inlets and outlets
   [2021-05-28 10:26:04,438] {__init__.py:190} DEBUG - inlets: [], outlets: []
   [2021-05-28 10:26:04,439] {base_aws.py:362} INFO - Airflow Connection: aws_conn_id=aws_default
   [2021-05-28 10:26:04,446] {base_aws.py:385} WARNING - Unable to use Airflow Connection for credentials.
   [2021-05-28 10:26:04,446] {base_aws.py:386} INFO - Fallback on boto3 credential strategy
   [2021-05-28 10:26:04,446] {base_aws.py:391} INFO - Creating session using boto3 credential strategy region_name=None
   [2021-05-28 10:26:04,448] {hooks.py:417} DEBUG - Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
   [2021-05-28 10:26:04,450] {hooks.py:417} DEBUG - Changing event name from before-call.apigateway to before-call.api-gateway
   [2021-05-28 10:26:04,451] {hooks.py:417} DEBUG - Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
   [2021-05-28 10:26:04,452] {hooks.py:417} DEBUG - Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
   [2021-05-28 10:26:04,453] {hooks.py:417} DEBUG - Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
   [2021-05-28 10:26:04,453] {hooks.py:417} DEBUG - Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
   [2021-05-28 10:26:04,454] {hooks.py:417} DEBUG - Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
   [2021-05-28 10:26:04,457] {hooks.py:417} DEBUG - Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
   [2021-05-28 10:26:04,457] {hooks.py:417} DEBUG - Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
   [2021-05-28 10:26:04,457] {hooks.py:417} DEBUG - Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
   [2021-05-28 10:26:04,457] {hooks.py:417} DEBUG - Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
   [2021-05-28 10:26:04,471] {loaders.py:174} DEBUG - Loading JSON file: /usr/local/lib/python3.7/site-packages/boto3/data/s3/2006-03-01/resources-1.json
   [2021-05-28 10:26:04,477] {credentials.py:1961} DEBUG - Looking for credentials via: env
   [2021-05-28 10:26:04,477] {credentials.py:1087} INFO - Found credentials in environment variables.
   [2021-05-28 10:26:04,477] {loaders.py:174} DEBUG - Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/endpoints.json
   [2021-05-28 10:26:04,483] {hooks.py:210} DEBUG - Event choose-service-name: calling handler <function handle_service_name_alias at 0x7f347f1165f0>
   [2021-05-28 10:26:04,494] {loaders.py:174} DEBUG - Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/s3/2006-03-01/service-2.json
   [2021-05-28 10:26:04,505] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f347f1bd170>
   [2021-05-28 10:26:04,505] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function lazy_call.<locals>._handler at 0x7f3453f7f170>
   [2021-05-28 10:26:04,506] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f347f1b9ef0>
   [2021-05-28 10:26:04,510] {endpoint.py:291} DEBUG - Setting s3 timeout as (60, 60)
   [2021-05-28 10:26:04,511] {loaders.py:174} DEBUG - Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/_retry.json
   [2021-05-28 10:26:04,512] {client.py:164} DEBUG - Registering retry handlers for service: s3
   [2021-05-28 10:26:04,513] {factory.py:66} DEBUG - Loading s3:s3
   [2021-05-28 10:26:04,515] {factory.py:66} DEBUG - Loading s3:Bucket
   [2021-05-28 10:26:04,515] {model.py:358} DEBUG - Renaming Bucket attribute name
   [2021-05-28 10:26:04,516] {hooks.py:210} DEBUG - Event creating-resource-class.s3.Bucket: calling handler <function lazy_call.<locals>._handler at 0x7f3453ecbe60>
   [2021-05-28 10:26:04,517] {factory.py:66} DEBUG - Loading s3:Object
   [2021-05-28 10:26:04,519] {hooks.py:210} DEBUG - Event creating-resource-class.s3.Object: calling handler <function lazy_call.<locals>._handler at 0x7f3453ecb3b0>
   [2021-05-28 10:26:04,520] {utils.py:599} DEBUG - Acquiring 0
   [2021-05-28 10:26:04,521] {tasks.py:194} DEBUG - DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f34531fcc50>}) about to wait for the following futures []
   [2021-05-28 10:26:04,521] {tasks.py:203} DEBUG - DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f34531fcc50>}) done waiting for dependent futures
   [2021-05-28 10:26:04,521] {tasks.py:147} DEBUG - Executing task DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f34531fcc50>}) with kwargs {'client': <botocore.client.S3 object at 0x7f3453215d10>, 'config': <boto3.s3.transfer.TransferConfig object at 0x7f3453181390>, 'osutil': <s3transfer.utils.OSUtils object at 0x7f3453181510>, 'request_executor': <s3transfer.futures.BoundedExecutor object at 0x7f3453181190>, 'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f34531fcc50>, 'io_executor': <s3transfer.futures.BoundedExecutor object at 0x7f34531fced0>}
   [2021-05-28 10:26:04,522] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <function sse_md5 at 0x7f347f133a70>
   [2021-05-28 10:26:04,523] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <function validate_bucket_name at 0x7f347f1339e0>
   [2021-05-28 10:26:04,523] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f345321aa90>>
   [2021-05-28 10:26:04,523] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7f34531d0250>>
   [2021-05-28 10:26:04,523] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <function generate_idempotent_uuid at 0x7f347f133830>
   [2021-05-28 10:26:04,524] {hooks.py:210} DEBUG - Event before-call.s3.HeadObject: calling handler <function add_expect_header at 0x7f347f133d40>
   [2021-05-28 10:26:04,524] {hooks.py:210} DEBUG - Event before-call.s3.HeadObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f345321aa90>>
   [2021-05-28 10:26:04,525] {hooks.py:210} DEBUG - Event before-call.s3.HeadObject: calling handler <function inject_api_version_header_if_needed at 0x7f347f13b0e0>
   [2021-05-28 10:26:04,525] {endpoint.py:101} DEBUG - Making request for OperationModel(name=HeadObject) with params: {'url_path': '[REDACT]', 'query_string': {}, 'method': 'HEAD', 'headers': {'User-Agent': 'Boto3/1.15.18 Python/3.7.10 Linux/5.10.25-linuxkit Botocore/1.18.18 Resource'}, 'body': b'', 'url': 'https://s3.amazonaws.com/[REDACT]/tmp.txt', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7f345321a710>, 'has_streaming_input': False, 'auth_type': None, 'signing': {'bucket': '[REDACT]'}}}
   [2021-05-28 10:26:04,526] {hooks.py:210} DEBUG - Event request-created.s3.HeadObject: calling handler <function signal_not_transferring at 0x7f347ee7de60>
   [2021-05-28 10:26:04,526] {hooks.py:210} DEBUG - Event request-created.s3.HeadObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f3453215e90>>
   [2021-05-28 10:26:04,527] {hooks.py:210} DEBUG - Event choose-signer.s3.HeadObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7f3453f046d0>>
   [2021-05-28 10:26:04,527] {hooks.py:210} DEBUG - Event choose-signer.s3.HeadObject: calling handler <function set_operation_specific_signer at 0x7f347f133710>
   [2021-05-28 10:26:04,527] {hooks.py:210} DEBUG - Event before-sign.s3.HeadObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7f34531d0710>>
   [2021-05-28 10:26:04,527] {utils.py:1639} DEBUG - Defaulting to S3 virtual host style addressing with path style addressing fallback.
   [2021-05-28 10:26:04,528] {utils.py:1018} DEBUG - Checking for DNS compatible bucket for: https://s3.amazonaws.com/[REDACT]/tmp.txt
   [2021-05-28 10:26:04,528] {utils.py:1036} DEBUG - URI updated to: https://[REDACT].s3.amazonaws.com/tmp.txt
   [2021-05-28 10:26:04,528] {auth.py:364} DEBUG - Calculating signature using v4 auth.
   [2021-05-28 10:26:04,529] {auth.py:365} DEBUG - CanonicalRequest:
   HEAD
   /tmp.txt
   [REDACT]
   [2021-05-28 10:26:04,529] {hooks.py:210} DEBUG - Event request-created.s3.HeadObject: calling handler <function signal_transferring at 0x7f347ee8a320>
   [2021-05-28 10:26:04,529] {endpoint.py:187} DEBUG - Sending http request: <AWSPreparedRequest stream_output=False, method=HEAD, url=https://[REDACT].s3.amazonaws.com/tmp.txt, headers={'User-Agent': b'Boto3/1.15.18 Python/3.7.10 Linux/5.10.25-linuxkit Botocore/1.18.18 Resource', 'X-Amz-Date': b'20210528T172604Z', 'X-Amz-Content-SHA256': b'[REDACT]', 'Authorization': b'[REDACT]', SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=[REDACT]'}>
   [2021-05-28 10:26:04,531] {connectionpool.py:943} DEBUG - Starting new HTTPS connection (1): [REDACT].s3.amazonaws.com:443
   [2021-05-28 10:26:05,231] {connectionpool.py:442} DEBUG - https://[REDACT].s3.amazonaws.com:443 "HEAD /tmp.txt HTTP/1.1" 200 0
   [2021-05-28 10:26:05,232] {parsers.py:233} DEBUG - Response headers: {'x-amz-id-2': 'o[REDACT]', 'x-amz-request-id': '[REDACT]', 'Date': 'Fri, 28 May 2021 17:26:06 GMT', 'Last-Modified': 'Thu, 27 May 2021 20:37:55 GMT', 'ETag': '"[REDACT]"', 'x-amz-server-side-encryption': 'AES256', 'x-amz-version-id': '[REDACT]', 'Accept-Ranges': 'bytes', 'Content-Type': 'text/plain', 'Content-Length': '5', 'Server': 'AmazonS3'}
   [2021-05-28 10:26:05,232] {parsers.py:234} DEBUG - Response body:
   b''
   [2021-05-28 10:26:05,234] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f345321ab50>
   [2021-05-28 10:26:05,235] {retryhandler.py:187} DEBUG - No retry needed.
   [2021-05-28 10:26:05,235] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f345321aa90>>
   [2021-05-28 10:26:05,236] {futures.py:318} DEBUG - Submitting task ImmediatelyWriteIOGetObjectTask(transfer_id=0, {'bucket': '[REDACT]', 'key': 'tmp.txt', 'extra_args': {}}) to executor <s3transfer.futures.BoundedExecutor object at 0x7f3453181190> for transfer request: 0.
   [2021-05-28 10:26:05,236] {utils.py:599} DEBUG - Acquiring 0
   [2021-05-28 10:26:05,236] {tasks.py:194} DEBUG - ImmediatelyWriteIOGetObjectTask(transfer_id=0, {'bucket': '[REDACT]', 'key': 'tmp.txt', 'extra_args': {}}) about to wait for the following futures []
   [2021-05-28 10:26:05,237] {tasks.py:203} DEBUG - ImmediatelyWriteIOGetObjectTask(transfer_id=0, {'bucket': '[REDACT]', 'key': 'tmp.txt', 'extra_args': {}}) done waiting for dependent futures
   [2021-05-28 10:26:05,237] {tasks.py:147} DEBUG - Executing task ImmediatelyWriteIOGetObjectTask(transfer_id=0, {'bucket': '[REDACT]', 'key': 'tmp.txt', 'extra_args': {}}) with kwargs {'client': <botocore.client.S3 object at 0x7f3453215d10>, 'bucket': '[REDACT]', 'key': 'tmp.txt', 'fileobj': <s3transfer.utils.DeferredOpenFile object at 0x7f34531fc890>, 'extra_args': {}, 'callbacks': [], 'max_attempts': 5, 'download_output_manager': <s3transfer.download.DownloadFilenameOutputManager object at 0x7f34531fc7d0>, 'io_chunksize': 262144, 'bandwidth_limiter': None}
   [2021-05-28 10:26:05,238] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function sse_md5 at 0x7f347f133a70>
   [2021-05-28 10:26:05,238] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function validate_bucket_name at 0x7f347f1339e0>
   [2021-05-28 10:26:05,238] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f345321aa90>>
   [2021-05-28 10:26:05,238] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7f34531d0250>>
   [2021-05-28 10:26:05,238] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function generate_idempotent_uuid at 0x7f347f133830>
   [2021-05-28 10:26:05,239] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <function add_expect_header at 0x7f347f133d40>
   [2021-05-28 10:26:05,239] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f345321aa90>>
   [2021-05-28 10:26:05,239] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <function inject_api_version_header_if_needed at 0x7f347f13b0e0>
   [2021-05-28 10:26:05,240] {utils.py:612} DEBUG - Releasing acquire 0/None
   [2021-05-28 10:26:05,240] {endpoint.py:101} DEBUG - Making request for OperationModel(name=GetObject) with params: {'url_path': '/[REDACT]/tmp.txt', 'query_string': {}, 'method': 'GET', 'headers': {'User-Agent': 'Boto3/1.15.18 Python/3.7.10 Linux/5.10.25-linuxkit Botocore/1.18.18 Resource'}, 'body': b'', 'url': '[REDACT]', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7f345321a710>, 'has_streaming_input': False, 'auth_type': None, 'signing': {'bucket': '[REDACT]'}}}
   [2021-05-28 10:26:05,241] {hooks.py:210} DEBUG - Event request-created.s3.GetObject: calling handler <function signal_not_transferring at 0x7f347ee7de60>
   [2021-05-28 10:26:05,241] {hooks.py:210} DEBUG - Event request-created.s3.GetObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f3453215e90>>
   [2021-05-28 10:26:05,241] {hooks.py:210} DEBUG - Event choose-signer.s3.GetObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7f3453f046d0>>
   [2021-05-28 10:26:05,242] {hooks.py:210} DEBUG - Event choose-signer.s3.GetObject: calling handler <function set_operation_specific_signer at 0x7f347f133710>
   [2021-05-28 10:26:05,242] {hooks.py:210} DEBUG - Event before-sign.s3.GetObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7f34531d0710>>
   [2021-05-28 10:26:05,242] {utils.py:1018} DEBUG - Checking for DNS compatible bucket for: [REDACT]
   [2021-05-28 10:26:05,242] {utils.py:1036} DEBUG - URI updated to: [REDACT]
   [2021-05-28 10:26:05,243] {auth.py:364} DEBUG - Calculating signature using v4 auth.
   [2021-05-28 10:26:05,243] {auth.py:365} DEBUG - CanonicalRequest:
   GET
   /tmp.txt
   [REDACT]
   [2021-05-28 10:26:05,243] {hooks.py:210} DEBUG - Event request-created.s3.GetObject: calling handler <function signal_transferring at 0x7f347ee8a320>
   [2021-05-28 10:26:05,243] {endpoint.py:187} DEBUG - Sending http request: <AWSPreparedRequest stream_output=True, method=GET, url=[REDACT], headers={'User-Agent': b'Boto3/1.15.18 Python/3.7.10 Linux/5.10.25-linuxkit Botocore/1.18.18 Resource', 'X-Amz-Date': b'20210528T172605Z', 'X-Amz-Content-SHA256': b'[REDACT]', 'Authorization': b'[REDACT], SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=[REDACT]'}>
   [2021-05-28 10:26:05,402] {connectionpool.py:442} DEBUG - https://[REDACT].s3.amazonaws.com:443 "GET /tmp.txt HTTP/1.1" 200 5
   [2021-05-28 10:26:05,402] {parsers.py:233} DEBUG - Response headers: {'x-amz-id-2': '[REDACT]', 'x-amz-request-id': '[REDACT]', 'Date': 'Fri, 28 May 2021 17:26:06 GMT', 'Last-Modified': 'Thu, 27 May 2021 20:37:55 GMT', 'ETag': '"[REDACT]"', 'x-amz-server-side-encryption': 'AES256', 'x-amz-version-id': '[REDACT]', 'Accept-Ranges': 'bytes', 'Content-Type': 'text/plain', 'Content-Length': '5', 'Server': 'AmazonS3'}
   [2021-05-28 10:26:05,403] {parsers.py:234} DEBUG - Response body:
   <botocore.response.StreamingBody object at 0x7f345310d090>
   [2021-05-28 10:26:05,404] {hooks.py:210} DEBUG - Event needs-retry.s3.GetObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f345321ab50>
   [2021-05-28 10:26:05,404] {retryhandler.py:187} DEBUG - No retry needed.
   [2021-05-28 10:26:05,404] {hooks.py:210} DEBUG - Event needs-retry.s3.GetObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f345321aa90>>
   [2021-05-28 10:26:05,405] {tasks.py:194} DEBUG - IOWriteTask(transfer_id=0, {'offset': 0}) about to wait for the following futures []
   [2021-05-28 10:26:05,406] {tasks.py:203} DEBUG - IOWriteTask(transfer_id=0, {'offset': 0}) done waiting for dependent futures
   [2021-05-28 10:26:05,406] {tasks.py:147} DEBUG - Executing task IOWriteTask(transfer_id=0, {'offset': 0}) with kwargs {'fileobj': <s3transfer.utils.DeferredOpenFile object at 0x7f34531fc890>, 'offset': 0}
   [2021-05-28 10:26:05,407] {tasks.py:194} DEBUG - IORenameFileTask(transfer_id=0, {'final_filename': '/tmp/s3_hook.txt'}) about to wait for the following futures []
   [2021-05-28 10:26:05,407] {tasks.py:203} DEBUG - IORenameFileTask(transfer_id=0, {'final_filename': '/tmp/s3_hook.txt'}) done waiting for dependent futures
   [2021-05-28 10:26:05,408] {tasks.py:147} DEBUG - Executing task IORenameFileTask(transfer_id=0, {'final_filename': '/tmp/s3_hook.txt'}) with kwargs {'fileobj': <s3transfer.utils.DeferredOpenFile object at 0x7f34531fc890>, 'final_filename': '/tmp/s3_hook.txt', 'osutil': <s3transfer.utils.OSUtils object at 0x7f3453181510>}
   [2021-05-28 10:26:05,409] {utils.py:612} DEBUG - Releasing acquire 0/None
   [2021-05-28 10:26:05,412] {tmp_dag.py:21} INFO - File downloaded: /tmp/s3_hook.txt
   [2021-05-28 10:26:05,413] {tmp_dag.py:24} INFO - FILE CONTENT test
   
   [2021-05-28 10:26:05,413] {python.py:118} INFO - Done. Returned value was: None
   [2021-05-28 10:26:05,413] {__init__.py:107} DEBUG - Lineage called with inlets: [], outlets: []
   [2021-05-28 10:26:05,413] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [running]> from DB
   [2021-05-28 10:26:05,421] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [running]>
   [2021-05-28 10:26:05,423] {taskinstance.py:1192} INFO - Marking task as SUCCESS. dag_id=tmp, task_id=download_file_from_s3ile, execution_date=20210528T172558, start_date=20210528T172604, end_date=20210528T172605
   [2021-05-28 10:26:05,423] {taskinstance.py:1891} DEBUG - Task Duration set to 1.141694
   [2021-05-28 10:26:05,455] {dagrun.py:491} DEBUG - number of tis tasks for <DagRun tmp @ 2021-05-28 17:25:58.851532+00:00: manual__2021-05-28T17:25:58.851532+00:00, externally triggered: True>: 0 task(s)
   [2021-05-28 10:26:05,456] {taskinstance.py:1246} INFO - 0 downstream tasks scheduled from follow-on schedule check
   [2021-05-28 10:26:05,457] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
   [2021-05-28 10:26:05,510] {local_task_job.py:146} INFO - Task exited with return code 0
   [2021-05-28 10:26:05,511] {taskinstance.py:595} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [running]> from DB
   [2021-05-28 10:26:05,524] {taskinstance.py:630} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:25:58.851532+00:00 [success]>
   ```
   
   ⚠️ Notice that the file content (`test`) is shown correctly in the log
   
   The logged output from 2.1.0:
   ```
   *** Log file does not exist: /usr/local/airflow/logs/tmp/download_file_from_s3ile/2021-05-28T17:36:09.750993+00:00/1.log
   *** Fetching from: http://f2ffe4375669:8793/log/tmp/download_file_from_s3ile/2021-05-28T17:36:09.750993+00:00/1.log
   
   [2021-05-28 10:36:14,758] {executor_loader.py:82} DEBUG - Loading core executor: CeleryExecutor
   [2021-05-28 10:36:14,769] {__init__.py:51} DEBUG - Loading core task runner: StandardTaskRunner
   [2021-05-28 10:36:14,779] {base_task_runner.py:62} DEBUG - Planning to run as the  user
   [2021-05-28 10:36:14,781] {taskinstance.py:594} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> from DB
   [2021-05-28 10:36:14,788] {taskinstance.py:629} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]>
   [2021-05-28 10:36:14,789] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Trigger Rule' PASSED: True, The task instance did not have any upstream tasks.
   [2021-05-28 10:36:14,789] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Task Instance Not Running' PASSED: True, Task is not in running state.
   [2021-05-28 10:36:14,789] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Task Instance State' PASSED: True, Task state queued was valid.
   [2021-05-28 10:36:14,793] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
   [2021-05-28 10:36:14,793] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
   [2021-05-28 10:36:14,793] {taskinstance.py:876} INFO - Dependencies all met for <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]>
   [2021-05-28 10:36:14,793] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Trigger Rule' PASSED: True, The task instance did not have any upstream tasks.
   [2021-05-28 10:36:14,800] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Pool Slots Available' PASSED: True, ('There are enough open slots in %s to execute the task', 'default_pool')
   [2021-05-28 10:36:14,808] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Not In Retry Period' PASSED: True, The task instance was not marked for retrying.
   [2021-05-28 10:36:14,810] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Previous Dagrun State' PASSED: True, The task did not have depends_on_past set.
   [2021-05-28 10:36:14,810] {taskinstance.py:891} DEBUG - <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]> dependency 'Task Concurrency' PASSED: True, Task concurrency is not set.
   [2021-05-28 10:36:14,810] {taskinstance.py:876} INFO - Dependencies all met for <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [queued]>
   [2021-05-28 10:36:14,810] {taskinstance.py:1067} INFO -
   --------------------------------------------------------------------------------
   [2021-05-28 10:36:14,810] {taskinstance.py:1068} INFO - Starting attempt 1 of 1
   [2021-05-28 10:36:14,811] {taskinstance.py:1069} INFO -
   --------------------------------------------------------------------------------
   [2021-05-28 10:36:14,823] {taskinstance.py:1087} INFO - Executing <Task(PythonOperator): download_file_from_s3ile> on 2021-05-28T17:36:09.750993+00:00
   [2021-05-28 10:36:14,830] {standard_task_runner.py:52} INFO - Started process 116 to run task
   [2021-05-28 10:36:14,836] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'tmp', 'download_file_from_s3ile', '2021-05-28T17:36:09.750993+00:00', '--job-id', '8', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/tmp_dag.py', '--cfg-path', '/tmp/tmplhbjfxop', '--error-file', '/tmp/tmpdbeh5gr9']
   [2021-05-28 10:36:14,839] {standard_task_runner.py:77} INFO - Job 8: Subtask download_file_from_s3ile
   [2021-05-28 10:36:14,841] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f2e2920f5f0>]
   [2021-05-28 10:36:14,860] {settings.py:210} DEBUG - Setting up DB connection pool (PID 116)
   [2021-05-28 10:36:14,860] {settings.py:246} DEBUG - settings.prepare_engine_args(): Using NullPool
   [2021-05-28 10:36:14,864] {taskinstance.py:594} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [None]> from DB
   [2021-05-28 10:36:14,883] {taskinstance.py:629} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [running]>
   [2021-05-28 10:36:14,893] {logging_mixin.py:104} INFO - Running <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [running]> on host f2ffe4375669
   [2021-05-28 10:36:14,896] {taskinstance.py:594} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [running]> from DB
   [2021-05-28 10:36:14,902] {taskinstance.py:629} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [running]>
   [2021-05-28 10:36:14,917] {taskinstance.py:657} DEBUG - Clearing XCom data
   [2021-05-28 10:36:14,925] {taskinstance.py:664} DEBUG - XCom data cleared
   [2021-05-28 10:36:14,947] {taskinstance.py:1282} INFO - Exporting the following env vars:
   AIRFLOW_CTX_DAG_OWNER=***
   AIRFLOW_CTX_DAG_ID=tmp
   AIRFLOW_CTX_TASK_ID=download_file_from_s3ile
   AIRFLOW_CTX_EXECUTION_DATE=2021-05-28T17:36:09.750993+00:00
   AIRFLOW_CTX_DAG_RUN_ID=manual__2021-05-28T17:36:09.750993+00:00
   [2021-05-28 10:36:14,948] {__init__.py:146} DEBUG - Preparing lineage inlets and outlets
   [2021-05-28 10:36:14,948] {__init__.py:190} DEBUG - inlets: [], outlets: []
   [2021-05-28 10:36:14,949] {base_aws.py:362} INFO - Airflow Connection: aws_conn_id=aws_default
   [2021-05-28 10:36:14,958] {base_aws.py:385} WARNING - Unable to use Airflow Connection for credentials.
   [2021-05-28 10:36:14,958] {base_aws.py:386} INFO - Fallback on boto3 credential strategy
   [2021-05-28 10:36:14,958] {base_aws.py:391} INFO - Creating session using boto3 credential strategy region_name=None
   [2021-05-28 10:36:14,960] {hooks.py:417} DEBUG - Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
   [2021-05-28 10:36:14,962] {hooks.py:417} DEBUG - Changing event name from before-call.apigateway to before-call.api-gateway
   [2021-05-28 10:36:14,962] {hooks.py:417} DEBUG - Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
   [2021-05-28 10:36:14,965] {hooks.py:417} DEBUG - Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
   [2021-05-28 10:36:14,965] {hooks.py:417} DEBUG - Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
   [2021-05-28 10:36:14,965] {hooks.py:417} DEBUG - Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
   [2021-05-28 10:36:14,966] {hooks.py:417} DEBUG - Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
   [2021-05-28 10:36:14,968] {hooks.py:417} DEBUG - Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
   [2021-05-28 10:36:14,969] {hooks.py:417} DEBUG - Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
   [2021-05-28 10:36:14,969] {hooks.py:417} DEBUG - Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
   [2021-05-28 10:36:14,969] {hooks.py:417} DEBUG - Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
   [2021-05-28 10:36:14,982] {loaders.py:174} DEBUG - Loading JSON file: /usr/local/lib/python3.7/site-packages/boto3/data/s3/2006-03-01/resources-1.json
   [2021-05-28 10:36:14,986] {credentials.py:1961} DEBUG - Looking for credentials via: env
   [2021-05-28 10:36:14,986] {credentials.py:1087} INFO - Found credentials in environment variables.
   [2021-05-28 10:36:14,987] {loaders.py:174} DEBUG - Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/endpoints.json
   [2021-05-28 10:36:14,992] {hooks.py:210} DEBUG - Event choose-service-name: calling handler <function handle_service_name_alias at 0x7f2e22e7b7a0>
   [2021-05-28 10:36:15,002] {loaders.py:174} DEBUG - Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/s3/2006-03-01/service-2.json
   [2021-05-28 10:36:15,010] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f2e22ea5320>
   [2021-05-28 10:36:15,010] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function lazy_call.<locals>._handler at 0x7f2df976ee60>
   [2021-05-28 10:36:15,011] {hooks.py:210} DEBUG - Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f2e22ea50e0>
   [2021-05-28 10:36:15,015] {endpoint.py:291} DEBUG - Setting s3 timeout as (60, 60)
   [2021-05-28 10:36:15,017] {loaders.py:174} DEBUG - Loading JSON file: /usr/local/lib/python3.7/site-packages/botocore/data/_retry.json
   [2021-05-28 10:36:15,017] {client.py:164} DEBUG - Registering retry handlers for service: s3
   [2021-05-28 10:36:15,018] {factory.py:66} DEBUG - Loading s3:s3
   [2021-05-28 10:36:15,019] {factory.py:66} DEBUG - Loading s3:Bucket
   [2021-05-28 10:36:15,020] {model.py:358} DEBUG - Renaming Bucket attribute name
   [2021-05-28 10:36:15,021] {hooks.py:210} DEBUG - Event creating-resource-class.s3.Bucket: calling handler <function lazy_call.<locals>._handler at 0x7f2df9762d40>
   [2021-05-28 10:36:15,021] {factory.py:66} DEBUG - Loading s3:Object
   [2021-05-28 10:36:15,022] {hooks.py:210} DEBUG - Event creating-resource-class.s3.Object: calling handler <function lazy_call.<locals>._handler at 0x7f2df977fa70>
   [2021-05-28 10:36:15,023] {utils.py:599} DEBUG - Acquiring 0
   [2021-05-28 10:36:15,024] {tasks.py:194} DEBUG - DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f2df921f790>}) about to wait for the following futures []
   [2021-05-28 10:36:15,024] {tasks.py:203} DEBUG - DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f2df921f790>}) done waiting for dependent futures
   [2021-05-28 10:36:15,025] {tasks.py:147} DEBUG - Executing task DownloadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f2df921f790>}) with kwargs {'client': <botocore.client.S3 object at 0x7f2df9721390>, 'config': <boto3.s3.transfer.TransferConfig object at 0x7f2df921fe90>, 'osutil': <s3transfer.utils.OSUtils object at 0x7f2df921ffd0>, 'request_executor': <s3transfer.futures.BoundedExecutor object at 0x7f2df921fcd0>, 'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f2df921f790>, 'io_executor': <s3transfer.futures.BoundedExecutor object at 0x7f2df921fa50>}
   [2021-05-28 10:36:15,025] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <function sse_md5 at 0x7f2e22e18c20>
   [2021-05-28 10:36:15,025] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <function validate_bucket_name at 0x7f2e22e18b90>
   [2021-05-28 10:36:15,025] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f2df926ed10>>
   [2021-05-28 10:36:15,026] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7f2df92be3d0>>
   [2021-05-28 10:36:15,026] {hooks.py:210} DEBUG - Event before-parameter-build.s3.HeadObject: calling handler <function generate_idempotent_uuid at 0x7f2e22e189e0>
   [2021-05-28 10:36:15,027] {hooks.py:210} DEBUG - Event before-call.s3.HeadObject: calling handler <function add_expect_header at 0x7f2e22e18ef0>
   [2021-05-28 10:36:15,027] {hooks.py:210} DEBUG - Event before-call.s3.HeadObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f2df926ed10>>
   [2021-05-28 10:36:15,027] {hooks.py:210} DEBUG - Event before-call.s3.HeadObject: calling handler <function inject_api_version_header_if_needed at 0x7f2e22e1f290>
   [2021-05-28 10:36:15,027] {endpoint.py:101} DEBUG - Making request for OperationModel(name=HeadObject) with params: {'url_path': '/[REDACT]/tmp.txt', 'query_string': {}, 'method': 'HEAD', 'headers': {'User-Agent': 'Boto3/1.15.18 Python/3.7.10 Linux/5.10.25-linuxkit Botocore/1.18.18 Resource'}, 'body': [], 'url': 'https://s3.amazonaws.com/[REDACT]/tmp.txt', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7f2df92be350>, 'has_streaming_input': False, 'auth_type': None, 'signing': {'bucket': '[REDACT]'}}}
   [2021-05-28 10:36:15,028] {hooks.py:210} DEBUG - Event request-created.s3.HeadObject: calling handler <function signal_not_transferring at 0x7f2e22b9f170>
   [2021-05-28 10:36:15,029] {hooks.py:210} DEBUG - Event request-created.s3.HeadObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f2df92b7a10>>
   [2021-05-28 10:36:15,029] {hooks.py:210} DEBUG - Event choose-signer.s3.HeadObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7f2df96db510>>
   [2021-05-28 10:36:15,029] {hooks.py:210} DEBUG - Event choose-signer.s3.HeadObject: calling handler <function set_operation_specific_signer at 0x7f2e22e188c0>
   [2021-05-28 10:36:15,029] {hooks.py:210} DEBUG - Event before-sign.s3.HeadObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7f2df92756d0>>
   [2021-05-28 10:36:15,029] {utils.py:1639} DEBUG - Defaulting to S3 virtual host style addressing with path style addressing fallback.
   [2021-05-28 10:36:15,029] {utils.py:1018} DEBUG - Checking for DNS compatible bucket for: https://s3.amazonaws.com/[REDACT]/tmp.txt
   [2021-05-28 10:36:15,030] {utils.py:1036} DEBUG - URI updated to: https://[REDACT].s3.amazonaws.com/tmp.txt
   [2021-05-28 10:36:15,030] {auth.py:364} DEBUG - Calculating signature using v4 auth.
   [2021-05-28 10:36:15,030] {auth.py:365} DEBUG - CanonicalRequest:
   HEAD
   /tmp.txt
   [REDACT]
   [2021-05-28 10:36:15,031] {hooks.py:210} DEBUG - Event request-created.s3.HeadObject: calling handler <function signal_transferring at 0x7f2e22baa4d0>
   [2021-05-28 10:36:15,031] {endpoint.py:187} DEBUG - Sending http request: <AWSPreparedRequest stream_output=False, method=HEAD, url=https://[REDACT].s3.amazonaws.com/tmp.txt, headers={'User-Agent': b'Boto3/1.15.18 Python/3.7.10 Linux/5.10.25-linuxkit Botocore/1.18.18 Resource', 'X-Amz-Date': b'20210528T173615Z', 'X-Amz-Content-SHA256': b'[REDACT]', 'Authorization': b'[REDACT], SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=[REDACT]'}>
   [2021-05-28 10:36:15,032] {connectionpool.py:943} DEBUG - Starting new HTTPS connection (1): [REDACT].s3.amazonaws.com:443
   [2021-05-28 10:36:15,695] {connectionpool.py:442} DEBUG - https://[REDACT].s3.amazonaws.com:443 "HEAD /tmp.txt HTTP/1.1" 200 0
   [2021-05-28 10:36:15,696] {parsers.py:233} DEBUG - Response headers: ['x-amz-id-2', 'x-amz-request-id', 'Date', 'Last-Modified', 'ETag', 'x-amz-server-side-encryption', 'x-amz-version-id', 'Accept-Ranges', 'Content-Type', 'Content-Length', 'Server']
   [2021-05-28 10:36:15,696] {parsers.py:234} DEBUG - Response body:
   []
   [2021-05-28 10:36:15,697] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f2df926ef90>
   [2021-05-28 10:36:15,698] {retryhandler.py:187} DEBUG - No retry needed.
   [2021-05-28 10:36:15,698] {hooks.py:210} DEBUG - Event needs-retry.s3.HeadObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f2df926ed10>>
   [2021-05-28 10:36:15,698] {futures.py:318} DEBUG - Submitting task ImmediatelyWriteIOGetObjectTask(transfer_id=0, {'bucket': '[REDACT]', 'key': 'tmp.txt', 'extra_args': {}}) to executor <s3transfer.futures.BoundedExecutor object at 0x7f2df921fcd0> for transfer request: 0.
   [2021-05-28 10:36:15,698] {utils.py:599} DEBUG - Acquiring 0
   [2021-05-28 10:36:15,699] {tasks.py:194} DEBUG - ImmediatelyWriteIOGetObjectTask(transfer_id=0, {'bucket': '[REDACT]', 'key': 'tmp.txt', 'extra_args': {}}) about to wait for the following futures []
   [2021-05-28 10:36:15,699] {tasks.py:203} DEBUG - ImmediatelyWriteIOGetObjectTask(transfer_id=0, {'bucket': '[REDACT]', 'key': 'tmp.txt', 'extra_args': {}}) done waiting for dependent futures
   [2021-05-28 10:36:15,699] {tasks.py:147} DEBUG - Executing task ImmediatelyWriteIOGetObjectTask(transfer_id=0, {'bucket': '[REDACT]', 'key': 'tmp.txt', 'extra_args': {}}) with kwargs {'client': <botocore.client.S3 object at 0x7f2df9721390>, 'bucket': '[REDACT]', 'key': 'tmp.txt', 'fileobj': <s3transfer.utils.DeferredOpenFile object at 0x7f2df97dc0d0>, 'extra_args': {}, 'callbacks': [], 'max_attempts': 5, 'download_output_manager': <s3transfer.download.DownloadFilenameOutputManager object at 0x7f2df976b310>, 'io_chunksize': 262144, 'bandwidth_limiter': None}
   [2021-05-28 10:36:15,699] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function sse_md5 at 0x7f2e22e18c20>
   [2021-05-28 10:36:15,700] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function validate_bucket_name at 0x7f2e22e18b90>
   [2021-05-28 10:36:15,700] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f2df926ed10>>
   [2021-05-28 10:36:15,700] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7f2df92be3d0>>
   [2021-05-28 10:36:15,700] {hooks.py:210} DEBUG - Event before-parameter-build.s3.GetObject: calling handler <function generate_idempotent_uuid at 0x7f2e22e189e0>
   [2021-05-28 10:36:15,700] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <function add_expect_header at 0x7f2e22e18ef0>
   [2021-05-28 10:36:15,700] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f2df926ed10>>
   [2021-05-28 10:36:15,701] {hooks.py:210} DEBUG - Event before-call.s3.GetObject: calling handler <function inject_api_version_header_if_needed at 0x7f2e22e1f290>
   [2021-05-28 10:36:15,701] {endpoint.py:101} DEBUG - Making request for OperationModel(name=GetObject) with params: {'url_path': '/[REDACT]/tmp.txt', 'query_string': {}, 'method': 'GET', 'headers': {'User-Agent': 'Boto3/1.15.18 Python/3.7.10 Linux/5.10.25-linuxkit Botocore/1.18.18 Resource'}, 'body': [], 'url': 'https://s3.amazonaws.com/[REDACT]/tmp.txt', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7f2df92be350>, 'has_streaming_input': False, 'auth_type': None, 'signing': {'bucket': '[REDACT]'}}}
   [2021-05-28 10:36:15,701] {hooks.py:210} DEBUG - Event request-created.s3.GetObject: calling handler <function signal_not_transferring at 0x7f2e22b9f170>
   [2021-05-28 10:36:15,701] {hooks.py:210} DEBUG - Event request-created.s3.GetObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f2df92b7a10>>
   [2021-05-28 10:36:15,701] {hooks.py:210} DEBUG - Event choose-signer.s3.GetObject: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7f2df96db510>>
   [2021-05-28 10:36:15,702] {hooks.py:210} DEBUG - Event choose-signer.s3.GetObject: calling handler <function set_operation_specific_signer at 0x7f2e22e188c0>
   [2021-05-28 10:36:15,702] {hooks.py:210} DEBUG - Event before-sign.s3.GetObject: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7f2df92756d0>>
   [2021-05-28 10:36:15,702] {utils.py:1018} DEBUG - Checking for DNS compatible bucket for: https://s3.amazonaws.com/[REDACT]/tmp.txt
   [2021-05-28 10:36:15,702] {utils.py:1036} DEBUG - URI updated to: https://[REDACT].s3.amazonaws.com/tmp.txt
   [2021-05-28 10:36:15,702] {utils.py:612} DEBUG - Releasing acquire 0/None
   [2021-05-28 10:36:15,702] {auth.py:364} DEBUG - Calculating signature using v4 auth.
   [2021-05-28 10:36:15,703] {auth.py:365} DEBUG - CanonicalRequest:
   GET
   /tmp.txt
   [REDACT]
   [2021-05-28 10:36:15,703] {hooks.py:210} DEBUG - Event request-created.s3.GetObject: calling handler <function signal_transferring at 0x7f2e22baa4d0>
   [2021-05-28 10:36:15,703] {endpoint.py:187} DEBUG - Sending http request: <AWSPreparedRequest stream_output=True, method=GET, url=https://[REDACT].s3.amazonaws.com/tmp.txt, headers={'User-Agent': b'Boto3/1.15.18 Python/3.7.10 Linux/5.10.25-linuxkit Botocore/1.18.18 Resource', 'X-Amz-Date': b'20210528T173615Z', 'X-Amz-Content-SHA256': b'[REDACT]', 'Authorization': b'[REDACT], SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=[REDACT]'}>
   [2021-05-28 10:36:15,879] {connectionpool.py:442} DEBUG - https://[REDACT].s3.amazonaws.com:443 "GET /tmp.txt HTTP/1.1" 200 5
   [2021-05-28 10:36:15,879] {parsers.py:233} DEBUG - Response headers: ['x-amz-id-2', 'x-amz-request-id', 'Date', 'Last-Modified', 'ETag', 'x-amz-server-side-encryption', 'x-amz-version-id', 'Accept-Ranges', 'Content-Type', 'Content-Length', 'Server']
   [2021-05-28 10:36:15,879] {parsers.py:234} DEBUG - Response body:
   [[116, 101, 115, 116, 10]]
   [2021-05-28 10:36:15,883] {hooks.py:210} DEBUG - Event needs-retry.s3.GetObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f2df926ef90>
   [2021-05-28 10:36:15,883] {retryhandler.py:187} DEBUG - No retry needed.
   [2021-05-28 10:36:15,883] {hooks.py:210} DEBUG - Event needs-retry.s3.GetObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f2df926ed10>>
   [2021-05-28 10:36:15,883] {tasks.py:194} DEBUG - IOWriteTask(transfer_id=0, {'offset': 0}) about to wait for the following futures []
   [2021-05-28 10:36:15,885] {tasks.py:203} DEBUG - IOWriteTask(transfer_id=0, {'offset': 0}) done waiting for dependent futures
   [2021-05-28 10:36:15,885] {tasks.py:147} DEBUG - Executing task IOWriteTask(transfer_id=0, {'offset': 0}) with kwargs {'fileobj': <s3transfer.utils.DeferredOpenFile object at 0x7f2df97dc0d0>, 'offset': 0}
   [2021-05-28 10:36:15,885] {tasks.py:194} DEBUG - IORenameFileTask(transfer_id=0, {'final_filename': '/tmp/s3_hook.txt'}) about to wait for the following futures []
   [2021-05-28 10:36:15,886] {tasks.py:203} DEBUG - IORenameFileTask(transfer_id=0, {'final_filename': '/tmp/s3_hook.txt'}) done waiting for dependent futures
   [2021-05-28 10:36:15,886] {tasks.py:147} DEBUG - Executing task IORenameFileTask(transfer_id=0, {'final_filename': '/tmp/s3_hook.txt'}) with kwargs {'fileobj': <s3transfer.utils.DeferredOpenFile object at 0x7f2df97dc0d0>, 'final_filename': '/tmp/s3_hook.txt', 'osutil': <s3transfer.utils.OSUtils object at 0x7f2df921ffd0>}
   [2021-05-28 10:36:15,886] {utils.py:612} DEBUG - Releasing acquire 0/None
   [2021-05-28 10:36:15,887] {tmp_dag.py:21} INFO - File downloaded: /tmp/s3_hook.txt
   [2021-05-28 10:36:15,888] {tmp_dag.py:24} INFO - FILE CONTENT
   [2021-05-28 10:36:15,888] {python.py:151} INFO - Done. Returned value was: None
   [2021-05-28 10:36:15,888] {__init__.py:107} DEBUG - Lineage called with inlets: [], outlets: []
   [2021-05-28 10:36:15,888] {taskinstance.py:594} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [running]> from DB
   [2021-05-28 10:36:15,893] {taskinstance.py:629} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [running]>
   [2021-05-28 10:36:15,894] {taskinstance.py:1191} INFO - Marking task as SUCCESS. dag_id=tmp, task_id=download_file_from_s3ile, execution_date=20210528T173609, start_date=20210528T173614, end_date=20210528T173615
   [2021-05-28 10:36:15,894] {taskinstance.py:1888} DEBUG - Task Duration set to 1.100586
   [2021-05-28 10:36:15,915] {dagrun.py:490} DEBUG - number of tis tasks for <DagRun tmp @ 2021-05-28 17:36:09.750993+00:00: manual__2021-05-28T17:36:09.750993+00:00, externally triggered: True>: 0 task(s)
   [2021-05-28 10:36:15,917] {taskinstance.py:1245} INFO - 0 downstream tasks scheduled from follow-on schedule check
   [2021-05-28 10:36:15,917] {cli_action_loggers.py:84} DEBUG - Calling callbacks: []
   [2021-05-28 10:36:15,939] {local_task_job.py:151} INFO - Task exited with return code 0
   [2021-05-28 10:36:15,939] {taskinstance.py:594} DEBUG - Refreshing TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [running]> from DB
   [2021-05-28 10:36:15,949] {taskinstance.py:629} DEBUG - Refreshed TaskInstance <TaskInstance: tmp.download_file_from_s3ile 2021-05-28T17:36:09.750993+00:00 [success]>
   ```

   ⚠️ Notice that the file content is **NOT** shown in the log
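   
   Also worth noting: in the 2.1.0 log the `GetObject` response body is logged as a list of byte values rather than as a `StreamingBody` object (compare the two "Response body" lines above); decoding those bytes, as in the purely illustrative check below, gives the expected content, so the data does come back from S3 even though the file read inside the task ends up empty.
   
   ```python
   # Purely illustrative: decode the byte values from the 2.1.0
   # "Response body" log line above.
   body = [116, 101, 115, 116, 10]
   print(bytes(body).decode())  # prints "test" followed by a newline
   ```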
   
   **Anything else we need to know**:
   
   `pip freeze` for 2.0.2:
   ```
   adal==1.2.7
   aiohttp==3.7.4.post0
   alembic==1.6.5
   amqp==2.6.1
   ansiwrap==0.8.4
   apache-airflow==2.0.2
   apache-airflow-providers-amazon==1.2.0
   apache-airflow-providers-celery==1.0.1
   apache-airflow-providers-databricks==1.0.1
   apache-airflow-providers-ftp==1.1.0
   apache-airflow-providers-google==1.0.0
   apache-airflow-providers-http==1.1.1
   apache-airflow-providers-imap==1.0.1
   apache-airflow-providers-jdbc==1.0.1
   apache-airflow-providers-mongo==1.0.1
   apache-airflow-providers-mysql==1.0.2
   apache-airflow-providers-papermill==1.0.2
   apache-airflow-providers-postgres==1.0.1
   apache-airflow-providers-redis==1.0.1
   apache-airflow-providers-salesforce==1.0.1
   apache-airflow-providers-slack==3.0.0
   apache-airflow-providers-snowflake==1.1.1
   apache-airflow-providers-sqlite==1.0.2
   apache-airflow-providers-ssh==1.2.0
   apispec==3.3.2
   appdirs==1.4.4
   argcomplete==1.12.3
   asn1crypto==1.4.0
   async-generator==1.10
   async-timeout==3.0.1
   attrs==20.3.0
   Authlib==0.15.3
   avro-python3==1.10.0
   azure-common==1.1.27
   azure-core==1.14.0
   azure-datalake-store==0.0.52
   azure-storage-blob==12.8.1
   Babel==2.9.1
   backcall==0.2.0
   bcrypt==3.2.0
   billiard==3.6.4.0
   black==21.5b1
   blinker==1.4
   boto3==1.15.18
   botocore==1.18.18
   cached-property==1.5.2
   cachetools==4.2.2
   cattrs==1.7.0
   celery==4.4.7
   Cerberus==1.3.2
   certifi==2020.12.5
   cffi==1.14.5
   chardet==3.0.4
   click==7.1.2
   clickclick==20.10.2
   colorama==0.4.4
   colorlog==5.0.1
   commonmark==0.9.1
   connexion==2.7.0
   croniter==0.3.37
   cryptography==3.4.7
   cycler==0.10.0
   databricks-cli==0.14.3
   databricks-connect==7.3.8
   decorator==5.0.9
   defusedxml==0.7.1
   dill==0.3.3
   dnspython==1.16.0
   docutils==0.17.1
   email-validator==1.1.2
   entrypoints==0.3
   Flask==1.1.4
   Flask-AppBuilder==3.3.0
   Flask-Babel==1.0.0
   Flask-Bcrypt==0.7.1
   Flask-Caching==1.10.1
   Flask-JWT-Extended==3.25.1
   Flask-Login==0.4.1
   Flask-OpenID==1.2.5
   Flask-SQLAlchemy==2.5.1
   Flask-WTF==0.14.3
   flower==0.9.5
   fsspec==2021.5.0
   gcsfs==2021.5.0
   google-ads==7.0.0
   google-api-core==1.26.0
   google-api-python-client==1.12.8
   google-auth==1.27.0
   google-auth-httplib2==0.1.0
   google-auth-oauthlib==0.4.4
   google-cloud-automl==1.0.1
   google-cloud-bigquery==2.17.0
   google-cloud-bigquery-datatransfer==1.1.1
   google-cloud-bigquery-storage==2.4.0
   google-cloud-bigtable==1.7.0
   google-cloud-container==1.0.1
   google-cloud-core==1.6.0
   google-cloud-datacatalog==0.7.0
   google-cloud-dataproc==1.1.1
   google-cloud-dlp==1.0.0
   google-cloud-kms==1.4.0
   google-cloud-language==1.3.0
   google-cloud-logging==1.15.1
   google-cloud-memcache==0.3.0
   google-cloud-monitoring==1.1.0
   google-cloud-os-login==1.0.0
   google-cloud-pubsub==1.7.0
   google-cloud-redis==1.0.0
   google-cloud-secret-manager==1.0.0
   google-cloud-spanner==1.19.1
   google-cloud-speech==1.3.2
   google-cloud-storage==1.38.0
   google-cloud-tasks==1.5.0
   google-cloud-texttospeech==1.0.1
   google-cloud-translate==1.7.0
   google-cloud-videointelligence==1.16.1
   google-cloud-vision==1.0.0
   google-crc32c==1.1.2
   google-resumable-media==1.3.0
   googleapis-common-protos==1.53.0
   graphviz==0.16
   grpc-google-iam-v1==0.12.3
   grpcio==1.38.0
   grpcio-gcp==0.2.2
   gunicorn==19.10.0
   httplib2==0.19.1
   humanize==3.5.0
   idna==2.10
   importlib-metadata==1.7.0
   importlib-resources==1.5.0
   inflection==0.5.1
   iniconfig==1.1.1
   ipykernel==5.4.3
   ipython==7.23.1
   ipython-genutils==0.2.0
   iso8601==0.1.14
   isodate==0.6.0
   itsdangerous==1.1.0
   JayDeBeApi==1.2.3
   jedi==0.18.0
   Jinja2==2.11.3
   jmespath==0.10.0
   joblib==1.0.1
   JPype1==1.2.1
   jsonschema==3.2.0
   jupyter-client==6.1.12
   jupyter-core==4.7.1
   kiwisolver==1.3.1
   kombu==4.6.11
   lazy-object-proxy==1.6.0
   libcst==0.3.19
   lockfile==0.12.2
   Mako==1.1.4
   Markdown==3.3.4
   MarkupSafe==1.1.1
   marshmallow==3.12.1
   marshmallow-enum==1.5.1
   marshmallow-oneofschema==2.1.0
   marshmallow-sqlalchemy==0.23.1
   matplotlib==3.3.4
   matplotlib-inline==0.1.2
   msrest==0.6.21
   multidict==5.1.0
   mypy-extensions==0.4.3
   mysql-connector-python==8.0.22
   mysqlclient==1.3.14
   natsort==7.1.1
   nbclient==0.5.3
   nbformat==5.1.3
   nest-asyncio==1.5.1
   nteract-scrapbook==0.4.2
   numpy==1.20.3
   oauthlib==3.1.0
   openapi-schema-validator==0.1.5
   openapi-spec-validator==0.3.1
   oscrypto==1.2.1
   packaging==20.9
   pandas==1.2.4
   pandas-gbq==0.15.0
   papermill==2.3.3
   paramiko==2.7.2
   parso==0.8.2
   pathspec==0.8.1
   pendulum==2.1.2
   pexpect==4.8.0
   pickleshare==0.7.5
   Pillow==8.2.0
   prison==0.1.3
   prometheus-client==0.8.0
   prompt-toolkit==3.0.18
   proto-plus==1.18.1
   protobuf==3.17.1
   psutil==5.8.0
   psycopg2-binary==2.8.6
   ptyprocess==0.7.0
   py4j==0.10.9
   pyarrow==4.0.0
   pyasn1==0.4.8
   pyasn1-modules==0.2.8
   pycparser==2.20
   pycryptodomex==3.10.1
   pydata-google-auth==1.2.0
   Pygments==2.9.0
   PyJWT==1.7.1
   pymongo==3.11.4
   PyNaCl==1.4.0
   pyOpenSSL==20.0.1
   pyparsing==2.4.7
   pyrsistent==0.17.3
   pysftp==0.2.9
   python-daemon==2.3.0
   python-dateutil==2.8.1
   python-editor==1.0.4
   python-nvd3==0.15.0
   python-slugify==4.0.1
   python3-openid==3.2.0
   pytz==2021.1
   pytzdata==2020.1
   PyYAML==5.4.1
   pyzmq==22.1.0
   redis==3.5.3
   regex==2021.4.4
   requests==2.25.1
   requests-oauthlib==1.3.0
   rich==9.2.0
   rsa==4.7.2
   s3transfer==0.3.7
   scikit-learn==0.24.1
   scipy==1.6.3
   setproctitle==1.2.2
   simple-salesforce==1.11.1
   six==1.16.0
   slack-sdk==3.5.1
   snowflake-connector-python==2.4.3
   snowflake-sqlalchemy==1.2.4
   SQLAlchemy==1.3.23
   SQLAlchemy-JSONField==1.0.0
   SQLAlchemy-Utils==0.37.4
   sqlparse==0.4.1
   sshtunnel==0.1.5
   swagger-ui-bundle==0.0.8
   tableauserverclient==0.15.0
   tabulate==0.8.9
   tenacity==6.2.0
   termcolor==1.1.0
   text-unidecode==1.3
   textwrap3==0.9.2
   threadpoolctl==2.1.0
   toml==0.10.2
   tornado==6.1
   tqdm==4.61.0
   traitlets==5.0.5
   typed-ast==1.4.3
   typing-extensions==3.10.0.0
   typing-inspect==0.6.0
   unicodecsv==0.14.1
   uritemplate==3.0.1
   urllib3==1.25.11
   vine==1.3.0
   watchtower==0.7.3
   wcwidth==0.2.5
   Werkzeug==1.0.1
   WTForms==2.3.3
   yarl==1.6.3
   zipp==3.4.1
   ```
   
   pip freeze for 2.1.0:
   ```
   adal==1.2.7
   aiohttp==3.7.4.post0
   alembic==1.6.5
   amqp==2.6.1
   ansiwrap==0.8.4
   apache-airflow==2.1.0
   apache-airflow-providers-amazon==1.2.0
   apache-airflow-providers-celery==1.0.1
   apache-airflow-providers-databricks==1.0.1
   apache-airflow-providers-ftp==1.1.0
   apache-airflow-providers-google==1.0.0
   apache-airflow-providers-http==1.1.1
   apache-airflow-providers-imap==1.0.1
   apache-airflow-providers-jdbc==1.0.1
   apache-airflow-providers-mongo==1.0.1
   apache-airflow-providers-mysql==1.0.2
   apache-airflow-providers-papermill==1.0.2
   apache-airflow-providers-postgres==1.0.1
   apache-airflow-providers-redis==1.0.1
   apache-airflow-providers-salesforce==1.0.1
   apache-airflow-providers-slack==3.0.0
   apache-airflow-providers-snowflake==1.1.1
   apache-airflow-providers-sqlite==1.0.2
   apache-airflow-providers-ssh==1.2.0
   apispec==3.3.2
   appdirs==1.4.4
   argcomplete==1.12.3
   asn1crypto==1.4.0
   async-generator==1.10
   async-timeout==3.0.1
   attrs==20.3.0
   Authlib==0.15.3
   avro-python3==1.10.0
   azure-common==1.1.27
   azure-core==1.14.0
   azure-datalake-store==0.0.52
   azure-storage-blob==12.8.1
   Babel==2.9.1
   backcall==0.2.0
   bcrypt==3.2.0
   billiard==3.6.4.0
   black==21.5b1
   blinker==1.4
   boto3==1.15.18
   botocore==1.18.18
   cached-property==1.5.2
   cachetools==4.2.2
   cattrs==1.7.0
   celery==4.4.7
   Cerberus==1.3.2
   certifi==2020.12.5
   cffi==1.14.5
   chardet==3.0.4
   click==7.1.2
   clickclick==20.10.2
   colorama==0.4.4
   colorlog==5.0.1
   commonmark==0.9.1
   croniter==1.0.13
   cryptography==3.4.7
   cycler==0.10.0
   databricks-cli==0.14.3
   databricks-connect==7.3.8
   decorator==5.0.9
   defusedxml==0.7.1
   dill==0.3.3
   dnspython==1.16.0
   docutils==0.17.1
   email-validator==1.1.2
   entrypoints==0.3
   Flask==1.1.4
   Flask-AppBuilder==3.3.0
   Flask-Babel==1.0.0
   Flask-Bcrypt==0.7.1
   Flask-Caching==1.10.1
   Flask-JWT-Extended==3.25.1
   Flask-Login==0.4.1
   Flask-OpenID==1.2.5
   Flask-SQLAlchemy==2.5.1
   Flask-WTF==0.14.3
   flower==0.9.5
   fsspec==2021.5.0
   gcsfs==2021.5.0
   google-ads==7.0.0
   google-api-core==1.26.0
   google-api-python-client==1.12.8
   google-auth==1.27.0
   google-auth-httplib2==0.1.0
   google-auth-oauthlib==0.4.4
   google-cloud-automl==1.0.1
   google-cloud-bigquery==2.17.0
   google-cloud-bigquery-datatransfer==1.1.1
   google-cloud-bigquery-storage==2.4.0
   google-cloud-bigtable==1.7.0
   google-cloud-container==1.0.1
   google-cloud-core==1.6.0
   google-cloud-datacatalog==0.7.0
   google-cloud-dataproc==1.1.1
   google-cloud-dlp==1.0.0
   google-cloud-kms==1.4.0
   google-cloud-language==1.3.0
   google-cloud-logging==1.15.1
   google-cloud-memcache==0.3.0
   google-cloud-monitoring==1.1.0
   google-cloud-os-login==1.0.0
   google-cloud-pubsub==1.7.0
   google-cloud-redis==1.0.0
   google-cloud-secret-manager==1.0.0
   google-cloud-spanner==1.19.1
   google-cloud-speech==1.3.2
   google-cloud-storage==1.38.0
   google-cloud-tasks==1.5.0
   google-cloud-texttospeech==1.0.1
   google-cloud-translate==1.7.0
   google-cloud-videointelligence==1.16.1
   google-cloud-vision==1.0.0
   google-crc32c==1.1.2
   google-resumable-media==1.3.0
   googleapis-common-protos==1.53.0
   graphviz==0.16
   grpc-google-iam-v1==0.12.3
   grpcio==1.38.0
   grpcio-gcp==0.2.2
   gunicorn==20.1.0
   h11==0.12.0
   httpcore==0.13.3
   httplib2==0.19.1
   httpx==0.18.1
   humanize==3.5.0
   idna==2.10
   importlib-metadata==1.7.0
   importlib-resources==1.5.0
   inflection==0.5.1
   iniconfig==1.1.1
   ipykernel==5.4.3
   ipython==7.23.1
   ipython-genutils==0.2.0
   iso8601==0.1.14
   isodate==0.6.0
   itsdangerous==1.1.0
   JayDeBeApi==1.2.3
   jedi==0.18.0
   Jinja2==2.11.3
   jmespath==0.10.0
   joblib==1.0.1
   JPype1==1.2.1
   jsonschema==3.2.0
   jupyter-client==6.1.12
   jupyter-core==4.7.1
   kiwisolver==1.3.1
   kombu==4.6.11
   lazy-object-proxy==1.6.0
   libcst==0.3.19
   lockfile==0.12.2
   Mako==1.1.4
   Markdown==3.3.4
   MarkupSafe==1.1.1
   marshmallow==3.12.1
   marshmallow-enum==1.5.1
   marshmallow-oneofschema==2.1.0
   marshmallow-sqlalchemy==0.23.1
   matplotlib==3.3.4
   matplotlib-inline==0.1.2
   msrest==0.6.21
   multidict==5.1.0
   mypy-extensions==0.4.3
   mysql-connector-python==8.0.22
   mysqlclient==1.3.14
   nbclient==0.5.3
   nbformat==5.1.3
   nest-asyncio==1.5.1
   nteract-scrapbook==0.4.2
   numpy==1.20.3
   oauthlib==3.1.0
   openapi-schema-validator==0.1.5
   openapi-spec-validator==0.3.1
   oscrypto==1.2.1
   packaging==20.9
   pandas==1.2.4
   pandas-gbq==0.15.0
   papermill==2.3.3
   paramiko==2.7.2
   parso==0.8.2
   pathspec==0.8.1
   pendulum==2.1.2
   pexpect==4.8.0
   pickleshare==0.7.5
   Pillow==8.2.0
   prison==0.1.3
   prometheus-client==0.8.0
   prompt-toolkit==3.0.18
   proto-plus==1.18.1
   protobuf==3.17.1
   psutil==5.8.0
   psycopg2-binary==2.8.6
   ptyprocess==0.7.0
   py4j==0.10.9
   pyarrow==3.0.0
   pyasn1==0.4.8
   pyasn1-modules==0.2.8
   pycparser==2.20
   pycryptodomex==3.10.1
   pydata-google-auth==1.2.0
   Pygments==2.9.0
   PyJWT==1.7.1
   pymongo==3.11.4
   PyNaCl==1.4.0
   pyOpenSSL==20.0.1
   pyparsing==2.4.7
   pyrsistent==0.17.3
   pysftp==0.2.9
   python-daemon==2.3.0
   python-dateutil==2.8.1
   python-editor==1.0.4
   python-nvd3==0.15.0
   python-slugify==4.0.1
   python3-openid==3.2.0
   pytz==2021.1
   pytzdata==2020.1
   PyYAML==5.4.1
   pyzmq==22.1.0
   redis==3.5.3
   regex==2021.4.4
   requests==2.25.1
   requests-oauthlib==1.3.0
   rfc3986==1.5.0
   rich==10.2.2
   rsa==4.7.2
   s3transfer==0.3.7
   scikit-learn==0.24.1
   scipy==1.6.3
   setproctitle==1.2.2
   simple-salesforce==1.11.1
   six==1.16.0
   slack-sdk==3.5.1
   sniffio==1.2.0
   snowflake-connector-python==2.4.3
   snowflake-sqlalchemy==1.2.4
   SQLAlchemy==1.3.23
   SQLAlchemy-JSONField==1.0.0
   SQLAlchemy-Utils==0.37.4
   sqlparse==0.4.1
   sshtunnel==0.1.5
   swagger-ui-bundle==0.0.8
   tableauserverclient==0.15.0
   tabulate==0.8.9
   tenacity==6.2.0
   termcolor==1.1.0
   text-unidecode==1.3
   textwrap3==0.9.2
   threadpoolctl==2.1.0
   toml==0.10.2
   tornado==6.1
   tqdm==4.61.0
   traitlets==5.0.5
   typed-ast==1.4.3
   typing-extensions==3.10.0.0
   typing-inspect==0.6.0
   unicodecsv==0.14.1
   uritemplate==3.0.1
   urllib3==1.25.11
   vine==1.3.0
   watchtower==0.7.3
   wcwidth==0.2.5
   Werkzeug==1.0.1
   WTForms==2.3.3
   yarl==1.6.3
   zipp==3.4.1
   ```





[GitHub] [airflow] potiuk commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851028624


   You have bad indentation in your last example:
   
   ```
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           logging.info(f"File downloaded: {f.name}")
   
           with open(f.name, 'r') as f_in:
               logging.info(f"FILE CONTENT {f_in.read()}")
   ```
   
   Should be:
   
   ```
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           logging.info(f"File downloaded: {f.name}")
   
       with open(f.name, 'r') as f_in:
           logging.info(f"FILE CONTENT {f_in.read()}")
   ```
   
   You are trying to read the file before it has been closed and flushed (which will happen when you exit the "with" clause).





[GitHub] [airflow] potiuk commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851547784


   Maybe dump all the environment variables both in the worker running the Python operator and when you "exec" into the same container. That is the only way I can see they could differ. Also - just to check again - make sure you are using the same user/group as the workers.





[GitHub] [airflow] mik-laj commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851020750


   `f.seek(0)`?





[GitHub] [airflow] ConstantinoSchillebeeckx commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851029394


   I've updated the indentation and I'm still seeing the same behavior.





[GitHub] [airflow] uranusjr commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855281405


   > we should remove the "iterable" handling and just limit it to cases where those are explicit 'lists' of objects rather than Iterable objects ?
   
   Alternatively we can only handle instances implementing the [collection protocol](https://docs.python.org/3/library/collections.abc.html#collections.abc.Collection). An object implementing `__len__` and `__contains__` can very likely be iterated through multiple times.
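   
   For illustration, a minimal sketch of that distinction, using `io.BytesIO` as a stand-in for botocore's `StreamingBody` (an assumption; this is not the actual Airflow check):
   
   ```python
   from collections.abc import Collection, Iterable
   import io
   
   body = io.BytesIO(b"test\n")  # stand-in for a one-shot streaming response
   
   print(isinstance(body, Iterable))         # True  - the current check would consume it
   print(isinstance(body, Collection))       # False - a Collection check would skip it
   print(isinstance([1, 2, 3], Collection))  # True  - plain lists would still be masked
   ```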





[GitHub] [airflow] mik-laj commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-850854349


   > ````
   >         s3_hook.get_resource_type("s3").Bucket(bucket).Object(key).download_file(f.name)
   > ````
   
   These are not Airflow methods; here you are using the AWS library directly. My impression is that you forgot to flush the data to disk (`f.flush()`), so you can't read it.
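   
   A minimal sketch of that suggestion applied to the snippet in question (bucket/key are placeholders; later comments show the root cause was elsewhere):
   
   ```python
   import logging
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   s3_hook = S3Hook()                       # credentials taken from the environment
   bucket, key = 'your-bucket', 'tmp.txt'   # placeholders
   
   with open('/tmp/s3_hook.txt', 'w') as f:
       s3_hook.get_resource_type("s3").Bucket(bucket).Object(key).download_file(f.name)
       f.flush()  # the suggestion: push any buffered writes to disk before re-reading
   
       with open(f.name, 'r') as f_in:
           logging.info(f"FILE CONTENT {f_in.read()}")
   ```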





[GitHub] [airflow] ConstantinoSchillebeeckx commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-854295591


   > It may be related to: #15599
   
   I was also thinking it might be related to the `filter` in the logger; I tried specifying my [own logger class](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html#advanced-configuration) but could never get it to be picked up 😢 
   
   Is there a quick and dirty way I can reset the logger to maybe `basicConfig` within the DAG to test this out?





[GitHub] [airflow] ConstantinoSchillebeeckx edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851543703


   > Are you running the manual part in the same instances/containers as Airflow ? 
   
   I am indeed; I spin up all my docker services (locally), and then login to the worker service to execute the small script shown above (and confirm that I can correctly download the S3 file).
   
   > I believe the reason is environmental (of where/how your workers are run) not Airflow itself.
   
   I'm not sure how this could be; I'm using the same `docker-compose.yaml` for both Airflow environments (hence the same environmental configuration) - the only change between them is the upgrade from 2.0.2 to 2.1.0 in my requirements.txt
   
   Thanks for the link to a possible solution; this one confuses me too as S3 buckets are specified globally (i.e. have no region).





[GitHub] [airflow] uranusjr commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855285884


   Oh that's a good idea. An iterator's repr is extremely unlikely to consume it (and I'd argue it's a bug if it does).
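   
   A quick sketch of that point, assuming a plain list iterator:
   
   ```python
   it = iter([1, 2, 3])
   print(repr(it))  # e.g. '<list_iterator object at 0x...>' - nothing is consumed
   print(list(it))  # [1, 2, 3] - the iterator is still intact after repr()
   ```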





[GitHub] [airflow] mik-laj commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-854297323


   Python does not require compilation, which means you can try to overwrite the file in the image. 
   Here is an example:
   https://github.com/mik-laj/airlfow-testing-example/blob/main/Dockerfile





[GitHub] [airflow] mik-laj closed issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
mik-laj closed issue #16148:
URL: https://github.com/apache/airflow/issues/16148


   





[GitHub] [airflow] ConstantinoSchillebeeckx commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851020065


   An update:
   1. Using `apache-airflow-providers-amazon==1.4.0`: didn't work 🔴 
   2. Using constraints-files: didn't work 🔴 
   3. Adding `f.flush()`: didn't work 🔴 





[GitHub] [airflow] potiuk commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855266414


   I think handling lists and primitive strings/sets/tuples/dicts recursively should be quite safe.
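   
   A rough sketch of that idea (not the actual SecretsMasker code; the secret predicate is a placeholder):
   
   ```python
   def redact(value, secrets, mask="***"):
       # Only recurse into concrete built-in containers and mask plain strings;
       # arbitrary iterables (generators, streaming bodies, ...) are left untouched.
       if isinstance(value, str):
           return mask if value in secrets else value
       if isinstance(value, dict):
           return {k: redact(v, secrets, mask) for k, v in value.items()}
       if isinstance(value, (list, tuple, set)):
           return type(value)(redact(v, secrets, mask) for v in value)
       return value
   
   print(redact(["ok", {"token": "hunter2"}], secrets={"hunter2"}))
   # ['ok', {'token': '***'}]
   ```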





[GitHub] [airflow] potiuk edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855291949


   I am not sure `__repr__` is a good idea. 
   
   If anything, I guess we should use `str()` output rather than `__str__` or `__repr__`. I think it is more important what `str()` does (which is `__str__` first and `__repr__` second). The `str()` output is the printable representation, which is what format/log is likely to use.
   
   But none of this is perfect without parsing the actual format string. `%s` is likely the most used and it works as `str()` with a `repr()` fallback. `%r`/`!r` is `__repr__`. So in order to do it 'correctly' we'd have to parse the formatting string, which I think we should not do.
   
   In this case, trying to only mask primitive objects and their lists/tuples/sets sounds like a much better idea.
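   
   For reference, a small sketch of the `%s` vs `%r` behaviour being described (standard library logging only):
   
   ```python
   import logging
   logging.basicConfig(level=logging.INFO, format="%(message)s")
   
   class Demo:
       def __str__(self):
           return "str() view"
       def __repr__(self):
           return "repr() view"
   
   logging.info("%%s gives: %s", Demo())  # -> %s gives: str() view
   logging.info("%%r gives: %r", Demo())  # -> %r gives: repr() view
   ```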





[GitHub] [airflow] potiuk edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855240855


   Thanks for being so persistent @ConstantinoSchillebeeckx . Great investigation! It's a very subtle problem it seems.
   
   I believe @mik-laj is right. We do indeed seem to have a problem here, introduced by #15599. In the SecretsMasker, all "iterable" objects are converted into lists: https://github.com/apache/airflow/pull/15599/files#diff-63a47f524bd31519c96c4f7376b864a77eedf8d6cf59ef6e5c86d575d53a74c6R221 . Unfortunately, the StreamingBody is iterable: https://github.com/boto/botocore/blob/87facf71d42fa6a17d22ff946d76d3ea1d01f561/botocore/response.py#L89 - and converting it reads all the content of the body. The original DEBUG statement works fine without the SecretsMasker because it uses %s to format the object, which does not call the `__iter__()` method (it just prints the object + address).
   
   This would only show up if (like you) you want to debug botocore and set the botocore logger level to DEBUG. That's why setting the "botocore.parsers" level to CRITICAL fixed the issue.
   
   I think there is no easy way to avoid similar problems; should we remove the "iterable" handling and just limit it to cases where those are explicit 'lists' of objects rather than arbitrary Iterable objects? Simply adding `list` to `(set, tuple)` above should do the job.
   
   @ashb @ephraimbuddy  - WDYT?  
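   
   A minimal reproduction of the effect described above, with `io.BytesIO` standing in for botocore's `StreamingBody` (an assumption; the real class lives in botocore):
   
   ```python
   import io
   
   body = io.BytesIO(b"test\n")
   
   list(body)          # what an "iterate everything" redaction effectively does
   print(body.read())  # b'' - the stream is exhausted, so a later read sees nothing
   ```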





[GitHub] [airflow] ConstantinoSchillebeeckx commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851543703


   > Are you running the manual part in the same instances/containers as Airflow ? 
   
   I am indeed; I spin up all my docker services (locally), and then login to the worker service to execute the small script shown below (and confirm that I can correctly download the S3 file).
   
   > I believe the reason is environmental (of where/how your workers are run) not Airflow itself.
   
   I'm not sure how this could be; I'm using the same `docker-compose.yaml` for both Airflow environments (hence the same environmental configuration) - the only change between them is the upgrade from 2.0.2 to 2.1.0 in my requirements.txt
   
   Thanks for the link to a possible solution; this one confuses me too as S3 buckets are specified globally (i.e. have no region).





[GitHub] [airflow] ConstantinoSchillebeeckx commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-852136842


   Another update: I've dumped all the environment variables when running the DAG (both in 2.0.2 and 2.1.0) and the values are the same in both instances. Furthermore, I've logged into the worker service (as the same airflow user) and found the environment variables to also be the same there. So I've got 3 sets of environment variables, all with the same values, yet only one set (the one associated with 2.1.0) produces erroneous behavior.








[GitHub] [airflow] jedcunningham commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-850715117


   @ConstantinoSchillebeeckx, can you try with the latest amazon provider, `apache-airflow-providers-amazon==1.4.0`?
   
   It seems like you aren't using constraints in your environment, which we highly suggest:
   https://airflow.apache.org/docs/apache-airflow/stable/installation.html#constraints-files





[GitHub] [airflow] ConstantinoSchillebeeckx edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851023108


   > f.seek(0)?
   
   Didn't work either
   
   ---
   
   I was curious to see if I could take Airflow out of the equation, so I created a new virtual env and installed:
   ```
   pip install apache-airflow==2.1.0 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.7.txt"
   pip install apache-airflow-providers-amazon==1.4.0
   ```
   
   Then I executed the following script:
   ```python
   # -*- coding: utf-8 -*-
   import boto3
   
   def download_file_from_s3():
   
       s3 = boto3.resource('s3')
   
       bucket = 'secret-bucket'
       key = 'tmp.txt'
   
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           print(f"File downloaded: {f.name}")
   
   
           with open(f.name, 'r') as f_in:
               print(f"FILE CONTENT {f_in.read()}")
   
   
   download_file_from_s3()
   ```
   
   ![image](https://user-images.githubusercontent.com/8518288/120111201-a674cf80-c136-11eb-8d31-448d56aa7be3.png)
   
   So, `boto3` is not the issue here?
   
   Finally, as a sanity check, I updated the DAG to match the script above:
   ```python
   # -*- coding: utf-8 -*-
   import os
   import boto3
   import logging
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.utils.dates import days_ago
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   
   def download_file_from_s3():
   
       # authed with ENVIRONMENT variables
       s3 = boto3.resource('s3')
   
       bucket = 'secret-bucket'
       key = 'tmp.txt'
   
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           logging.info(f"File downloaded: {f.name}")
   
           with open(f.name, 'r') as f_in:
               logging.info(f"FILE CONTENT {f_in.read()}")
   
   dag = DAG(
       "tmp",
       catchup=False,
       default_args={
           "start_date": days_ago(1),
       },
       schedule_interval=None,
   )
   
   download_file_from_s3ile = PythonOperator(
       task_id="download_file_from_s3ile", python_callable=download_file_from_s3, dag=dag
   )
   ```
   
   This resulted in the same, erroneous (empty downloaded file) behavior. 😢 





[GitHub] [airflow] mik-laj edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855283667


   Should we mask the elements? What do you think about using the `repr` function and masking the text representation of the object?





[GitHub] [airflow] ConstantinoSchillebeeckx commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-854269174


   I've dug a bit more into this and found a few more clues. 
   
   [This line](https://github.com/boto/botocore/blob/3b87b23002d99df480b147252897b45fd10241a1/botocore/parsers.py#L234) in the `botocore` library logs the downloaded file's response body at the DEBUG level (the same level I've been testing with). You'll notice in my logs above that this response differs between Airflow versions:
   
   **2.0.2**
   > [2021-05-28 10:26:05,403] {parsers.py:234} DEBUG - Response body:
   <botocore.response.StreamingBody object at 0x7f345310d090>
   
   **2.1.0**
   > [2021-05-28 10:36:15,879] {parsers.py:234} DEBUG - Response body:
   [[116, 101, 115, 116, 10]]
   
   I've done each of the following (independently) and each yielded a fix:
   * commented out the line linked above
   * set a different logging level on the specific logger: `logging.getLogger('botocore.parsers').setLevel(logging.CRITICAL)`
   
   Given that I'm using the same version of botocore (1.18.18), I'm wondering if there's something about the Airflow logger that's triggering the streaming of the [StreamingBody](https://github.com/boto/botocore/blob/23ee17f5446c78167ff442302471f9928c3b4b7c/botocore/response.py#L29) object during the logging statement (and thus resulting in an empty body when requesting the content again).
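   
   A small sketch of that second workaround applied inside the task callable (bucket/key names are placeholders):
   
   ```python
   import logging
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   def download_file_from_s3():
       # quiet the botocore.parsers DEBUG line that triggers the empty-file behavior
       logging.getLogger('botocore.parsers').setLevel(logging.CRITICAL)
       s3_hook = S3Hook()
       s3_hook.get_resource_type("s3").Bucket('your-bucket').Object('tmp.txt').download_file('/tmp/s3_hook.txt')
   ```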





[GitHub] [airflow] ashb commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855249587


   Yeah, looking at Iterables was a mistake since many are non-rewindable or have side effects such as this.
   
   I was trying to protect against secrets in object representations and data frames etc here, but that might just be a caveat we have to document.








[GitHub] [airflow] mik-laj edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855283667


   Should we mask the elements? What do you think about using the repr method and masking the text representation of the object?





[GitHub] [airflow] potiuk commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855291949


   I am not sure `__repr__` is a good idea. 
   
   If anything, I guess we should use `str()` output rather than `__str__` or `__repr__`. I think it is more important what `str()` does (which is `__str__` first and `__repr__` second). The `str()` output is the printable representation, which is what format/log uses. 
   
   But none of this is perfect without parsing the actual format string. `%s` is likely the most used and it works as `str()` with a `repr()` fallback. `%r`/`!r` is `__repr__`. So in order to do it 'correctly' we'd have to parse the formatting string, which I think we should not do.
   
   In this case, trying to only mask primitive objects and their lists/tuples/sets sounds like a much better idea.





[GitHub] [airflow] potiuk edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851028624


   You have bad indentation in your last example:
   
   ```
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           logging.info(f"File downloaded: {f.name}")
   
           with open(f.name, 'r') as f_in:
               logging.info(f"FILE CONTENT {f_in.read()}")
   ```
   
   Should be:
   
   ```
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           logging.info(f"File downloaded: {f.name}")
   
       with open(f.name, 'r') as f_in:
           logging.info(f"FILE CONTENT {f_in.read()}")
   ```
   
   You are trying to read the file before it has been closed and flushed (which will happen when you exit the "with" clause).





[GitHub] [airflow] potiuk commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851033394


   I believe the reason is environmental (of where/how your workers are run), not Airflow itself.
   
   Are you running the manual part in the same instances/containers as Airflow? If not, can you run them there? I can imagine it might come from a bad configuration of your service account/env variables used to authenticate to S3, or from some configuration of your bucket (for example, buckets can have configuration allowing downloading data only from specific IP addresses).
   
   There is also a similar problem: https://github.com/sous-chefs/aws/issues/317 , where the issue is (apparently) caused by connecting from an instance in a different region than the bucket. You need to try to run the code on the same machine, with the same user and the same environment variables that your airflow user runs with. 





[GitHub] [airflow] potiuk closed issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #16148:
URL: https://github.com/apache/airflow/issues/16148


   





[GitHub] [airflow] mik-laj commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-854278723


   It may be related to: https://github.com/apache/airflow/pull/15599
   CC: @ashb 





[GitHub] [airflow] potiuk commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855240855


   Thanks for being so persistent @ConstantinoSchillebeeckx . Great investigation! It's a very subtle problem it seems.
   
   I believe @mik-laj is right. We do indeed seem to have a problem here, introduced by #15599. In the SecretsMasker, all "iterable" objects are converted into lists: https://github.com/apache/airflow/pull/15599/files#diff-63a47f524bd31519c96c4f7376b864a77eedf8d6cf59ef6e5c86d575d53a74c6R221 . Unfortunately, the StreamingBody is iterable: https://github.com/boto/botocore/blob/87facf71d42fa6a17d22ff946d76d3ea1d01f561/botocore/response.py#L89 - and converting it reads all the content of the body. The original DEBUG statement works fine because it uses %s to format the object, which does not call the `__iter__()` method (it just prints the object + address).
   
   This would only show up if (like you) you want to debug botocore and set the botocore logger level to DEBUG. That's why setting the "botocore.parsers" level to CRITICAL fixed the issue.
   
   I think there is no easy way to avoid similar problems; should we remove the "iterable" handling and just limit it to cases where those are explicit 'lists' of objects rather than arbitrary Iterable objects? Simply adding `list` to `(set, tuple)` above should do the job.
   
   @ashb @ephraimbuddy  - WDYT?  





[GitHub] [airflow] mik-laj commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855283667


   Should we mask the elements? What do you think about using the repr method and masking the result?





[GitHub] [airflow] potiuk edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-855240855


   Thanks for being so persistent @ConstantinoSchillebeeckx . Great investigation! It's a very subtle problem it seems.
   
   I believe @mik-laj is right. We do indeed seem to have a problem here, introduced by #15599. In the SecretsMasker, all "iterable" objects are converted into lists: https://github.com/apache/airflow/pull/15599/files#diff-63a47f524bd31519c96c4f7376b864a77eedf8d6cf59ef6e5c86d575d53a74c6R221 . Unfortunately, the StreamingBody is iterable: https://github.com/boto/botocore/blob/87facf71d42fa6a17d22ff946d76d3ea1d01f561/botocore/response.py#L89 - and converting it reads all the content of the body. The original DEBUG statement works fine without the SecretsMasker because it uses %s to format the object, which does not call the `__iter__()` method (it just prints the object + address).
   
   This would only show up if (like you) you want to debug botocore and set the botocore logger level to DEBUG. That's why setting the "botocore.parsers" level to CRITICAL fixed the issue.
   
   I think there is no easy way to avoid similar problems; should we remove the "iterable" handling and just limit it to cases where those are explicit 'lists' of objects rather than arbitrary Iterable objects? Simply adding `list` to `(set, tuple)` above should do the job.
   
   @ashb @ephraimbuddy  - WDYT?  





[GitHub] [airflow] ConstantinoSchillebeeckx commented on issue #16148: Downloading files from S3 broken in 2.1.0

Posted by GitBox <gi...@apache.org>.
ConstantinoSchillebeeckx commented on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851023108


   > f.seek(0)?
   
   Didn't work either
   
   ---
   
   I was curious to see if I could take Airflow out of the equation, so I created a new virtual env and installed:
   ```
   pip install apache-airflow==2.1.0 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.7.txt"
   pip install apache-airflow-providers-amazon==1.4.0
   ```
   
   Then I executed the following script:
   ```python
   # -*- coding: utf-8 -*-
   import boto3
   
   def download_file_from_s3():
   
       s3 = boto3.resource('s3')
   
       bucket = 'mindstrong-dwh-processing'
       key = 'tmp.txt'
   
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           print(f"File downloaded: {f.name}")
   
   
           with open(f.name, 'r') as f_in:
               print(f"FILE CONTENT {f_in.read()}")
   
   
   download_file_from_s3()
   ```
   
   ![image](https://user-images.githubusercontent.com/8518288/120111201-a674cf80-c136-11eb-8d31-448d56aa7be3.png)
   
   So, `boto3` is not the issue here?
   
   Finally, as a sanity check, I updated the DAG to match the script above:
   ```python
   # -*- coding: utf-8 -*-
   import os
   import boto3
   import logging
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.utils.dates import days_ago
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   
   def download_file_from_s3():
   
       # authed with ENVIRONMENT variables
       s3 = boto3.resource('s3')
   
       bucket = 'mindstrong-dwh-processing'
       key = 'tmp.txt'
   
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           logging.info(f"File downloaded: {f.name}")
   
           with open(f.name, 'r') as f_in:
               logging.info(f"FILE CONTENT {f_in.read()}")
   
   dag = DAG(
       "tmp",
       catchup=False,
       default_args={
           "start_date": days_ago(1),
       },
       schedule_interval=None,
   )
   
   download_file_from_s3ile = PythonOperator(
       task_id="download_file_from_s3ile", python_callable=download_file_from_s3, dag=dag
   )
   ```
   
   This resulted in the same, erroneous (empty downloaded file) behavior. 😢 

