Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/05/31 14:46:20 UTC

[GitHub] [airflow] ConstantinoSchillebeeckx edited a comment on issue #16148: Downloading files from S3 broken in 2.1.0

ConstantinoSchillebeeckx edited a comment on issue #16148:
URL: https://github.com/apache/airflow/issues/16148#issuecomment-851023108


   > f.seek(0)?
   
   Didn't work either
   
   ---
   
   I was curious to see if I could take Airflow out of the equation, so I created a new virtual env and installed:
   ```
   pip install apache-airflow==2.1.0 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.7.txt"
   pip install apache-airflow-providers-amazon==1.4.0
   ```
   
   Then I executed the following script:
   ```python
   # -*- coding: utf-8 -*-
   import boto3
   
   def download_file_from_s3():
   
       s3 = boto3.resource('s3')
   
       bucket = 'secret-bucket'
       key = 'tmp.txt'
   
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           print(f"File downloaded: {f.name}")
   
   
           with open(f.name, 'r') as f_in:
               print(f"FILE CONTENT {f_in.read()}")
   
   
   download_file_from_s3()
   ```
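   The pattern in the script above (opening a path with mode `'w'` only to obtain a filename, then letting boto3 write to that path through its own handle) can be simulated locally without AWS. This is a minimal sketch, not from the issue; `fake_download` is a hypothetical stand-in for `s3.Bucket(bucket).Object(key).download_file(path)`:
   ```python
   # -*- coding: utf-8 -*-
   import os
   import tempfile

   def fake_download(path):
       # Stand-in for boto3's download_file: it opens the path itself
       # and writes the object's bytes through its own handle.
       with open(path, "wb") as out:
           out.write(b"hello from s3")

   tmp = os.path.join(tempfile.gettempdir(), "s3_hook_demo.txt")
   with open(tmp, "w") as f:          # handle held open, nothing written via f
       fake_download(f.name)          # separate writer fills the file
       with open(f.name, "r") as f_in:
           content = f_in.read()      # content is visible immediately

   print(content)  # -> hello from s3
   ```
   The outer `'w'` handle never writes anything, so closing it flushes an empty buffer and leaves the separately written content intact; that is consistent with the standalone script working.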
   
   ![image](https://user-images.githubusercontent.com/8518288/120111201-a674cf80-c136-11eb-8d31-448d56aa7be3.png)
   
   So, `boto3` is not the issue here?
   
   Finally, as a sanity check, I updated the DAG to match the script above:
   ```python
   # -*- coding: utf-8 -*-
   import os
   import boto3
   import logging
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.utils.dates import days_ago
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   
   def download_file_from_s3():
   
       # authed with ENVIRONMENT variables
       s3 = boto3.resource('s3')
   
       bucket = 'secret-bucket'
       key = 'tmp.txt'
   
       with open('/tmp/s3_hook.txt', 'w') as f:
           s3.Bucket(bucket).Object(key).download_file(f.name)
           logging.info(f"File downloaded: {f.name}")
   
           with open(f.name, 'r') as f_in:
               logging.info(f"FILE CONTENT {f_in.read()}")
   
   dag = DAG(
       "tmp",
       catchup=False,
       default_args={
           "start_date": days_ago(1),
       },
       schedule_interval=None,
   )
   
    download_file_from_s3_task = PythonOperator(
        task_id="download_file_from_s3", python_callable=download_file_from_s3, dag=dag
    )
   ```
   
   This resulted in the same erroneous behavior (an empty downloaded file). 😢 
