Posted to commits@airflow.apache.org by "Edward Wells (JIRA)" <ji...@apache.org> on 2018/03/05 02:41:00 UTC

[jira] [Commented] (AIRFLOW-2045) S3FileTransformOperator doesn't adhere to current boto3 API

    [ https://issues.apache.org/jira/browse/AIRFLOW-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385505#comment-16385505 ] 

Edward Wells commented on AIRFLOW-2045:
---------------------------------------

[~zrubenstein] I was able to fix it by using {{download_file}} and passing the temporary file's path. Using {{download_fileobj}} causes an error, because somewhere down the line a file path string is expected instead of a file object.
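
For illustration, a minimal sketch of the {{download_file}} approach described above. This is not the actual patch: {{download_to_temp}} and {{FakeS3Object}} are hypothetical names, and the fake object only mimics the {{download_file(Filename)}} call of a boto3 {{s3.Object}} so the sketch runs without AWS access.

```python
import tempfile


class FakeS3Object(object):
    """Hypothetical stand-in for a boto3 s3.Object (assumption, not boto3 code)."""

    def __init__(self, body):
        self.body = body

    def download_file(self, Filename):
        # Same call shape as boto3's s3.Object.download_file(Filename):
        # the argument is a path string, not a file object.
        with open(Filename, "wb") as fh:
            fh.write(self.body)


def download_to_temp(s3_object):
    """Download an S3 object via a temporary file path and return its bytes."""
    with tempfile.NamedTemporaryFile("wb") as f_source:
        # Pass the temporary file's path rather than the file object itself.
        s3_object.download_file(f_source.name)
        # Read back by path, since the data was written by path
        # rather than through the f_source handle.
        with open(f_source.name, "rb") as fh:
            return fh.read()


data = download_to_temp(FakeS3Object(b"hello"))
```

With a real boto3 {{s3.Object}} in place of the stand-in, the same {{download_file(f_source.name)}} call should apply, though I have only reasoned about it here, not tested it against S3.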

The operator also calls {{connection.close()}}, but {{S3Hook}} doesn't have a {{connection}} attribute, and the S3 object returned by {{get_conn()}} doesn't have a {{close()}} method.
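
Because the object has no {{close()}} method, calling it unconditionally raises an {{AttributeError}}. One possible way to sidestep that is a guarded call. The helper below is a hypothetical sketch ({{close_if_supported}} is not part of Airflow or boto3) and only shows the general pattern:

```python
def close_if_supported(conn):
    """Call conn.close() only if the object provides it.

    Hypothetical helper for illustration. Returns True if close() was called.
    """
    close = getattr(conn, "close", None)
    if callable(close):
        close()
        return True
    return False
```

Whether skipping the call silently is acceptable depends on whether anything actually needs to be released, which I can't say for certain here.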

Downgrading boto3 to reach a version that has the {{get_contents_to_file}} method also causes an error in {{S3Hook}}, because the hook expects a method that doesn't show up until boto3>=1.3.

I'm not sure whether removing those {{close()}} calls on the connections entirely will have any bad side effects. I haven't had a chance to dig into it enough, and I don't have time right now to do enough testing to open a PR. Here's a link to my version with the fix, so you can at least see what I'm talking about and maybe test it out:

[https://github.com/arcward/incubator-airflow/tree/AIRFLOW-2045]

 

Offending file: [https://github.com/arcward/incubator-airflow/blob/AIRFLOW-2045/airflow/operators/s3_file_transform_operator.py]

 

> S3FileTransformOperator doesn't adhere to current boto3 API
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-2045
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2045
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: aws, boto3
>    Affects Versions: 1.9.1
>            Reporter: Zachary Rubenstein
>            Priority: Major
>
> When using S3FileTransformOperator with boto3 >= 1.5.0, I get an error:
>  
>  Traceback (most recent call last):
>    File "/home/airflow/data/airflow/.venv/bin/airflow", line 27, in <module>
>      args.func(args)
>    File "/home/airflow/data/airflow/.venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 528, in test
>      ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
>    File "/home/airflow/data/airflow/.venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
>      result = func(*args, **kwargs)
>    File "/home/airflow/data/airflow/.venv/local/lib/python2.7/site-packages/airflow/models.py", line 1584, in run
>      session=session)
>    File "/home/airflow/data/airflow/.venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
>      result = func(*args, **kwargs)
>    File "/home/airflow/data/airflow/.venv/local/lib/python2.7/site-packages/airflow/models.py", line 1493, in _run_raw_task
>      result = task_copy.execute(context=context)
>    File "/home/airflow/data/airflow/.venv/local/lib/python2.7/site-packages/airflow/operators/s3_file_transform_operator.py", line 86, in execute
>      source_s3_key_object.get_contents_to_file(f_source)
>  AttributeError: 's3.Object' object has no attribute 'get_contents_to_file'
>
> I believe the method has been renamed to download_fileobj (http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Object.download_fileobj)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)