Posted to commits@airflow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/08/05 10:10:00 UTC

[jira] [Commented] (AIRFLOW-5072) gcs_hook causes out-of-memory error when downloading huge files

    [ https://issues.apache.org/jira/browse/AIRFLOW-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899967#comment-16899967 ] 

ASF GitHub Bot commented on AIRFLOW-5072:
-----------------------------------------

tkaymak commented on pull request #5685: [AIRFLOW-5072] gcs_hook's download() method should download only once
URL: https://github.com/apache/airflow/pull/5685
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> gcs_hook causes out-of-memory error when downloading huge files
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-5072
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5072
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: 1.10.3
>            Reporter: Tobias Kaymak
>            Assignee: Tobias Kaymak
>            Priority: Major
>
> The gcs_hook's `download` method *always* downloads the blob as a string, even when a filename is supplied. In that case the blob is first written to the file and then downloaded a second time into memory, so the method takes about twice as long, and for huge blobs it can even cause out-of-memory errors.
> I think an `else` is missing here:
> [https://github.com/apache/airflow/blob/05c01a97497e992c7d8b05a39a7855343dee1603/airflow/contrib/hooks/gcs_hook.py#L176]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)