
Posted to commits@airflow.apache.org by "jack (Jira)" <ji...@apache.org> on 2019/11/21 18:26:00 UTC

[jira] [Commented] (AIRFLOW-3185) Add chunking to DBAPI_hook by implementing fetchmany and pandas chunksize

    [ https://issues.apache.org/jira/browse/AIRFLOW-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979489#comment-16979489 ] 

jack commented on AIRFLOW-3185:
-------------------------------

[~tomanizer] do you have a final version to PR?

> Add chunking to DBAPI_hook by implementing fetchmany and pandas chunksize
> -------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3185
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3185
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>    Affects Versions: 1.10.0
>            Reporter: Thomas Haederle
>            Assignee: Thomas Haederle
>            Priority: Minor
>              Labels: easyfix
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> DbApiHook currently implements get_records and get_pandas_df, both of which load the entire result set into memory at once.
> We should implement two new methods that return a generator with a configurable chunksize:
> - def get_many_records(self, sql, parameters=None, chunksize=20, iterate_singles=False):
> - def get_pandas_df_chunks(self, sql, parameters=None, chunksize=20):
> This should work for all DB hooks that inherit from this class.
> We could instead adapt the existing methods, but that could be problematic: the new methods return a generator, whereas the existing ones return either a list of records or a DataFrame.
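The two proposed methods can be sketched roughly as follows. This is a minimal, hypothetical sketch and not Airflow's implementation: ChunkedHookSketch is an assumed stand-in for a DbApiHook subclass, it uses an in-memory sqlite3 connection so it runs on its own, and it assumes that iterate_singles=True means "yield individual rows instead of lists of rows". get_many_records relies on the DB-API cursor.fetchmany(size) call; get_pandas_df_chunks relies on the chunksize argument of pandas.read_sql, which makes it return an iterator of DataFrames instead of a single DataFrame.

```python
import sqlite3
from contextlib import closing
from typing import Any, Iterator, Optional, Sequence


class ChunkedHookSketch:
    """Hypothetical stand-in for a DbApiHook subclass (not Airflow code).

    get_conn() plays the role of DbApiHook.get_conn(); here it returns an
    in-memory sqlite3 connection so the sketch is self-contained.
    """

    def __init__(self, conn: Optional[sqlite3.Connection] = None):
        self._conn = conn or sqlite3.connect(":memory:")

    def get_conn(self) -> sqlite3.Connection:
        return self._conn

    def get_many_records(
        self,
        sql: str,
        parameters: Optional[Sequence[Any]] = None,
        chunksize: int = 20,
        iterate_singles: bool = False,
    ) -> Iterator[Any]:
        """Yield lists of up to `chunksize` rows fetched with cursor.fetchmany().

        Assumption: iterate_singles=True yields one row at a time instead of
        one list per chunk; rows are still fetched `chunksize` at a time.
        """
        if chunksize < 1:
            raise ValueError("chunksize must be a positive integer")
        conn = self.get_conn()
        # The cursor stays open until the generator is exhausted or closed.
        with closing(conn.cursor()) as cur:
            cur.execute(sql, parameters or ())
            while True:
                rows = cur.fetchmany(chunksize)
                if not rows:  # an empty list means the result set is exhausted
                    break
                if iterate_singles:
                    yield from rows
                else:
                    yield rows

    def get_pandas_df_chunks(
        self,
        sql: str,
        parameters: Optional[Sequence[Any]] = None,
        chunksize: int = 20,
    ) -> Iterator[Any]:
        """Return an iterator of DataFrames with up to `chunksize` rows each."""
        # Imported lazily so the plain-records path needs only the stdlib.
        import pandas as pd

        # With chunksize set, read_sql returns an iterator instead of one DataFrame.
        return pd.read_sql(sql, con=self.get_conn(), params=parameters, chunksize=chunksize)
```

Because both methods hand back lazy iterators, the connection has to stay open until the caller has finished iterating; a real hook would probably close it after the last chunk is consumed, while the sketch keeps it open so the in-memory database survives between calls. Keeping these as separate methods also leaves the return types of get_records and get_pandas_df unchanged.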



--
This message was sent by Atlassian Jira
(v8.3.4#803005)