Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/05/19 16:02:00 UTC
[jira] [Commented] (AIRFLOW-2448) Enhance HiveCliHook.load_df to work with datetime
[ https://issues.apache.org/jira/browse/AIRFLOW-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481681#comment-16481681 ]
ASF subversion and git services commented on AIRFLOW-2448:
----------------------------------------------------------
Commit 67b351183b0f85e9484f1f7f70e0b46300753b60 in incubator-airflow's branch refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=67b3511 ]
[AIRFLOW-2448] Enhance HiveCliHook.load_df to work with datetime
HiveCliHook.load_df cannot yet handle a DataFrame
that contains datetime columns.
This PR enhances it to work with datetime,
fixes a bug introduced by AIRFLOW-2441,
and addresses some flake8 issues.
Closes #3364 from sekikn/AIRFLOW-2448
> Enhance HiveCliHook.load_df to work with datetime
> -------------------------------------------------
>
> Key: AIRFLOW-2448
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2448
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hive_hooks, hooks
> Reporter: Kengo Seki
> Assignee: Kengo Seki
> Priority: Major
> Fix For: 2.0.0
>
>
> I tried to load a DataFrame containing time-series data into Hive via HiveCliHook.load_df, but it failed:
> {code}
> In [1]: import pandas as pd
> In [2]: from datetime import datetime, timedelta
> In [3]: df = pd.DataFrame({"t": [datetime(2018, 1, 1) + timedelta(i) for i in range(0, 10)], "v": range(0, 10)})
> In [4]: df
> Out[4]:
>            t  v
> 0 2018-01-01  0
> 1 2018-01-02  1
> 2 2018-01-03  2
> 3 2018-01-04  3
> 4 2018-01-05  4
> 5 2018-01-06  5
> 6 2018-01-07  6
> 7 2018-01-08  7
> 8 2018-01-09  8
> 9 2018-01-10  9
> In [5]: from airflow.hooks.hive_hooks import HiveCliHook
> In [6]: hook = HiveCliHook()
> [2018-05-10 10:29:40,600] {base_hook.py:85} INFO - Using connection to: localhost
> In [7]: hook.load_df(df, "ts")
> ---------------------------------------------------------------------------
> KeyError Traceback (most recent call last)
> <ipython-input-7-7a7e58740159> in <module>()
> ----> 1 hook.load_df(df, "ts")
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in load_df(self, df, table, create, recreate, field_dict, delimiter, encoding, pandas_kwargs, **kwargs)
> 335
> 336 if field_dict is None and (create or recreate):
> --> 337 field_dict = _infer_field_types_from_df(df)
> 338
> 339 df.to_csv(path_or_buf=f,
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in _infer_field_types_from_df(df)
> 326 }
> 327
> --> 328 return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
> 329
> 330 if pandas_kwargs is None:
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in <genexpr>((col, dtype))
> 326 }
> 327
> --> 328 return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
> 329
> 330 if pandas_kwargs is None:
> KeyError: 'M'
> {code}
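The traceback shows the failure comes from looking up each column's `dtype.kind` in the `DTYPE_KIND_HIVE_TYPE` dict, which has no entry for `'M'`, the NumPy kind code for datetime64. Below is a minimal sketch that reproduces this without a Hive connection. The names `OLD_KIND_TO_HIVE`, `NEW_KIND_TO_HIVE` and `infer_field_types` are hypothetical, the mapping entries are modelled on the dict in the traceback rather than copied from Airflow, and mapping `'M'` to Hive `TIMESTAMP` is an assumption about the fix, not the contents of commit 67b3511.

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical mapping from NumPy dtype kind codes to Hive column types,
# modelled on DTYPE_KIND_HIVE_TYPE from the traceback (entries assumed).
OLD_KIND_TO_HIVE = {
    "b": "BOOLEAN",  # boolean
    "i": "BIGINT",   # signed integer
    "u": "BIGINT",   # unsigned integer
    "f": "DOUBLE",   # floating point
    "c": "STRING",   # complex floating point
    "O": "STRING",   # Python object
    "S": "STRING",   # byte string
    "U": "STRING",   # Unicode string
    "V": "STRING",   # void
}

# Assumed fix: map kind 'M' (numpy datetime64) to Hive TIMESTAMP.
NEW_KIND_TO_HIVE = dict(OLD_KIND_TO_HIVE, M="TIMESTAMP")


def infer_field_types(df, kind_map):
    """Return {column: hive_type}, keyed on each column's dtype.kind."""
    # Series.items() is used because Series.iteritems() was removed in pandas 2.0.
    return {col: kind_map[dtype.kind] for col, dtype in df.dtypes.items()}


df = pd.DataFrame({
    "t": [datetime(2018, 1, 1) + timedelta(days=i) for i in range(10)],
    "v": range(10),
})

print(df.dtypes["t"].kind)  # M

try:
    infer_field_types(df, OLD_KIND_TO_HIVE)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'M'

print(infer_field_types(df, NEW_KIND_TO_HIVE))  # {'t': 'TIMESTAMP', 'v': 'BIGINT'}
```

Keying on `dtype.kind` rather than the full dtype name keeps the table small, since every datetime64 resolution (for example `datetime64[ns]`) shares the kind code `'M'`.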
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)