Posted to commits@airflow.apache.org by "Kengo Seki (JIRA)" <ji...@apache.org> on 2018/05/10 14:35:00 UTC
[jira] [Created] (AIRFLOW-2448) Enhance HiveCliHook.load_df to work with datetime
Kengo Seki created AIRFLOW-2448:
-----------------------------------
Summary: Enhance HiveCliHook.load_df to work with datetime
Key: AIRFLOW-2448
URL: https://issues.apache.org/jira/browse/AIRFLOW-2448
Project: Apache Airflow
Issue Type: Improvement
Components: hive_hooks, hooks
Reporter: Kengo Seki
Assignee: Kengo Seki
I tried to load a DataFrame containing time-series data into Hive via HiveCliHook.load_df, but it failed with a KeyError:
{code}
In [1]: import pandas as pd
In [2]: from datetime import datetime, timedelta
In [3]: df = pd.DataFrame({"t": [datetime(2018, 1, 1) + timedelta(i) for i in range(0, 10)], "v": range(0, 10)})
In [4]: df
Out[4]:
           t  v
0 2018-01-01  0
1 2018-01-02  1
2 2018-01-03  2
3 2018-01-04  3
4 2018-01-05  4
5 2018-01-06  5
6 2018-01-07  6
7 2018-01-08  7
8 2018-01-09  8
9 2018-01-10  9
In [5]: from airflow.hooks.hive_hooks import HiveCliHook
In [6]: hook = HiveCliHook()
[2018-05-10 10:29:40,600] {base_hook.py:85} INFO - Using connection to: localhost
In [7]: hook.load_df(df, "ts")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-7-7a7e58740159> in <module>()
----> 1 hook.load_df(df, "ts")
/home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in load_df(self, df, table, create, recreate, field_dict, delimiter, encoding, pandas_kwargs, **kwargs)
335
336 if field_dict is None and (create or recreate):
--> 337 field_dict = _infer_field_types_from_df(df)
338
339 df.to_csv(path_or_buf=f,
/home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in _infer_field_types_from_df(df)
326 }
327
--> 328 return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
329
330 if pandas_kwargs is None:
/home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in <genexpr>((col, dtype))
326 }
327
--> 328 return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
329
330 if pandas_kwargs is None:
KeyError: 'M'
{code}
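The 'M' in the KeyError is the NumPy dtype kind code for datetime64, so the lookup in DTYPE_KIND_HIVE_TYPE apparently has no entry for datetime columns. The following is a minimal sketch of that failure and one possible remedy, not the actual Airflow code: KIND_TO_HIVE_TYPE and infer_field_types are hypothetical stand-ins (the real mapping's contents are not shown in the traceback), and mapping 'M' to the Hive TIMESTAMP type is an assumption.

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical stand-in for DTYPE_KIND_HIVE_TYPE; the real contents in
# hive_hooks.py are not visible in the traceback above.
KIND_TO_HIVE_TYPE = {
    "b": "BOOLEAN",  # boolean
    "i": "BIGINT",   # signed integer
    "u": "BIGINT",   # unsigned integer
    "f": "DOUBLE",   # floating point
    "O": "STRING",   # Python object (e.g. str)
}


def infer_field_types(df, kind_map):
    """Map each column to a Hive type via its NumPy dtype kind code."""
    return {col: kind_map[dtype.kind] for col, dtype in df.dtypes.items()}


df = pd.DataFrame({
    "t": [datetime(2018, 1, 1) + timedelta(i) for i in range(10)],
    "v": range(10),
})

# datetime64 columns report kind 'M', which the mapping above lacks.
print(df["t"].dtype.kind)  # M

try:
    infer_field_types(df, KIND_TO_HIVE_TYPE)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'M'

# Assumed remedy: add an entry for 'M' (TIMESTAMP is an assumption here).
extended = dict(KIND_TO_HIVE_TYPE, M="TIMESTAMP")
fields = infer_field_types(df, extended)
print(fields)  # {'t': 'TIMESTAMP', 'v': 'BIGINT'}
```

Until load_df handles this case, passing an explicit field_dict should sidestep the inference step, since the traceback shows that it runs only when field_dict is None. Whether Hive parses the datetime strings written by df.to_csv as timestamps would still need to be checked.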