Posted to commits@airflow.apache.org by "Kengo Seki (JIRA)" <ji...@apache.org> on 2018/05/10 14:35:00 UTC

[jira] [Created] (AIRFLOW-2448) Enhance HiveCliHook.load_df to work with datetime

Kengo Seki created AIRFLOW-2448:
-----------------------------------

             Summary: Enhance HiveCliHook.load_df to work with datetime
                 Key: AIRFLOW-2448
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2448
             Project: Apache Airflow
          Issue Type: Improvement
          Components: hive_hooks, hooks
            Reporter: Kengo Seki
            Assignee: Kengo Seki


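I tried to load a DataFrame containing time-series data into Hive via HiveCliHook.load_df, but it failed with KeyError: 'M', because the dtype kind of the datetime64 column ('M') has no entry in DTYPE_KIND_HIVE_TYPE: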

{code}
In [1]: import pandas as pd

In [2]: from datetime import datetime, timedelta

In [3]: df = pd.DataFrame({"t": [datetime(2018, 1, 1) + timedelta(i) for i in range(0, 10)], "v": range(0, 10)})

In [4]: df
Out[4]: 
           t  v
0 2018-01-01  0
1 2018-01-02  1
2 2018-01-03  2
3 2018-01-04  3
4 2018-01-05  4
5 2018-01-06  5
6 2018-01-07  6
7 2018-01-08  7
8 2018-01-09  8
9 2018-01-10  9

In [5]: from airflow.hooks.hive_hooks import HiveCliHook

In [6]: hook = HiveCliHook()
[2018-05-10 10:29:40,600] {base_hook.py:85} INFO - Using connection to: localhost

In [7]: hook.load_df(df, "ts")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-7a7e58740159> in <module>()
----> 1 hook.load_df(df, "ts")

/home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in load_df(self, df, table, create, recreate, field_dict, delimiter, encoding, pandas_kwargs, **kwargs)
    335 
    336                 if field_dict is None and (create or recreate):
--> 337                     field_dict = _infer_field_types_from_df(df)
    338 
    339                 df.to_csv(path_or_buf=f,

/home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in _infer_field_types_from_df(df)
    326             }
    327 
--> 328             return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
    329 
    330         if pandas_kwargs is None:

/home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in <genexpr>((col, dtype))
    326             }
    327 
--> 328             return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
    329 
    330         if pandas_kwargs is None:

KeyError: 'M'
{code}
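One possible direction, sketched below, is to give the datetime64 kind ('M') an entry in the kind-to-type mapping. This is only a minimal sketch under assumptions: the mapping entries, the choice of TIMESTAMP as the Hive type, and the standalone helper name infer_field_types are illustrative and not copied from hive_hooks.py. Simple stand-in objects with a .kind attribute replace real dtypes so the snippet runs without pandas.

```python
from types import SimpleNamespace

# Hypothetical mapping from NumPy dtype.kind codes to Hive column types.
# The entries are illustrative assumptions, not copied from hive_hooks.py.
DTYPE_KIND_HIVE_TYPE = {
    "b": "BOOLEAN",    # boolean
    "i": "BIGINT",     # signed integer
    "u": "BIGINT",     # unsigned integer
    "f": "DOUBLE",     # floating point
    "O": "STRING",     # object
    "U": "STRING",     # unicode string
    "M": "TIMESTAMP",  # datetime64: the kind that raised KeyError above
}


def infer_field_types(dtypes):
    """Map (column, dtype) pairs to Hive types by looking up dtype.kind."""
    return {col: DTYPE_KIND_HIVE_TYPE[dtype.kind] for col, dtype in dtypes}


# Stand-ins for the dtypes of the DataFrame in the report:
# "t" is datetime64[ns] (kind "M") and "v" is int64 (kind "i").
example_dtypes = [
    ("t", SimpleNamespace(kind="M")),
    ("v", SimpleNamespace(kind="i")),
]

print(infer_field_types(example_dtypes))
# {'t': 'TIMESTAMP', 'v': 'BIGINT'}
```

With a real DataFrame, the same helper could be called as infer_field_types(df.dtypes.items()). Whether the values written by df.to_csv are then parsed correctly by Hive as TIMESTAMP is not verified here.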



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)