Posted to commits@airflow.apache.org by "Kaxil Naik (JIRA)" <ji...@apache.org> on 2018/05/19 16:02:00 UTC

[jira] [Resolved] (AIRFLOW-2448) Enhance HiveCliHook.load_df to work with datetime

     [ https://issues.apache.org/jira/browse/AIRFLOW-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik resolved AIRFLOW-2448.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request #3364
[https://github.com/apache/incubator-airflow/pull/3364]

> Enhance HiveCliHook.load_df to work with datetime
> -------------------------------------------------
>
>                 Key: AIRFLOW-2448
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2448
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hive_hooks, hooks
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>             Fix For: 2.0.0
>
>
> I tried to load a DataFrame that contains time-series data into Hive via HiveCliHook.load_df, but it failed:
> {code}
> In [1]: import pandas as pd
> In [2]: from datetime import datetime, timedelta
> In [3]: df = pd.DataFrame({"t": [datetime(2018, 1, 1) + timedelta(i) for i in range(0, 10)], "v": range(0, 10)})
> In [4]: df
> Out[4]: 
>            t  v
> 0 2018-01-01  0
> 1 2018-01-02  1
> 2 2018-01-03  2
> 3 2018-01-04  3
> 4 2018-01-05  4
> 5 2018-01-06  5
> 6 2018-01-07  6
> 7 2018-01-08  7
> 8 2018-01-09  8
> 9 2018-01-10  9
> In [5]: from airflow.hooks.hive_hooks import HiveCliHook
> In [6]: hook = HiveCliHook()
> [2018-05-10 10:29:40,600] {base_hook.py:85} INFO - Using connection to: localhost
> In [7]: hook.load_df(df, "ts")
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> <ipython-input-7-7a7e58740159> in <module>()
> ----> 1 hook.load_df(df, "ts")
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in load_df(self, df, table, create, recreate, field_dict, delimiter, encoding, pandas_kwargs, **kwargs)
>     335 
>     336                 if field_dict is None and (create or recreate):
> --> 337                     field_dict = _infer_field_types_from_df(df)
>     338 
>     339                 df.to_csv(path_or_buf=f,
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in _infer_field_types_from_df(df)
>     326             }
>     327 
> --> 328             return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
>     329 
>     330         if pandas_kwargs is None:
> /home/sekikn/dev/incubator-airflow/airflow/hooks/hive_hooks.pyc in <genexpr>((col, dtype))
>     326             }
>     327 
> --> 328             return dict((col, DTYPE_KIND_HIVE_TYPE[dtype.kind]) for col, dtype in df.dtypes.iteritems())
>     329 
>     330         if pandas_kwargs is None:
> KeyError: 'M'
> {code}
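
The traceback above ends in a lookup of dtype.kind in DTYPE_KIND_HIVE_TYPE. NumPy reports the kind code 'M' for datetime64 columns, so a mapping without that key raises KeyError: 'M'. The following is a minimal sketch of that failure and of one plausible remedy, not the change made in pull request #3364, which may differ. The mapping contents, the helper name infer_field_types, and the choice of TIMESTAMP as the Hive type are assumptions for illustration. The kind codes are passed in as plain strings so the sketch runs without pandas.

```python
# Hypothetical stand-in for the mapping named in the traceback.
# The entries here are assumptions, not copied from hive_hooks.py.
DTYPE_KIND_HIVE_TYPE = {
    "b": "BOOLEAN",  # boolean
    "i": "BIGINT",   # signed integer
    "u": "BIGINT",   # unsigned integer
    "f": "DOUBLE",   # floating point
    "O": "STRING",   # object
    "S": "STRING",   # byte string
    "U": "STRING",   # unicode string
}


def infer_field_types(kinds, mapping):
    """Map (column, dtype kind code) pairs to Hive type names.

    Hypothetical helper modelled on the dict-building line in the traceback.
    """
    return {col: mapping[kind] for col, kind in kinds}


# Kind codes of the DataFrame in the report: "t" is datetime64 ('M'),
# "v" is a signed integer ('i').
columns = [("t", "M"), ("v", "i")]

# With no 'M' entry, the lookup fails the same way the report shows.
try:
    infer_field_types(columns, DTYPE_KIND_HIVE_TYPE)
    failed_kind = None
except KeyError as exc:
    failed_kind = exc.args[0]

# Assumed remedy: give datetime columns a Hive type of their own.
patched = dict(DTYPE_KIND_HIVE_TYPE, M="TIMESTAMP")
field_dict = infer_field_types(columns, patched)

print(failed_kind)
print(field_dict)
```

With a real DataFrame, the pairs could be produced with ((col, dtype.kind) for col, dtype in df.dtypes.items()). Until a mapping covers datetime columns, passing an explicit field_dict to load_df sidesteps the inference, since the traceback shows it only runs when field_dict is None.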



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)