Posted to commits@airflow.apache.org by "Fokko Driesprong (JIRA)" <ji...@apache.org> on 2018/05/08 09:53:00 UTC

[jira] [Resolved] (AIRFLOW-2412) Fix HiveCliHook.load_file to address HIVE-10541

     [ https://issues.apache.org/jira/browse/AIRFLOW-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong resolved AIRFLOW-2412.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
                   1.10.0

Issue resolved by pull request #3327
[https://github.com/apache/incubator-airflow/pull/3327]

> Fix HiveCliHook.load_file to address HIVE-10541
> -----------------------------------------------
>
>                 Key: AIRFLOW-2412
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2412
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hive_hooks, hooks
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>             Fix For: 1.10.0, 2.0.0
>
>
> HiveCliHook.load_file generates a query file and executes it using the {{-f}} option, but the file does not end with a newline. In that case, the beeline bundled with Hive versions below 1.3 does not execute the last query because of [a bug|https://issues.apache.org/jira/browse/HIVE-10541]. Example:
> Register the connection and prepare the file to be loaded:
> {code}
> $ airflow connections -a --conn_id hive_cli --conn_type hive_cli --conn_host localhost --conn_port 10000 --conn_schema default --conn_extra '{"use_beeline": true, "auth": "none"}'
> [2018-05-02 18:38:48,208] {__init__.py:48} INFO - Using executor SequentialExecutor
>         Successfully added `conn_id`=hive_cli : hive_cli://:@localhost:10000/default
> $ cat /tmp/t
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> {code}
> Execute load_file via IPython:
> {code}
> In [1]: from airflow.hooks.hive_hooks import HiveCliHook
> In [2]: hook = HiveCliHook("hive_cli")
> [2018-05-02 18:50:42,161] {base_hook.py:85} INFO - Using connection to: localhost
> In [3]: hook.load_file(field_dict={"c": "int"}, filepath="/tmp/t", table="foo")
> (snip)
> [2018-05-02 18:51:06,043] {hive_hooks.py:216} INFO - beeline -u jdbc:hive2://localhost:10000/default;auth=none -f /tmp/airflow_hiveop_75jxXU/tmpmvhi0M
> [2018-05-02 18:51:07,397] {hive_hooks.py:231} INFO - Connecting to jdbc:hive2://localhost:10000/default;auth=none
> [2018-05-02 18:51:07,598] {hive_hooks.py:231} INFO - Connected to: Apache Hive (version 1.2.1)
> [2018-05-02 18:51:07,600] {hive_hooks.py:231} INFO - Driver: Hive JDBC (version 1.2.1)
> [2018-05-02 18:51:07,600] {hive_hooks.py:231} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
> [2018-05-02 18:51:07,644] {hive_hooks.py:231} INFO - 0: jdbc:hive2://localhost:10000/default> USE default;
> [2018-05-02 18:51:07,749] {hive_hooks.py:231} INFO - No rows affected (0.104 seconds)
> [2018-05-02 18:51:07,773] {hive_hooks.py:231} INFO - 0: jdbc:hive2://localhost:10000/defTABLE fooD DATA LOCAL INPATH '/tmp/t' OVERWRITE INTO
> [2018-05-02 18:51:07,773] {hive_hooks.py:231} INFO - Closing: 0: jdbc:hive2://localhost:10000/default;auth=none
> {code}
> The Hive table is created, but no data is loaded:
> {code}
> 0: jdbc:hive2://localhost:10000/default> SHOW TABLES;
> +-----------+--+
> | tab_name  |
> +-----------+--+
> | foo       |
> +-----------+--+
> 1 row selected (0.037 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT * FROM foo;
> +--------+--+
> | foo.c  |
> +--------+--+
> +--------+--+
> No rows selected (0.1 seconds)
> {code}
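The failure mode described above can be illustrated with a small Python sketch. It rests on a simplified model of HIVE-10541 (an assumption for illustration): pre-1.3 beeline executes only the newline-terminated lines of a {{-f}} script, so an unterminated final statement is silently skipped. The helpers ensure_trailing_newline, write_query_file and old_beeline_executed_lines are hypothetical; they are not Airflow's API and not necessarily the change merged in pull request #3327. The HQL string is illustrative, modelled on the log above.

```python
import os
import tempfile
from typing import List, Optional


def ensure_trailing_newline(hql: str) -> str:
    """Hypothetical helper: append a newline if the script lacks one."""
    return hql if hql.endswith("\n") else hql + "\n"


def write_query_file(hql: str, directory: Optional[str] = None) -> str:
    """Hypothetical helper: write HQL to a temp file that always ends with a newline.

    Returns the file path, which could then be passed to `beeline -f <path>`.
    """
    fd, path = tempfile.mkstemp(suffix=".hql", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(ensure_trailing_newline(hql))
    return path


def old_beeline_executed_lines(script: str) -> List[str]:
    """Toy model of HIVE-10541 (assumption): only newline-terminated lines run."""
    # keepends=True preserves "\n" so we can tell which lines were terminated.
    return [
        line.rstrip("\n")
        for line in script.splitlines(keepends=True)
        if line.endswith("\n")
    ]


if __name__ == "__main__":
    hql = "USE default;\nLOAD DATA LOCAL INPATH '/tmp/t' OVERWRITE INTO TABLE foo;"
    # Without a trailing newline the LOAD statement is dropped in this model.
    print(old_beeline_executed_lines(hql))
    # With the newline appended, both statements are executed.
    print(old_beeline_executed_lines(ensure_trailing_newline(hql)))
```

In this model, appending a single newline before the file is written is enough for the final LOAD DATA statement to be executed, which matches the symptom in the report: the table is created by an earlier step, but the load never runs.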



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)