Posted to commits@airflow.apache.org by "Kengo Seki (JIRA)" <ji...@apache.org> on 2018/05/03 03:11:00 UTC

[jira] [Created] (AIRFLOW-2412) Fix HiveCliHook.load_file to address HIVE-10541

Kengo Seki created AIRFLOW-2412:
-----------------------------------

             Summary: Fix HiveCliHook.load_file to address HIVE-10541
                 Key: AIRFLOW-2412
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2412
             Project: Apache Airflow
          Issue Type: Improvement
          Components: hive_hooks, hooks
            Reporter: Kengo Seki
            Assignee: Kengo Seki


HiveCliHook.load_file generates a query file and executes it with the {{-f}} option, but that file does not end with a newline. In that case, the beeline bundled with Hive versions below 1.3 does not execute the last query in the file, due to [a bug|https://issues.apache.org/jira/browse/HIVE-10541]. Example:

Register a connection and prepare the file to be loaded:
{code}
$ airflow connections -a --conn_id hive_cli --conn_type hive_cli --conn_host localhost --conn_port 10000 --conn_schema default --conn_extra '{"use_beeline": true, "auth": "none"}'
[2018-05-02 18:38:48,208] {__init__.py:48} INFO - Using executor SequentialExecutor

        Successfully added `conn_id`=hive_cli : hive_cli://:@localhost:10000/default

$ cat /tmp/t
0
1
2
3
4
5
6
7
8
9
{code}

Execute load_file via IPython:
{code}
In [1]: from airflow.hooks.hive_hooks import HiveCliHook

In [2]: hook = HiveCliHook("hive_cli")
[2018-05-02 18:50:42,161] {base_hook.py:85} INFO - Using connection to: localhost

In [3]: hook.load_file(field_dict={"c": "int"}, filepath="/tmp/t", table="foo")

(snip)

[2018-05-02 18:51:06,043] {hive_hooks.py:216} INFO - beeline -u jdbc:hive2://localhost:10000/default;auth=none -f /tmp/airflow_hiveop_75jxXU/tmpmvhi0M
[2018-05-02 18:51:07,397] {hive_hooks.py:231} INFO - Connecting to jdbc:hive2://localhost:10000/default;auth=none
[2018-05-02 18:51:07,598] {hive_hooks.py:231} INFO - Connected to: Apache Hive (version 1.2.1)
[2018-05-02 18:51:07,600] {hive_hooks.py:231} INFO - Driver: Hive JDBC (version 1.2.1)
[2018-05-02 18:51:07,600] {hive_hooks.py:231} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
[2018-05-02 18:51:07,644] {hive_hooks.py:231} INFO - 0: jdbc:hive2://localhost:10000/default> USE default;
[2018-05-02 18:51:07,749] {hive_hooks.py:231} INFO - No rows affected (0.104 seconds)
[2018-05-02 18:51:07,773] {hive_hooks.py:231} INFO - 0: jdbc:hive2://localhost:10000/defTABLE fooD DATA LOCAL INPATH '/tmp/t' OVERWRITE INTO
[2018-05-02 18:51:07,773] {hive_hooks.py:231} INFO - Closing: 0: jdbc:hive2://localhost:10000/default;auth=none
{code}

The Hive table is created, but no data is loaded:
{code}
0: jdbc:hive2://localhost:10000/default> SHOW TABLES;
+-----------+--+
| tab_name  |
+-----------+--+
| foo       |
+-----------+--+
1 row selected (0.037 seconds)
0: jdbc:hive2://localhost:10000/default> SELECT * FROM foo;
+--------+--+
| foo.c  |
+--------+--+
+--------+--+
No rows selected (0.1 seconds)
{code}
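
One possible direction for a fix is to make sure the generated query file always ends with a newline before it is passed to beeline with {{-f}}. The following is only a minimal sketch of that idea, not the actual HiveCliHook code: the helper names {{ensure_trailing_newline}} and {{write_query_file}} and the {{.hql}} suffix are hypothetical and chosen for illustration.

```python
import os
import tempfile


def ensure_trailing_newline(hql):
    """Return hql unchanged if it already ends with a newline, else append one.

    Hypothetical helper: affected beeline versions skip the last statement
    of a -f file when the file is not newline-terminated (HIVE-10541).
    """
    return hql if hql.endswith("\n") else hql + "\n"


def write_query_file(hql, directory=None):
    """Write hql to a temporary file that is newline-terminated.

    Hypothetical helper, not part of Airflow. Returns the path of the
    file, which the caller is responsible for deleting.
    """
    fd, path = tempfile.mkstemp(suffix=".hql", dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(ensure_trailing_newline(hql))
    return path
```

With a helper like this, the final LOAD DATA statement would be followed by a newline regardless of how the query string was assembled, so affected beeline versions should no longer drop it.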



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)