Posted to commits@airflow.apache.org by "Kengo Seki (JIRA)" <ji...@apache.org> on 2018/05/21 16:36:00 UTC

[jira] [Created] (AIRFLOW-2500) Fix MySqlToHiveTransfer to transfer unsigned type properly

Kengo Seki created AIRFLOW-2500:
-----------------------------------

             Summary: Fix MySqlToHiveTransfer to transfer unsigned type properly
                 Key: AIRFLOW-2500
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2500
             Project: Apache Airflow
          Issue Type: Bug
          Components: operators
            Reporter: Kengo Seki
            Assignee: Kengo Seki


Given the following table,

{code}
mysql> USE airflow_ci
Database changed
mysql> DESC users;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| id    | int(10) unsigned | YES  |     | NULL    |       |
+-------+------------------+------+-----+---------+-------+
1 row in set (0.00 sec)

mysql> SELECT * FROM users;
+------------+
| id         |
+------------+
| 2147483647 |
| 2147483648 |
+------------+
2 rows in set (0.00 sec)
{code}

executing MySqlToHiveTransfer:

{code}
In [1]: from airflow.operators.mysql_to_hive import MySqlToHiveTransfer
   ...: t = MySqlToHiveTransfer(sql="SELECT * FROM airflow_ci.users", hive_table="users", recreate=True, task_id="t")
   ...: t.execute(None)
   ...: 
[2018-05-21 12:14:09,137] {base_hook.py:83} INFO - Using connection to: localhost
[2018-05-21 12:14:09,140] {base_hook.py:83} INFO - Using connection to: localhost
[2018-05-21 12:14:09,146] {hive_hooks.py:427} INFO - DROP TABLE IF EXISTS users;
CREATE TABLE IF NOT EXISTS users (
id INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ''
STORED AS textfile
;

(snip)

[2018-05-21 12:14:31,667] {hive_hooks.py:233} INFO - Loading data to table default.users
[2018-05-21 12:14:32,364] {hive_hooks.py:233} INFO - Table default.users stats: [numFiles=1, numRows=0, totalSize=24, rawDataSize=0]
[2018-05-21 12:14:32,365] {hive_hooks.py:233} INFO - OK
[2018-05-21 12:14:32,366] {hive_hooks.py:233} INFO - Time taken: 1.299 seconds
{code}

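... then the value greater than the upper bound of a signed 32-bit integer (2147483647) is not fetched properly from Hive. The unsigned MySQL column is created as a Hive INT, so 2147483648 comes back as NULL: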

{code}
hive> SELECT * FROM users;
OK
2147483647
NULL
Time taken: 2.461 seconds, Fetched: 2 row(s)
{code}
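To make the overflow concrete, here is a minimal sketch of one way to pick a Hive type wide enough for an unsigned MySQL integer column. It is not the operator's actual code: the names hive_type_for and fits_hive_int, the two mapping tables, and the STRING fallback are assumptions made up for this illustration, and the type strings are taken in the form printed by DESC above.

```python
import re

# Upper bound of Hive's signed 32-bit INT type.
HIVE_INT_MAX = 2**31 - 1  # 2147483647

# Hypothetical mappings, for illustration only. Each unsigned MySQL integer
# type is widened to a Hive type that can hold its full range.
SIGNED_TO_HIVE = {
    "tinyint": "TINYINT",
    "smallint": "SMALLINT",
    "mediumint": "INT",
    "int": "INT",
    "bigint": "BIGINT",
}
UNSIGNED_TO_HIVE = {
    "tinyint": "SMALLINT",      # max 255
    "smallint": "INT",          # max 65535
    "mediumint": "INT",         # max 16777215
    "int": "BIGINT",            # max 4294967295
    "bigint": "DECIMAL(38,0)",  # max 2**64 - 1
}


def fits_hive_int(value):
    """Return True if value can be stored in a Hive INT column."""
    return -HIVE_INT_MAX - 1 <= value <= HIVE_INT_MAX


def hive_type_for(mysql_type):
    """Pick a Hive type for a MySQL type string such as 'int(10) unsigned'."""
    lowered = mysql_type.lower()
    match = re.match(r"\s*([a-z]+)", lowered)
    base = match.group(1) if match else ""
    table = UNSIGNED_TO_HIVE if "unsigned" in lowered.split() else SIGNED_TO_HIVE
    # Assumption: anything that is not an integer type falls back to STRING.
    return table.get(base, "STRING")


if __name__ == "__main__":
    for value in (2147483647, 2147483648):
        print(value, "fits in Hive INT:", fits_hive_int(value))
    print("int(10) unsigned ->", hive_type_for("int(10) unsigned"))
```

Under this mapping the id column above would be created as BIGINT rather than INT, so both rows would fit.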


