You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Guy le Mar (JIRA)" <ji...@apache.org> on 2010/12/23 05:03:01 UTC

[jira] Created: (HIVE-1863) Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.

Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.
--------------------------------------------------------------------------------------------

                 Key: HIVE-1863
                 URL: https://issues.apache.org/jira/browse/HIVE-1863
             Project: Hive
          Issue Type: Bug
          Components: Drivers
    Affects Versions: 0.5.0
            Reporter: Guy le Mar


(1) Using the Hive CLI, create a table using...
create table dt4_boolean
(
        dt4_id          int,
        dt4_testbool    boolean,
        dt4_string      string
)
row format delimited
        fields  terminated by ','
        lines  terminated by '\n';

(2) Create a file containing the following text...
1,true,Value is True
2,null,Data says null and must be null
3,,No value that means null
4,NoIdea,Data says NoIdea that's gonna be null
5,false,Value is FALSE

(3) Load the data in the file into the Hive table...
load data local inpath '<DATA FILE PATH>' overwrite into table dt4_boolean;

(4) Check the table works as expected using the Hive CLI...
hive> select * from dt4_boolean;
OK
1	true	Value is True
2	NULL	Data says null and must be null
3	NULL	No value that means null
4	NULL	Data says NoIdea that's gonna be null
5	false	Value is FALSE
Time taken: 0.049 seconds

(5) Using the Hive JDBC driver, execute the same Hive query (select * from dt4_boolean)
(5.1) The "row_str" values obtained by the Hive JDBC driver for deserialization are correct...
1	true	Value is True
2	NULL	Data says null and must be null
3	NULL	No value that means null
4	NULL	Data says NoIdea that's gonna be null
5	false	Value is FALSE

(5.2) However, when these "row_str" are deserialized by the DynamicSerDe to a java.lang.Object, the NULL boolean values are converted to FALSE - instead of being null.

As a consequence, the application making use of the Hive JDBC driver produces this (incorrect) output...

SQL> select dt4_id, dt4_testbool from dt4_boolean;

    DT4_ID DT4_TESTBOOL
---------- ------------
         1            true
         2            false
         3            false
         4            false
         5            false

...instead of producing this (correct) output...
SQL> select dt4_id, dt4_testbool from dt4_boolean;

    DT4_ID DT4_TESTBOOL
---------- ------------
         1            true
         2            NULL
         3            NULL
         4            NULL
         5            false


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1863) Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1863:
---------------------------------

      Component/s:     (was: Drivers)
                   JDBC
    Fix Version/s: 0.7.0

> Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1863
>                 URL: https://issues.apache.org/jira/browse/HIVE-1863
>             Project: Hive
>          Issue Type: Bug
>          Components: JDBC
>    Affects Versions: 0.5.0
>            Reporter: Guy le Mar
>            Assignee: Edward Capriolo
>             Fix For: 0.7.0
>
>
> (1) Using the Hive CLI, create a table using...
> create table dt4_boolean
> (
>         dt4_id          int,
>         dt4_testbool    boolean,
>         dt4_string      string
> )
> row format delimited
>         fields  terminated by ','
>         lines  terminated by '\n';
> (2) Create a file containing the following text...
> 1,true,Value is True
> 2,null,Data says null and must be null
> 3,,No value that means null
> 4,NoIdea,Data says NoIdea that's gonna be null
> 5,false,Value is FALSE
> (3) Load the data in the file into the Hive table...
> load data local inpath '<DATA FILE PATH>' overwrite into table dt4_boolean;
> (4) Check the table works as expected using the Hive CLI...
> hive> select * from dt4_boolean;
> OK
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> Time taken: 0.049 seconds
> (5) Using the Hive JDBC driver, execute the same Hive query (select * from dt4_boolean)
> (5.1) The "row_str" values obtained by the Hive JDBC driver for deserialization are correct...
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> (5.2) However, when these "row_str" are deserialized by the DynamicSerDe to a java.lang.Object, the NULL boolean values are converted to FALSE - instead of being null.
> As a consequence, the application making use of the Hive JDBC driver produces this (incorrect) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            false
>          3            false
>          4            false
>          5            false
> ...instead of producing this (correct) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            NULL
>          3            NULL
>          4            NULL
>          5            false

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HIVE-1863) Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo reassigned HIVE-1863:
-------------------------------------

    Assignee: Edward Capriolo

> Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1863
>                 URL: https://issues.apache.org/jira/browse/HIVE-1863
>             Project: Hive
>          Issue Type: Bug
>          Components: Drivers
>    Affects Versions: 0.5.0
>            Reporter: Guy le Mar
>            Assignee: Edward Capriolo
>
> (1) Using the Hive CLI, create a table using...
> create table dt4_boolean
> (
>         dt4_id          int,
>         dt4_testbool    boolean,
>         dt4_string      string
> )
> row format delimited
>         fields  terminated by ','
>         lines  terminated by '\n';
> (2) Create a file containing the following text...
> 1,true,Value is True
> 2,null,Data says null and must be null
> 3,,No value that means null
> 4,NoIdea,Data says NoIdea that's gonna be null
> 5,false,Value is FALSE
> (3) Load the data in the file into the Hive table...
> load data local inpath '<DATA FILE PATH>' overwrite into table dt4_boolean;
> (4) Check the table works as expected using the Hive CLI...
> hive> select * from dt4_boolean;
> OK
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> Time taken: 0.049 seconds
> (5) Using the Hive JDBC driver, execute the same Hive query (select * from dt4_boolean)
> (5.1) The "row_str" values obtained by the Hive JDBC driver for deserialization are correct...
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> (5.2) However, when these "row_str" are deserialized by the DynamicSerDe to a java.lang.Object, the NULL boolean values are converted to FALSE - instead of being null.
> As a consequence, the application making use of the Hive JDBC driver produces this (incorrect) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            false
>          3            false
>          4            false
>          5            false
> ...instead of producing this (correct) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            NULL
>          3            NULL
>          4            NULL
>          5            false

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HIVE-1863) Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach resolved HIVE-1863.
----------------------------------

    Resolution: Duplicate

> Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1863
>                 URL: https://issues.apache.org/jira/browse/HIVE-1863
>             Project: Hive
>          Issue Type: Bug
>          Components: JDBC
>    Affects Versions: 0.5.0
>            Reporter: Guy le Mar
>            Assignee: Edward Capriolo
>             Fix For: 0.7.0
>
>
> (1) Using the Hive CLI, create a table using...
> create table dt4_boolean
> (
>         dt4_id          int,
>         dt4_testbool    boolean,
>         dt4_string      string
> )
> row format delimited
>         fields  terminated by ','
>         lines  terminated by '\n';
> (2) Create a file containing the following text...
> 1,true,Value is True
> 2,null,Data says null and must be null
> 3,,No value that means null
> 4,NoIdea,Data says NoIdea that's gonna be null
> 5,false,Value is FALSE
> (3) Load the data in the file into the Hive table...
> load data local inpath '<DATA FILE PATH>' overwrite into table dt4_boolean;
> (4) Check the table works as expected using the Hive CLI...
> hive> select * from dt4_boolean;
> OK
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> Time taken: 0.049 seconds
> (5) Using the Hive JDBC driver, execute the same Hive query (select * from dt4_boolean)
> (5.1) The "row_str" values obtained by the Hive JDBC driver for deserialization are correct...
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> (5.2) However, when these "row_str" are deserialized by the DynamicSerDe to a java.lang.Object, the NULL boolean values are converted to FALSE - instead of being null.
> As a consequence, the application making use of the Hive JDBC driver produces this (incorrect) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            false
>          3            false
>          4            false
>          5            false
> ...instead of producing this (correct) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            NULL
>          3            NULL
>          4            NULL
>          5            false

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1863) Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974701#action_12974701 ] 

Edward Capriolo commented on HIVE-1863:
---------------------------------------

Right. Something gets "lost in translation" here because Hive Service is using the ArrayList<String> that comes from driver.getResults() which is designed for sending output to the final console not another serde. We should be able to re-write string and place the nulls back in carefully, however a better solution might exists.

> Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1863
>                 URL: https://issues.apache.org/jira/browse/HIVE-1863
>             Project: Hive
>          Issue Type: Bug
>          Components: Drivers
>    Affects Versions: 0.5.0
>            Reporter: Guy le Mar
>            Assignee: Edward Capriolo
>
> (1) Using the Hive CLI, create a table using...
> create table dt4_boolean
> (
>         dt4_id          int,
>         dt4_testbool    boolean,
>         dt4_string      string
> )
> row format delimited
>         fields  terminated by ','
>         lines  terminated by '\n';
> (2) Create a file containing the following text...
> 1,true,Value is True
> 2,null,Data says null and must be null
> 3,,No value that means null
> 4,NoIdea,Data says NoIdea that's gonna be null
> 5,false,Value is FALSE
> (3) Load the data in the file into the Hive table...
> load data local inpath '<DATA FILE PATH>' overwrite into table dt4_boolean;
> (4) Check the table works as expected using the Hive CLI...
> hive> select * from dt4_boolean;
> OK
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> Time taken: 0.049 seconds
> (5) Using the Hive JDBC driver, execute the same Hive query (select * from dt4_boolean)
> (5.1) The "row_str" values obtained by the Hive JDBC driver for deserialization are correct...
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> (5.2) However, when these "row_str" are deserialized by the DynamicSerDe to a java.lang.Object, the NULL boolean values are converted to FALSE - instead of being null.
> As a consequence, the application making use of the Hive JDBC driver produces this (incorrect) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            false
>          3            false
>          4            false
>          5            false
> ...instead of producing this (correct) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            NULL
>          3            NULL
>          4            NULL
>          5            false

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1863) Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.

Posted by "Steven Wong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974728#action_12974728 ] 

Steven Wong commented on HIVE-1863:
-----------------------------------

This may have been fixed by HIVE-1378 in trunk (0.7). Please give that a try.

> Boolean columns in Hive tables containing NULL are treated as FALSE by the Hive JDBC driver.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-1863
>                 URL: https://issues.apache.org/jira/browse/HIVE-1863
>             Project: Hive
>          Issue Type: Bug
>          Components: Drivers
>    Affects Versions: 0.5.0
>            Reporter: Guy le Mar
>            Assignee: Edward Capriolo
>
> (1) Using the Hive CLI, create a table using...
> create table dt4_boolean
> (
>         dt4_id          int,
>         dt4_testbool    boolean,
>         dt4_string      string
> )
> row format delimited
>         fields  terminated by ','
>         lines  terminated by '\n';
> (2) Create a file containing the following text...
> 1,true,Value is True
> 2,null,Data says null and must be null
> 3,,No value that means null
> 4,NoIdea,Data says NoIdea that's gonna be null
> 5,false,Value is FALSE
> (3) Load the data in the file into the Hive table...
> load data local inpath '<DATA FILE PATH>' overwrite into table dt4_boolean;
> (4) Check the table works as expected using the Hive CLI...
> hive> select * from dt4_boolean;
> OK
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> Time taken: 0.049 seconds
> (5) Using the Hive JDBC driver, execute the same Hive query (select * from dt4_boolean)
> (5.1) The "row_str" values obtained by the Hive JDBC driver for deserialization are correct...
> 1	true	Value is True
> 2	NULL	Data says null and must be null
> 3	NULL	No value that means null
> 4	NULL	Data says NoIdea that's gonna be null
> 5	false	Value is FALSE
> (5.2) However, when these "row_str" are deserialized by the DynamicSerDe to a java.lang.Object, the NULL boolean values are converted to FALSE - instead of being null.
> As a consequence, the application making use of the Hive JDBC driver produces this (incorrect) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            false
>          3            false
>          4            false
>          5            false
> ...instead of producing this (correct) output...
> SQL> select dt4_id, dt4_testbool from dt4_boolean;
>     DT4_ID DT4_TESTBOOL
> ---------- ------------
>          1            true
>          2            NULL
>          3            NULL
>          4            NULL
>          5            false

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.