Posted to dev@hive.apache.org by "N Campbell (JIRA)" <ji...@apache.org> on 2012/07/09 14:17:34 UTC

[jira] [Created] (HIVE-3245) UTF encoded data not displayed correctly by Hive driver

N Campbell created HIVE-3245:
--------------------------------

             Summary: UTF encoded data not displayed correctly by Hive driver
                 Key: HIVE-3245
                 URL: https://issues.apache.org/jira/browse/HIVE-3245
             Project: Hive
          Issue Type: Bug
          Components: JDBC
    Affects Versions: 0.8.0
            Reporter: N Campbell


Various foreign-language data (e.g. Japanese and Thai) is loaded into string columns via tab-delimited text files. A simple projection of the columns in the table does not display the correct data. Exporting the data from Hive and inspecting the files suggests the data is loaded properly. It appears to be an encoding issue in the driver, but I am not aware of any encoding-related URL connection properties that the Hive JDBC driver requires.
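Before suspecting the driver, it can help to confirm that the input files really are UTF-8. Hive's default text handling is generally assumed to treat string data as UTF-8, but that is an assumption here, not something stated in this issue. The following is a minimal sketch using only the JDK. The class name Utf8Check is hypothetical, and the whole file is read into memory:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hive: strict UTF-8 validation of a data file.
public class Utf8Check {
    // Returns true if the bytes form valid UTF-8. REPORT makes the decoder
    // throw on bad sequences instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Usage: java Utf8Check <file> [<file> ...]
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            System.out.println(arg + ": " + (isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```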

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-3245) UTF encoded data not displayed correctly by Hive driver

Posted by "N Campbell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N Campbell updated HIVE-3245:
-----------------------------

    Description: 
Various foreign-language data (e.g. Japanese and Thai) is loaded into string columns via tab-delimited text files. A simple projection of the columns in the table does not display the correct data. Exporting the data from Hive and inspecting the files suggests the data is loaded properly. It appears to be an encoding issue in the driver, but I am not aware of any encoding-related URL connection properties that the Hive JDBC driver requires.


create table if not exists CERT.TLJA_JP_E ( RNUM int , C1 string, ORD int)
row format delimited
fields terminated by '\t'
stored as textfile;

create table if not exists CERT.TLJA_JP ( RNUM int , C1 string, ORD int)
stored as sequencefile;

load data local inpath '/home/hadoopadmin/jdbc-cert/CERT/CERT.TLJA_JP.txt'
overwrite into table CERT.TLJA_JP_E;
insert overwrite table CERT.TLJA_JP  select * from CERT.TLJA_JP_E;


  was:Various foreign-language data (e.g. Japanese and Thai) is loaded into string columns via tab-delimited text files. A simple projection of the columns in the table does not display the correct data. Exporting the data from Hive and inspecting the files suggests the data is loaded properly. It appears to be an encoding issue in the driver, but I am not aware of any encoding-related URL connection properties that the Hive JDBC driver requires.
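To tell whether the driver returns the wrong characters or the client (for example, a GUI tool) merely renders them badly, one can print the Unicode code points of each returned C1 value and compare them against the source file. This is a minimal sketch using only standard JDBC calls. The class name DumpCodePoints is hypothetical. The driver class org.apache.hadoop.hive.jdbc.HiveDriver and a URL of the form jdbc:hive://host:port/db are assumptions based on the HiveServer1-era driver, not details taken from this issue:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical diagnostic, not part of Hive.
public class DumpCodePoints {
    // Formats each code point as U+XXXX so the output does not depend on
    // the console or GUI font and encoding used to display it.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: DumpCodePoints <jdbc-url>");
            return;
        }
        // Assumption: HiveServer1-era driver class; adjust for your setup.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(args[0], "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select RNUM, C1 from CERT.TLJA_JP")) {
            while (rs.next()) {
                String c1 = rs.getString(2);
                System.out.println(rs.getInt(1) + "\t" + (c1 == null ? "NULL" : codePoints(c1)));
            }
        }
    }
}
```

If the printed code points already differ from those in the source file, the problem lies before the display layer. If they match, the client's rendering is the more likely culprit.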

    

[jira] [Updated] (HIVE-3245) UTF encoded data not displayed correctly by Hive driver

Posted by "N Campbell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N Campbell updated HIVE-3245:
-----------------------------

    Attachment: CERT.TLJA.txt

Example Japanese data.
                

[jira] [Updated] (HIVE-3245) UTF encoded data not displayed correctly by Hive driver

Posted by "N Campbell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N Campbell updated HIVE-3245:
-----------------------------

    Attachment: screenshot-1.jpg

Example of the test, as shown in SQuirreL SQL.
                

[jira] [Commented] (HIVE-3245) UTF encoded data not displayed correctly by Hive driver

Posted by "Mark Grover (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445297#comment-13445297 ] 

Mark Grover commented on HIVE-3245:
-----------------------------------

I ran into trouble with the JDBC driver on Hive 0.7.1 as well. I did some poking around but couldn't spend much time on it. While doing so, I got to the org.apache.hadoop.hive.jdbc.HiveQueryResultSet class. Inside next(), the code has:

{code:title=HiveQueryResultSet.java|borderStyle=solid}
Object data = serde.deserialize(new BytesWritable(rowStr.getBytes()));
{code}

Now, String.getBytes() can be called with no parameters, in which case it uses the platform's default charset (as in the line above), or with the charset passed explicitly. I have a hunch that this could be the problem and that the encoding should be passed explicitly. However, I haven't had the chance to verify or refute this hunch.
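The hunch can be illustrated outside Hive. The sketch below encodes a Japanese string with an explicit non-UTF-8 charset, standing in for a platform default such as ISO-8859-1, and then decodes the bytes as UTF-8. With an explicit UTF-8 encode, the round trip is lossless. This is an illustration, not a patch. The class and method names are hypothetical, and the assumption that the serde in HiveQueryResultSet reads the bytes back as UTF-8 has not been verified in this thread:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Hive code.
public class GetBytesRoundTrip {
    // Encodes with the given charset, then decodes as UTF-8, mimicking
    // bytes from getBytes() being read back as UTF-8 text.
    static String roundTrip(String s, Charset encodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String jp = "\u65E5\u672C\u8A9E"; // "Japanese language" in kanji
        // ISO-8859-1 cannot represent these characters, so getBytes
        // replaces each one with '?' and the data is lost.
        System.out.println(roundTrip(jp, StandardCharsets.ISO_8859_1).equals(jp)); // false
        // Passing UTF-8 explicitly keeps the data intact.
        System.out.println(roundTrip(jp, StandardCharsets.UTF_8).equals(jp)); // true
        // The no-argument getBytes() depends on this platform setting.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

If the hunch holds, the corresponding change in next() would presumably pass an explicit charset, for example rowStr.getBytes("UTF-8"). That change is untested here.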
                
> UTF encoded data not displayed correctly by Hive driver
> -------------------------------------------------------
>
>                 Key: HIVE-3245
>                 URL: https://issues.apache.org/jira/browse/HIVE-3245
>             Project: Hive
>          Issue Type: Bug
>          Components: JDBC
>    Affects Versions: 0.8.0
>            Reporter: N Campbell
>         Attachments: ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg, CERT.TLJA.txt
>
>
> Various foreign-language data (e.g. Japanese and Thai) is loaded into string columns via tab-delimited text files. A simple projection of the columns in the table does not display the correct data. Exporting the data from Hive and inspecting the files suggests the data is loaded properly. It appears to be an encoding issue in the driver, but I am not aware of any encoding-related URL connection properties that the Hive JDBC driver requires.
> create table if not exists CERT.TLJA_JP_E ( RNUM int , C1 string, ORD int)
> row format delimited
> fields terminated by '\t'
> stored as textfile;
> create table if not exists CERT.TLJA_JP ( RNUM int , C1 string, ORD int)
> stored as sequencefile;
> load data local inpath '/home/hadoopadmin/jdbc-cert/CERT/CERT.TLJA_JP.txt'
> overwrite into table CERT.TLJA_JP_E;
> insert overwrite table CERT.TLJA_JP  select * from CERT.TLJA_JP_E;
