Posted to issues@hive.apache.org by "xubo245 (JIRA)" <ji...@apache.org> on 2019/04/18 04:18:00 UTC

[jira] [Commented] (HIVE-21626) Why hive can't load normal string as binary from csv?

    [ https://issues.apache.org/jira/browse/HIVE-21626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16820715#comment-16820715 ] 

xubo245 commented on HIVE-21626:
--------------------------------

[~amanin][~antmanin][~amanin] [~$iddhe$h] Please help to check it.

> Why hive can't load normal string as binary from csv?
> -----------------------------------------------------
>
>                 Key: HIVE-21626
>                 URL: https://issues.apache.org/jira/browse/HIVE-21626
>             Project: Hive
>          Issue Type: Bug
>         Environment: hive client: hive1.2.2
>            Reporter: xubo245
>            Priority: Major
>
> Why can't Hive load a normal string as binary from CSV?
> Hive-1.2.2
> {code:java}
> hive>  CREATE TABLE IF NOT EXISTS hivetable (
>     >     id int,
>     >     label boolean,
>     >     name string,
>     >     image binary,
>     >     autoLabel boolean)
>     >  row format delimited fields terminated by '|';
> OK
> Time taken: 0.068 seconds
> hive> LOAD DATA LOCAL INPATH '/Users/xubo/Desktop/xubo/git/carbondata3/integration/spark-common-test/src/test/resources/binarystringdata2.csv' INTO TABLE hivetable;
> Loading data to table default.hivetable
> Table default.hivetable stats: [numFiles=1, totalSize=82]
> OK
> Time taken: 0.122 seconds
> hive> select * from hivetable;
> OK
> 2	false	2.png	i�	true
> 3	false	3.png	n*%�
>                             	false
> 1	true	1.png	^Ayard duty^B	true
> {code}
> binarystringdata2.csv data is:
> {code:java}
> 2|false|2.png|abc|true
> 3|false|3.png|biology|false
> 1|true|1.png|^Ayard duty^B|true
> {code}
> Like the over1k file in the Hive project, binarystringdata2.csv is not delimited by \u0001.
> For "abc" in the CSV, reading from Hive after loading should return abc, so why is it "i�"? The bytes of abc are byte[] 97 98 99; org.apache.hadoop.hive.serde2.lazy.LazyBinary#decodeIfNeeded then treats them as Base64 and decodes them, returning byte[] 105 -73:
> {code:java}
>   public static byte[] decodeIfNeeded(byte[] recv) {
>     boolean arrayByteBase64 = Base64.isArrayByteBase64(recv);
>     if (LOG.isDebugEnabled() && arrayByteBase64) {
>       LOG.debug("Data only contains Base64 alphabets only so try to decode the data.");
>     }
>     return arrayByteBase64 ? Base64.decodeBase64(recv) : recv;
>   }
> {code}
> When we query with SQL in Spark, it returns byte[] 69 B7 (hex); the Hive client/Beeline returns the string "i�" (char array 105, 65533).
> Why do the input and output differ for Hive LOAD DATA? INSERT INTO works correctly.
> Is this a bug or a limitation? Does a binary column loaded from CSV only support Base64-encoded data, or strings for which the Base64 check returns false?
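
The behaviour described above can be reproduced outside Hive with a small sketch. It is not Hive's code: the class name Base64SniffSketch and the helper looksLikeBase64 are hypothetical, and java.util.Base64 is swapped in for the commons-codec calls used by LazyBinary (the JDK decoder is stricter and may throw on some inputs that commons-codec tolerates). The sketch assumes the check accepts any field made up only of Base64 alphabet characters or '=' padding. Under that assumption "abc" and "biology" are decoded, the row containing control characters and a space passes through unchanged, and a value that was Base64-encoded before loading round-trips.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class Base64SniffSketch {

    // Hypothetical stand-in for commons-codec's Base64.isArrayByteBase64:
    // true when every byte is a Base64 alphabet character or '=' padding.
    static boolean looksLikeBase64(byte[] data) {
        for (byte b : data) {
            boolean ok = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                    || (b >= '0' && b <= '9') || b == '+' || b == '/' || b == '=';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Same shape as the decodeIfNeeded quoted in the report, but using the
    // JDK decoder (an assumption made to keep the sketch self-contained).
    static byte[] decodeIfNeeded(byte[] recv) {
        return looksLikeBase64(recv) ? Base64.getDecoder().decode(recv) : recv;
    }

    public static void main(String[] args) {
        String[] fields = {"abc", "biology", "\u0001yard duty\u0002"};
        for (String field : fields) {
            byte[] in = field.getBytes(StandardCharsets.US_ASCII);
            System.out.println(Arrays.toString(in) + " -> "
                    + Arrays.toString(decodeIfNeeded(in)));
        }

        // Encoding the payload before writing the CSV makes the round trip lossless.
        byte[] encoded = Base64.getEncoder().encode("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(encoded, StandardCharsets.US_ASCII) + " -> "
                + new String(decodeIfNeeded(encoded), StandardCharsets.US_ASCII));
    }
}
```

With the JDK decoder, "abc" becomes the two bytes 0x69 0xB7 and "biology" becomes five bytes ending in a form feed (0x0C), which would explain the broken line in the select output. The third field contains bytes outside the Base64 alphabet, so it is returned as-is.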



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)