Posted to user@hive.apache.org by Arun Vasu <ar...@gmail.com> on 2013/09/04 11:35:23 UTC

Issue in BINARY datatype

Hi,
I am using Hive 10. When I create an external table with a column of type
BINARY, a query on the table shows junk values for that column.

This is the statement I used to create the table:

CREATE EXTERNAL TABLE BOOL1 (NB BOOLEAN, email STRING, bitfld BINARY)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '^'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hivetables/testbinary';

The query I used is: select * from bool1

The sample data in the HDFS file is:

0^arun@abc.com^001
1^arun@abc.com^010
 ^arun@abc.com^011
 ^arun@abc.com^100
t^arun@abc.com^101
f^arun@abc.com^110
true^arun@abc.com^111
false^arun@abc.com^001
123^    ^01100010
12344^    ^01100001

Please share any input you may have.

Thanks,
Arun


Re: Issue in BINARY datatype

Posted by Sushanth Sowmyan <kh...@gmail.com>.
I tried recreating your exact dump with a more recent Hive (built about
3 weeks ago). In addition to the base64-decoded version of the binary
data, I get some extraneous characters on every line of the select *
output, and they are consistently the same extra characters.

For example, an od -c of the first line of this table reads:

N U L L \t a b c \t 357 277 275 M \n

The correct base64-decode of "001" is just "M".
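
As a cross-check, here is a minimal Python sketch of what a standard
base64 decoder does with "001". It assumes the missing "=" padding is
simply appended, and that the decoded bytes are then rendered as UTF-8
text with invalid bytes replaced by U+FFFD. Whether Hive's select * path
does exactly this is an assumption, and the helper names decode_unpadded
and od_c are hypothetical. Under those assumptions the decoder returns two
bytes, 0xD3 followed by 0x4D ("M"), and the lone 0xD3 would account for
the 357 277 275 sequence in the od output.

```python
import base64


def decode_unpadded(text: str) -> bytes:
    """Decode base64 text that may be missing its trailing '=' padding."""
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


def od_c(data: bytes) -> str:
    """Format bytes roughly like `od -c`: printable ASCII as-is, the rest in octal."""
    return " ".join(chr(b) if 32 < b < 127 else f"{b:03o}" for b in data)


raw = decode_unpadded("001")
print(raw.hex())  # d34d

# Assumption: the bytes are shown as UTF-8 text, so the invalid lead byte
# 0xD3 is replaced by U+FFFD, which is EF BF BD when re-encoded as UTF-8.
shown = raw.decode("utf-8", errors="replace").encode("utf-8")
print(od_c(shown))  # 357 277 275 M
```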

Saving this to another equivalent table with a CTAS (create table as
select) yields an encoding similar to the original file for the last
two lines, and an extra "=" at the end of each line before them. That
encoding, in turn, seems stable if I CTAS from that table to another.
All 3 tables yield the same output when I do select *. I get the same
output from select * even when I CTAS to an RCFile.
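
The "=" behaviour is consistent with an ordinary base64 round trip, which
the Python sketch below illustrates. It uses the bitfld values from the
sample data and assumes that decoding tolerates missing padding while
encoding always emits canonical, padded base64. That this is what the
SerDe does on write is an assumption, and roundtrip is a hypothetical
helper.

```python
import base64

# bitfld values taken from the sample data in the original post
samples = ["001", "010", "011", "100", "101", "110", "111",
           "01100010", "01100001"]


def roundtrip(text: str) -> str:
    """Decode possibly unpadded base64 text, then re-encode it canonically."""
    padded = text + "=" * (-len(text) % 4)
    return base64.b64encode(base64.b64decode(padded)).decode("ascii")


for s in samples:
    once = roundtrip(s)
    twice = roundtrip(once)
    # 3-character values gain a trailing "=", 8-character values come back
    # unchanged, and a second round trip changes nothing further.
    print(f"{s:>8} -> {once:<8} stable={once == twice}")
```

Note that the re-encoded text need not match the original characters
exactly for the short values, because a 3-character input carries two
leftover bits that a canonical encoder writes back as zero.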

The problem might be with the LazySimpleSerDe binary decode, but if
so, it affects the encode as well. Or the problem might be with
how binary data is output by select *. Either way, this merits
creating a JIRA to address it.

