Posted to dev@hive.apache.org by "Chuck Connell (JIRA)" <ji...@apache.org> on 2012/12/01 00:44:00 UTC

[jira] [Commented] (HIVE-2380) Add Binary Datatype in Hive

    [ https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507761#comment-13507761 ] 

Chuck Connell commented on HIVE-2380:
-------------------------------------

I am trying to use this feature (BINARY columns) and I believe I have the perfect use-case for it, but I am missing something. 

Here is the background: I have some files that each contain just one logical field, which is a binary object. (The files are in Google Protobuf format.) I want to pack these binary files into one larger file, where each protobuf is a logical record. Then I want to define a Hive table that stores each protobuf as one row, with the entire protobuf object in a single BINARY column. I will then use a custom UDF to select/query the binary object.

This is about as simple as it gets for putting binary data into Hive. But all of the test cases for this JIRA issue seem to draw the binary columns from another existing table and CAST them. I want to load the files from disk.

What file format should I use to package the binary rows? What should the Hive table definition be? I cannot use TEXTFILE, since the binary may contain newlines. Many of my attempts have choked on the newlines.
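
To illustrate the newline problem, here is a minimal sketch of one possible workaround: Base64-encode each record so that it occupies exactly one line of a text file. This is only an assumption about what might work, not something confirmed in this issue. The function names pack_records and unpack_records are hypothetical, and the decoding step on the Hive side (for example inside the custom UDF) is left out.

```python
import base64


def pack_records(records):
    """Encode each binary record as one Base64 line terminated by a newline."""
    # Standard Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    # so an encoded record can never contain a newline byte itself.
    return b"".join(base64.b64encode(r) + b"\n" for r in records)


def unpack_records(blob):
    """Reverse pack_records: split on line breaks and decode each line."""
    return [base64.b64decode(line) for line in blob.splitlines()]


# Arbitrary byte strings standing in for serialized protobufs;
# two of them contain raw newline bytes.
records = [b"\x08\x96\x01", b"line1\nline2", b"\x00\xff\n\n"]

packed = pack_records(records)
restored = unpack_records(packed)
```

The cost of this approach is that the file grows by roughly one third, and every reader has to decode each line before it can parse the protobuf.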

Thanks very much,
Chuck Connell
Nuance
Burlington, MA


                
> Add Binary Datatype in Hive
> ---------------------------
>
>                 Key: HIVE-2380
>                 URL: https://issues.apache.org/jira/browse/HIVE-2380
>             Project: Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.8.0
>
>         Attachments: hive-2380_1.patch, hive-2380_2.patch, hive-2380_3.patch, hive-2380_4.patch, hive-2380.patch
>
>
> Add bytearray as a primitive data type.
