Posted to dev@hive.apache.org by "Matt Hackett (JIRA)" <ji...@apache.org> on 2010/04/14 18:48:49 UTC
[jira] Commented: (HIVE-333) Add TFileTransport deserializer

    [ https://issues.apache.org/jira/browse/HIVE-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856956#action_12856956 ] 

Matt Hackett commented on HIVE-333:
-----------------------------------

I am curious about the status of this feature request. It looks like it did not make it into the codebase, though I imagine it would be extremely useful to me and to others.

The ability to move Thrift object stores in TFileTransport format into more Hive/Hadoop-friendly SequenceFiles would seem to complete the loop for a common use case: logging data to the ThriftFile store in Scribe. From what I gather, this is also what is done internally at Facebook.

Apologies in advance if this has already been superseded by other changes or discussions.

> Add TFileTransport deserializer
> -------------------------------
>
>                 Key: HIVE-333
>                 URL: https://issues.apache.org/jira/browse/HIVE-333
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>         Environment: Linux
>            Reporter: Steve Corona
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-333.patch.1, hive-333.patch.2, libthrift_asf.jar
>
>
> I've been googling around all night and haven't really found what I am looking for. Basically, I want to transfer some data from my web servers to Hive in a format that's a little more verbose than plain CSV files. It seems like JSON or Thrift would be perfect for this. I am planning on sending this serialized JSON or Thrift data through Scribe and loading it into Hive. I just can't figure out how to tell Hive that the input data is a bunch of serialized Thrift records (all of the records are the "struct" type) in a TFileTransport. Hopefully this makes sense...
> Reply from Joydeep Sen Sarma (jssarma@facebook.com)
> Unfortunately, the open source code base does not have the loaders we run to convert Thrift records in a TFileTransport into a SequenceFile that Hadoop/Hive can work with. One option is that we add this to the Hive code base (should be straightforward).
> No process required. Please file a JIRA. I will try to upload a patch this weekend (just cut-and-paste for the most part). I would appreciate some help in finessing it (the internal code is hardwired to some assumptions, etc.).
