Posted to common-dev@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2008/09/03 11:03:45 UTC

[jira] Commented: (HADOOP-3787) Add serialization for Thrift

    [ https://issues.apache.org/jira/browse/HADOOP-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627937#action_12627937 ] 

Tom White commented on HADOOP-3787:
-----------------------------------

This, and HADOOP-1986 in general, does not mandate the use of SequenceFile. However, SequenceFiles are a convenient binary format, so that's what I've used here for the example.

It would be possible to run MapReduce against Thrift records in flat files with a suitable InputFormat (which would need to be written), but such files would not be splittable unless there is some general way to find Thrift record boundaries from an arbitrary position in the file. Unsplittable files do not in general play well with MapReduce and HDFS, since a single map task then has to read the whole file. Perhaps one way to fix this is to insert a special Thrift record every n records, with a unique byte sequence that a reader can scan for to realign with the record boundaries. Could this work?
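
To make the marker idea concrete, here is a rough sketch in plain Java. It is not part of the attached patch and uses no Thrift or Hadoop APIs: records are treated as opaque byte arrays, and the class name, the method names and the fixed 16-byte marker are all made up for illustration. A real format would more likely pick random marker bytes per file (similar in spirit to the sync markers SequenceFile writes), so that a record payload is very unlikely to contain the marker by chance.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: realigning with record boundaries via a sync marker.
final class SyncMarkerSketch {

    // Assumed marker: 16 distinct bytes (0xA0..0xAF), fixed here only so the
    // example is deterministic.
    static final byte[] SYNC = new byte[16];
    static {
        for (int i = 0; i < SYNC.length; i++) {
            SYNC[i] = (byte) (0xA0 + i);
        }
    }

    // Writes the records back to back, emitting the marker before every
    // `interval`-th record (including the first one).
    static byte[] write(byte[][] records, int interval) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records.length; i++) {
            if (i % interval == 0) {
                out.write(SYNC, 0, SYNC.length);
            }
            out.write(records[i], 0, records[i].length);
        }
        return out.toByteArray();
    }

    // Scans forward from an arbitrary offset (e.g. the start of a split) and
    // returns the offset of the first record after the next marker, or -1 if
    // no marker begins at or after `from`.
    static int nextRecordStart(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + SYNC.length <= data.length; i++) {
            int j = 0;
            while (j < SYNC.length && data[i + j] == SYNC[j]) {
                j++;
            }
            if (j == SYNC.length) {
                return i + SYNC.length;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[][] records = { {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12} };
        byte[] file = write(records, 2);
        // A split starting mid-record skips ahead to the next marker.
        System.out.println(nextRecordStart(file, 18));
    }
}
```

A reader assigned a split would call nextRecordStart with the split's start offset and begin deserializing from the returned position; the reader of the previous split would carry on past its own end until it reaches that same marker, so every record is read exactly once.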

> Add serialization for Thrift
> ----------------------------
>
>                 Key: HADOOP-3787
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3787
>             Project: Hadoop Core
>          Issue Type: Wish
>          Components: examples, mapred
>            Reporter: Tom White
>         Attachments: hadoop-3787.patch, libthrift.jar
>
>
> Thrift (http://incubator.apache.org/thrift/) is a cross-language serialization and RPC framework. This issue is to write a ThriftSerialization to support using Thrift types in MapReduce programs, including an example program. This should probably go into contrib.
> (There is a prototype implementation in https://issues.apache.org/jira/secure/attachment/12370464/hadoop-serializer-v2.tar.gz)
