Posted to mapreduce-issues@hadoop.apache.org by "Josh Hansen (JIRA)" <ji...@apache.org> on 2013/02/22 01:12:14 UTC

[jira] [Commented] (MAPREDUCE-377) Add serialization for Protocol Buffers

    [ https://issues.apache.org/jira/browse/MAPREDUCE-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583715#comment-13583715 ] 

Josh Hansen commented on MAPREDUCE-377:
---------------------------------------

writeDelimitedTo(OutputStream), mergeDelimitedFrom(InputStream), and parseDelimitedFrom(InputStream) have all made it into the standard Protocol Buffers library now. See https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite#writeDelimitedTo(java.io.OutputStream) . That should resolve one obvious obstacle to addressing this issue.
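For anyone unfamiliar with why the delimited methods matter here: a stream of records needs some way to tell where one message ends and the next begins, and the delimited methods handle that by writing the message size as a varint before the message bytes. The following is only a rough sketch of that framing idea in plain Java so it runs without the protobuf jar. The class DelimitedFraming and its method names are made up for illustration and are not part of Protocol Buffers or Hadoop; the payloads are opaque byte arrays standing in for serialized messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not protobuf or Hadoop code.
final class DelimitedFraming {

    // Write the payload length as a base-128 varint, then the payload bytes.
    static void writeDelimited(byte[] payload, OutputStream out) throws IOException {
        int length = payload.length;
        while ((length & ~0x7F) != 0) {
            out.write((length & 0x7F) | 0x80); // low 7 bits, continuation bit set
            length >>>= 7;
        }
        out.write(length);                     // final byte, continuation bit clear
        out.write(payload);
    }

    // Read one framed record; returns null if the stream is already at EOF.
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        int length = b & 0x7F;
        int shift = 7;
        while ((b & 0x80) != 0) {
            if (shift > 28) {
                throw new IOException("length prefix too long");
            }
            b = in.read();
            if (b == -1) {
                throw new EOFException("truncated length prefix");
            }
            length |= (b & 0x7F) << shift;
            shift += 7;
        }
        if (length < 0) {
            throw new IOException("negative record length");
        }
        byte[] payload = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(payload, offset, length - offset);
            if (n == -1) {
                throw new EOFException("truncated record");
            }
            offset += n;
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDelimited("first".getBytes(StandardCharsets.UTF_8), out);
        writeDelimited("second record".getBytes(StandardCharsets.UTF_8), out);

        // Read records back until the stream is exhausted.
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] record;
        while ((record = readDelimited(in)) != null) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

With real protobuf messages, the calls named above would take the place of these helpers, which is what makes a record-at-a-time Hadoop serialization feasible without inventing a separate container format.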

Questions were raised a few years ago about whether this issue is still relevant; I agree with Tom White that it is very relevant for people who want to use their protobuf data in Hadoop MapReduce. Avro in particular doesn't meet the needs of my organization due to its lack of a sparse representation.

Twitter's elephant-bird library (https://github.com/kevinweil/elephant-bird) provides some protobuf-in-Hadoop support, but it's less than obvious how to use it with protobufs that are not LZO-compressed.
                
> Add serialization for Protocol Buffers
> --------------------------------------
>
>                 Key: MAPREDUCE-377
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-377
>             Project: Hadoop Map/Reduce
>          Issue Type: Wish
>            Reporter: Tom White
>            Assignee: Alex Loddengaard
>         Attachments: hadoop-3788-v1.patch, hadoop-3788-v2.patch, hadoop-3788-v3.patch, protobuf-java-2.0.1.jar, protobuf-java-2.0.2.jar
>
>
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding data in a compact binary format. This issue is to write a ProtocolBuffersSerialization to support using Protocol Buffers types in MapReduce programs, including an example program. This should probably go into contrib. 
