Posted to common-issues@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2010/11/11 23:48:22 UTC

[jira] Updated: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration

     [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-6685:
----------------------------------

    Attachment: serial.patch

Ok, here is a preliminary patch. 

It includes support for Avro, Thrift, ProtocolBuffers, Writables, Java serialization, and an adaptor for the old-style serializations. One of the features of the Avro serialization is that the kind ("reflection", "specific", or "generic") is a parameter that can be changed between writing and reading the file.

All of the types can be put into SequenceFiles, MapFiles, BloomFilterMapFiles, SetFile, and ArrayFile.

In a separate issue, I'll upload the OFile wrapper that goes on top of TFile to allow all of the types into TFiles as well.

It creates a new package o.a.h.io.serial that defines the new interfaces. The new serializations save their metadata in a framework-specific format. To make the format extensible, I've used protocol buffers to encode this information. This will allow us to make arbitrary compatible extensions later.
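
To illustrate the idea of serialization-owned metadata, here is a minimal, self-contained sketch. It is not the code in serial.patch: the names SketchSerialization, KindSerialization, serializeSelf, deserializeSelf, and storeInHeader are all hypothetical, and a UTF-8 string stands in for the protocol buffer encoding so the example needs only the JDK. It shows a container keeping the metadata as an opaque byte blob, and a reader recovering the "kind" that was written and then choosing a different one.

```java
import java.nio.charset.StandardCharsets;

public class OpaqueMetadataSketch {

    /** Hypothetical interface, assumed for illustration; not the API from serial.patch. */
    interface SketchSerialization {
        String getName();

        /** Encode this serialization's own configuration as opaque bytes. */
        byte[] serializeSelf();

        /** Restore the configuration from bytes produced by serializeSelf(). */
        void deserializeSelf(byte[] metadata);
    }

    /** Toy serialization whose only configuration is a "kind" parameter. */
    static final class KindSerialization implements SketchSerialization {
        private String kind = "specific";

        public String getName() { return "sketch-avro"; }
        public String getKind() { return kind; }
        public void setKind(String kind) { this.kind = kind; }

        // Assumption: plain UTF-8 here; the patch description says protocol buffers.
        public byte[] serializeSelf() {
            return kind.getBytes(StandardCharsets.UTF_8);
        }

        public void deserializeSelf(byte[] metadata) {
            kind = new String(metadata, StandardCharsets.UTF_8);
        }
    }

    /** A container copies the blob into its header without interpreting it. */
    static byte[] storeInHeader(SketchSerialization serialization) {
        return serialization.serializeSelf().clone();
    }

    public static void main(String[] args) {
        KindSerialization writer = new KindSerialization();
        writer.setKind("reflection");
        byte[] header = storeInHeader(writer);

        // The reader learns how the data was written from the blob alone...
        KindSerialization reader = new KindSerialization();
        reader.deserializeSelf(header);
        System.out.println("written as: " + reader.getKind());

        // ...and may then pick a different kind for reading.
        reader.setKind("generic");
        System.out.println("reading as: " + reader.getKind());
    }
}
```

Because the container only ever sees a byte array, a serialization in this sketch can add fields to its metadata without any change to the container or to the framework interface.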

> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: serial.patch
>
>
> Currently, the generic serialization framework uses Map<String,String> for the serialization specific configuration. Since this data is really internal to the specific serialization, I think we should change it to be an opaque binary blob. This will simplify the interface for defining specific serializations for different contexts (MAPREDUCE-1462). It will also move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.