Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2015/02/04 00:40:34 UTC
[jira] [Commented] (SAMZA-484) Define the serialization/deserialization format for stream tuple
[ https://issues.apache.org/jira/browse/SAMZA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304319#comment-14304319 ]
Chris Riccomini commented on SAMZA-484:
---------------------------------------
bq. This causes some amount of code redundancy (dislike) because we now have a new Serde[Object] for each data-serde format supported in SQL along with the existing ones.
Ya, this is a bummer.
What if we wrap, rather than duplicate, the existing Serdes? So, the Sql\*Serde classes would be responsible for mapping to/from the underlying Serde objects?
{code}
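import org.apache.samza.serializers.Serde;
import org.apache.samza.serializers.StringSerde;

// Wraps the existing StringSerde instead of duplicating its logic.
// StringData is the proposed SQL wrapper type for String values (its package is not shown here).
public class SqlStringSerde implements Serde<StringData> {
  private final Serde<String> serde;

  public SqlStringSerde(String encoding) {
    this.serde = new StringSerde(encoding);
  }

  @Override
  public StringData fromBytes(byte[] bytes) {
    return new StringData(serde.fromBytes(bytes));
  }

  @Override
  public byte[] toBytes(StringData object) {
    return serde.toBytes(object.strValue());
  }
}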
{code}
It's still not quite ideal, but at least we're not duplicating code. I think we'd have to implement Sql serdes and Data objects for String, byte, Integer, Long, Json, and Avro. It's kind of annoying, but I think it's the most semantically accurate. If you use a StringSerde, you'll get String objects back. If you use an AvroDataSerde, you'll get Data objects back that wrap Avro objects.
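If the per-type wrappers turn out to be mostly boilerplate, one possible way to cut it down is a single generic wrapper that takes the underlying serde plus a pair of conversion functions. This is only a rough, self-contained sketch and not part of the patch: the local Serde interface and PlainStringSerde stand in for Samza's own classes so that the snippet compiles by itself, and StringData, WrappingSerde, and sqlStringSerde are hypothetical names.

```java
import java.nio.charset.Charset;
import java.util.function.Function;

final class SqlSerdeSketch {
  /** Stand-in for Samza's Serde interface so this sketch compiles on its own. */
  interface Serde<T> {
    T fromBytes(byte[] bytes);
    byte[] toBytes(T object);
  }

  /** Hypothetical SQL data object wrapping a String. */
  static final class StringData {
    private final String value;
    StringData(String value) { this.value = value; }
    String strValue() { return value; }
  }

  /** Stand-in for the existing String serde (assumed behavior: encode/decode with a charset). */
  static final class PlainStringSerde implements Serde<String> {
    private final Charset charset;
    PlainStringSerde(String encoding) { this.charset = Charset.forName(encoding); }
    @Override public String fromBytes(byte[] bytes) { return new String(bytes, charset); }
    @Override public byte[] toBytes(String s) { return s.getBytes(charset); }
  }

  /** Hypothetical generic wrapper: delegates byte handling to an inner serde and only maps T to/from D. */
  static final class WrappingSerde<T, D> implements Serde<D> {
    private final Serde<T> inner;
    private final Function<T, D> wrap;
    private final Function<D, T> unwrap;

    WrappingSerde(Serde<T> inner, Function<T, D> wrap, Function<D, T> unwrap) {
      this.inner = inner;
      this.wrap = wrap;
      this.unwrap = unwrap;
    }

    @Override public D fromBytes(byte[] bytes) { return wrap.apply(inner.fromBytes(bytes)); }
    @Override public byte[] toBytes(D object) { return inner.toBytes(unwrap.apply(object)); }
  }

  /** Builds the String flavor of the wrapper; other types would follow the same pattern. */
  static Serde<StringData> sqlStringSerde(String encoding) {
    return new WrappingSerde<String, StringData>(
        new PlainStringSerde(encoding), StringData::new, StringData::strValue);
  }
}
```

With something like this, each Sql serde would reduce to one factory call, though each type would still need its own Data class, and config-driven instantiation might still call for concrete classes with the usual constructors.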
The only alternatives that I can think of are what you list, and what was discussed on SAMZA-429. Personally, I don't want to force a data model on Samza, nor do I really want to have confusing duplicate APIs in IncomingMessageEnvelope, so that seems to leave only the Serde approach. At least this way, only SQL (code-generated configs), or developers using Sql operator tasks, will really get hit by this.
> Define the serialization/deserialization format for stream tuple
> ----------------------------------------------------------------
>
> Key: SAMZA-484
> URL: https://issues.apache.org/jira/browse/SAMZA-484
> Project: Samza
> Issue Type: Sub-task
> Components: sql
> Reporter: Yi Pan (Data Infrastructure)
> Assignee: Navina Ramesh
> Priority: Minor
> Labels: project
> Attachments: SAMZA-484.patch
>
>
> It came out in the discussion for streaming SQL that we will need to define the serialization/deserialization format for stream tuple.
> The ideal serialization/deserialization format should allow both forward and backward compatibility on additional/missing fields in the data.
> Several choices to be considered:
> 1) Avro
> 2) Protobuf
> 3) Flatbuffer
> It might also be interesting to consider a pluggable serialization interface that allows different serialization methods for different Samza jobs.