Posted to dev@kafka.apache.org by David Kay <da...@outlook.com> on 2016/05/26 13:29:36 UTC

KAFKA-3744: Message format to identify serializer

All, I plan to submit a KIP to begin discussion on https://issues.apache.org/jira/browse/KAFKA-3744 and the associated Pull Request https://github.com/apache/kafka/pull/1419.  I'll notify this list when the KIP is submitted.

Please discard my previous message containing bogus subject and links.

Re: KAFKA-3744: Message format to identify serializer

Posted by Gwen Shapira <gw...@confluent.io>.
Hi David,

Thank you for bringing this up.
I do agree that improving Kafka's message metadata is important and that
our story there is somewhat lacking. We don't have built-in support for
message types, or for tagging messages with things like source host or
source cluster (which were frequently requested), or many other similar
metadata.

I don't think adding this information to the message format is a good idea
though.
The limited # of bits means that we are very limited in what we can
express, and your specific proposal ties Kafka to very specific formats
(Avro, Text, JSON). You are limited to 4 formats because of the bit
limitation, but I am strongly against tying Kafka to specific formats.
First, Protobuf, Thrift and XML are very popular. Second, who knows what
people will invent tomorrow? We don't want to plan on being obsolete next
year.

I suggest an alternative:
Let's work together to design a community-recommended schema. It will be
implemented entirely inside the payload (keys and values), we'll leave the
specific serialization to the users, but we can add some tools to support
extracting metadata and such. Flume already has something similar (the
key-value properties in the key, with hosts, timestamps and such). We can
say something like: The first byte in the value will represent the data
type, the second will be schema-ID (if any), then key-value pairs with at
least host-name and cluster-id, etc. Kind of like how HTML has its own
headers, separate from the HTTP protocol.
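
To make the layout above concrete, here is a minimal sketch of one way such
a value could be packed and unpacked. Everything beyond "first byte is the
data type, second byte is the schema ID, then key-value pairs" is an
assumption made for illustration: the type codes, the use of 0 for "no
schema", the 2-byte pair count, the length-prefixed UTF-8 strings, and the
function names are all hypothetical and not part of any agreed proposal.

```python
import struct

# Hypothetical type codes; the thread does not define any.
TYPE_AVRO, TYPE_JSON, TYPE_TEXT = 1, 2, 3


def encode_value(data_type, schema_id, metadata, body):
    """Pack a value as: 1 byte data type, 1 byte schema ID (0 = none,
    an assumption), 2-byte pair count, length-prefixed UTF-8 key/value
    pairs, then the user's serialized body."""
    out = bytearray(struct.pack(">BBH", data_type, schema_id, len(metadata)))
    for key, value in metadata.items():
        for field in (key, value):
            raw = field.encode("utf-8")
            out += struct.pack(">H", len(raw)) + raw
    return bytes(out) + body


def decode_value(buf):
    """Reverse of encode_value: returns (data_type, schema_id, metadata, body)."""
    data_type, schema_id, count = struct.unpack_from(">BBH", buf, 0)
    offset = 4  # size of the fixed header above
    metadata = {}
    for _pair in range(count):
        fields = []
        for _field in range(2):  # key, then value
            (length,) = struct.unpack_from(">H", buf, offset)
            offset += 2
            fields.append(buf[offset:offset + length].decode("utf-8"))
            offset += length
        metadata[fields[0]] = fields[1]
    return data_type, schema_id, metadata, buf[offset:]


packed = encode_value(
    TYPE_JSON,
    0,
    {"host-name": "broker-client-01", "cluster-id": "cluster-a"},
    b'{"event": "login"}',
)
data_type, schema_id, metadata, body = decode_value(packed)
```

Because the header lives entirely inside the value, the broker never has to
interpret it; only producers and consumers that opt in to the convention
need to agree on it.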

I can draw up a more detailed proposal (not quite a KIP, since it doesn't
change Kafka proper) if there is community interest. I remember Chris
Riccomini expressing interest, and a few others (maybe LinkedIn?) asked for
specific metadata.

Thanks,

Gwen

On Thu, May 26, 2016 at 6:29 AM, David Kay <da...@outlook.com> wrote:

> All, I plan to submit a KIP to begin discussion on
> https://issues.apache.org/jira/browse/KAFKA-3744 and the associated Pull
> Request https://github.com/apache/kafka/pull/1419.  I'll notify this list
> when the KIP is submitted.
>
> Please discard my previous message containing bogus subject and links.