Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/05/24 01:13:12 UTC
[jira] [Commented] (KAFKA-3744) Message format needs to identify serializer

    [ https://issues.apache.org/jira/browse/KAFKA-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297473#comment-15297473 ] 

ASF GitHub Bot commented on KAFKA-3744:
---------------------------------------

GitHub user davek2 opened a pull request:

    https://github.com/apache/kafka/pull/1419

    Allocate 2 attribute bits to signal payload format

    This documentation update proposes a mechanism to signal the serialization used for the message payload, resolving issue https://issues.apache.org/jira/browse/KAFKA-3744.  No change is made to the message structure; two previously-reserved bits in the attribute byte now have defined values, and for one of four cases the key field is defined to be a JSON object.
    
    No change is required to messaging software.   No change is required to existing producer and consumer clients that use pre-agreed payload serialization. 
    
    Misc notes:
    1) Only one attribute bit would be needed if serialization were always signalled using the key field.  But it seems preferable to define two common serializations that do not have any dependency on the key field.  Selection of the common formats is arbitrary; text and avro seem reasonable but any two could be used instead.
    2) The compression attribute uses three bits but only two are defined.  If the intent is to use all three bits for compression the undefined values should be listed as reserved; if not, the timestamp attribute can slide down to bit 2 and serialization to bits 3~4, leaving bits 5~7 reserved.
    3) It's unclear why message field 6 should be called "key" - a variable-length field is more likely to be described as "attributes" or "metadata", and 1-byte field 3 would be called "flags" instead of "attributes".
    4) Field 8 is called "payload" under message format and "value" under on-disk format.  Changed to payload in both places.
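
To make the bit layout discussed in the notes above concrete, here is a minimal Java sketch of reading and writing a two-bit payload-format field in the attribute byte. It assumes the existing layout (compression in bits 0~2, timestamp type in bit 3) and places the format field in bits 4~5. The class name, the enum names, and the numeric value assigned to each format are hypothetical and are not taken from the pull request.

```java
// Sketch only: the bit position of the format field and the value assigned
// to each format are assumptions made for illustration.
final class PayloadFormatBits {
    static final int COMPRESSION_MASK = 0x07;      // bits 0~2: compression codec
    static final int TIMESTAMP_TYPE_MASK = 0x08;   // bit 3: timestamp type
    static final int FORMAT_SHIFT = 4;             // assumed: format lives in bits 4~5
    static final int FORMAT_MASK = 0x03 << FORMAT_SHIFT;

    // Hypothetical assignment of the four two-bit values.
    enum PayloadFormat { UNSPECIFIED, TEXT, AVRO, DESCRIBED_IN_KEY }

    // Extract the two format bits and map them to an enum constant.
    static PayloadFormat formatOf(byte attributes) {
        int code = (attributes & FORMAT_MASK) >>> FORMAT_SHIFT;
        return PayloadFormat.values()[code];
    }

    // Return a copy of the attribute byte with only the format bits replaced.
    static byte withFormat(byte attributes, PayloadFormat format) {
        int cleared = attributes & ~FORMAT_MASK;
        return (byte) (cleared | (format.ordinal() << FORMAT_SHIFT));
    }

    public static void main(String[] args) {
        byte attrs = 0x09; // compression codec 1, timestamp type bit set
        byte tagged = withFormat(attrs, PayloadFormat.AVRO);
        System.out.println(formatOf(tagged));           // prints AVRO
        System.out.println(tagged & COMPRESSION_MASK);  // prints 1
    }
}
```

Because only the masked bits are touched, a client that ignores bits 4~5 would still read the compression and timestamp bits unchanged, which is the sense in which existing clients need no change.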

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davek2/kafka trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1419
    
----
commit 1d88b8d48cdfe67989bebf239f7588ca24e961b6
Author: Joe <da...@hotmail.com>
Date:   2016-05-24T00:32:04Z

    Allocate 2 attribute bits for payload format

----


> Message format needs to identify serializer
> -------------------------------------------
>
>                 Key: KAFKA-3744
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3744
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: David Kay
>            Priority: Minor
>
> https://issues.apache.org/jira/browse/KAFKA-3698 was recently resolved with https://github.com/apache/kafka/commit/27a19b964af35390d78e1b3b50bc03d23327f4d0.
> But the Kafka documentation on message formats needs to be more explicit for new users. Section 1.3, Step 4 says "Send some messages" and takes lines of text from the command line. Slide 104 of a beginner's guide (http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign) says:
> {noformat}
>    Kafka does not care about data format of msg payload
>    Up to developer to handle serialization/deserialization
>       Common choices: Avro, JSON
> {noformat}
> If one producer sends lines of console text, another producer sends Avro, a third producer sends JSON, and a fourth sends CBOR, how does the consumer identify which deserializer to use for the payload?  The commit includes an opaque K byte Key that could potentially include a codec identifier, but provides no guidance on how to use it:
> {quote}
> "Leaving the key and value opaque is the right decision: there is a great deal of progress being made on serialization libraries right now, and any particular choice is unlikely to be right for all uses. Needless to say a particular application using Kafka would likely mandate a particular serialization type as part of its usage."
> {quote}
> Mandating any particular serialization is as unrealistic as mandating a single mime-type for all web content.  There must be a way to signal the serialization used to produce this message's V byte payload, and documenting the existence of even a rudimentary codec registry with a few values (text, Avro, JSON, CBOR) would establish the pattern to be used for future serialization libraries.
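
As an illustration of the rudimentary codec registry suggested in the issue, the following Java sketch maps a codec identifier to a deserializer and dispatches on it. The class name, the method names, and the identifier values are hypothetical. Only a plain-text decoder is registered here; Avro, JSON, or CBOR decoders would be supplied by their respective libraries.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: codec identifiers and this registry API are assumptions,
// not part of Kafka.
final class CodecRegistry {
    static final int TEXT = 1; // hypothetical identifier for UTF-8 text

    private final Map<Integer, Function<byte[], Object>> decoders = new HashMap<>();

    // Associate a codec identifier with the function that decodes its payloads.
    void register(int codecId, Function<byte[], Object> decoder) {
        decoders.put(codecId, decoder);
    }

    // Look up the decoder for the signalled codec and apply it to the payload.
    Object decode(int codecId, byte[] payload) {
        Function<byte[], Object> decoder = decoders.get(codecId);
        if (decoder == null) {
            throw new IllegalArgumentException("No decoder registered for codec id " + codecId);
        }
        return decoder.apply(payload);
    }

    public static void main(String[] args) {
        CodecRegistry registry = new CodecRegistry();
        registry.register(TEXT, bytes -> new String(bytes, StandardCharsets.UTF_8));

        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.decode(TEXT, payload)); // prints hello
    }
}
```

A consumer using such a registry would read the codec identifier from wherever the message signals it and fail loudly on an unknown value rather than guess at the payload format.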


