Posted to jira@kafka.apache.org by "Kirill Rodionov (Jira)" <ji...@apache.org> on 2021/06/25 09:45:00 UTC

[jira] [Commented] (KAFKA-5761) Serializer API should support ByteBuffer

    [ https://issues.apache.org/jira/browse/KAFKA-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369361#comment-17369361 ] 

Kirill Rodionov commented on KAFKA-5761:
----------------------------------------

Can you look at the PR please, [~chia7712]?

> Serializer API should support ByteBuffer
> ----------------------------------------
>
>                 Key: KAFKA-5761
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5761
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 0.11.0.0
>            Reporter: Bhaskar Gollapudi
>            Assignee: Chia-Ping Tsai
>            Priority: Major
>              Labels: features, performance
>
> Consider the Serializer: its main method is:
> byte[] serialize(String topic, T data);
> Producer applications create an implementation that takes in an instance
> (of T) and converts it to a byte[]. This byte array is allocated anew for
> each message and then handed over to the Kafka Producer API internals,
> which write the bytes to a buffer or network socket. When the next message
> arrives, the serializer should try to reuse the existing byte[] for the
> new message instead of creating a new one. This requires two things:
> 1. The process of handing off the bytes to the buffer/socket and reusing
> the byte[] must happen on the same thread.
> 2. There should be a way of marking the end of the available bytes in the
> byte[].
> The first is reasonably simple to understand. If this does not happen, and
> without other necessary synchronization, the byte[] gets corrupted, and so
> does the message written to the buffer/socket. However, this requirement
> is easy to meet for a producer application, because it controls the
> threads on which the serializer is invoked.
> The second is where the problem lies with the current API. It does not
> allow a variable number of bytes to be read from a container; it is
> limited by the byte[]'s length. This forces the producer to
> 1. either create a new byte[] for each message that is bigger than the
> previous one,
> OR
> 2. decide on a maximum size and use padding.
> Both are cumbersome and error-prone, and padding may waste network
> bandwidth.
> Instead, if there were a Serializer with this method:
> ByteBuffer serialize(String topic, T data);
> clients could implement a reusable bytes container and avoid allocating
> for each message.
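For illustration, here is a minimal sketch of what an implementation of the proposed signature might look like for T = String. The class name, initial capacity, and growth policy are hypothetical, not part of the Kafka API; the point is that clear()/flip() let one internal buffer be reused across messages, with the limit marking the end of the valid bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical serializer with the proposed signature
// ByteBuffer serialize(String topic, T data), shown for T = String.
// It reuses one internal buffer across calls, so it is only safe when
// serialization and the hand-off to the buffer/socket happen on the
// same thread (requirement 1); flip() sets the limit to mark the end
// of the valid bytes (requirement 2).
public class ReusableStringSerializer {
    private ByteBuffer buffer = ByteBuffer.allocate(64);

    public ByteBuffer serialize(String topic, String data) {
        byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
        if (buffer.capacity() < bytes.length) {
            // Grow only when a message exceeds the current capacity;
            // otherwise there is no per-message allocation.
            buffer = ByteBuffer.allocate(bytes.length * 2);
        }
        buffer.clear();  // reset position and limit for reuse
        buffer.put(bytes);
        buffer.flip();   // position = 0, limit = end of valid bytes
        return buffer;
    }
}
```

A caller would read buffer.remaining() bytes starting at the buffer's position, and must finish doing so before the next serialize() call overwrites them.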



--
This message was sent by Atlassian Jira
(v8.3.4#803005)