Posted to issues@flink.apache.org by "Robert Metzger (JIRA)" <ji...@apache.org> on 2016/06/14 09:41:01 UTC

[jira] [Commented] (FLINK-4050) FlinkKafkaProducer API Refactor

    [ https://issues.apache.org/jira/browse/FLINK-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329195#comment-15329195 ] 

Robert Metzger commented on FLINK-4050:
---------------------------------------

Hi Elias,
thank you for opening this JIRA. I agree with you that the serialization of Kafka messages in Flink is not very Kafkaesque. I think the current design has historical reasons: we wanted to share the serializers between the different connectors. However, it turned out that there is not much common ground between the connectors, so this option isn't really used in practice.
I also recall that there were some issues with the system and user classloaders when passing the classes from the Flink client to the cluster (Kafka was not respecting Flink's context classloader).

Changing the Kafka connector's serialization APIs would be a breaking change for existing users. I wonder how critical this issue is for you. Maybe we can make some smaller changes that let users rely on Kafka's serializers, for example a no-op Flink SerializationSchema that just passes the objects through.
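Such a pass-through schema could look roughly like this. This is only a sketch: to keep it self-contained, the SerializationSchema interface is redeclared here in simplified form rather than imported from Flink.

```java
import java.nio.charset.StandardCharsets;

/** Simplified stand-in for Flink's SerializationSchema<T> (illustration only). */
interface SerializationSchema<T> {
    byte[] serialize(T element);
}

/** No-op schema: hands already-serialized byte[] records straight through,
 *  leaving actual serialization to the Kafka producer's own serializers. */
class PassThroughSerializationSchema implements SerializationSchema<byte[]> {
    @Override
    public byte[] serialize(byte[] element) {
        return element; // no copy, no transformation
    }
}

public class PassThroughDemo {
    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] out = new PassThroughSerializationSchema().serialize(payload);
        System.out.println(out == payload); // prints true: same array reference
    }
}
```

The user would then serialize upstream (or let Kafka's configured serializers do it) and feed the sink plain byte arrays.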

> FlinkKafkaProducer API Refactor
> -------------------------------
>
>                 Key: FLINK-4050
>                 URL: https://issues.apache.org/jira/browse/FLINK-4050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>    Affects Versions: 1.0.3
>            Reporter: Elias Levy
>
> The FlinkKafkaProducer API seems more difficult to use than it should be.  
> The API requires you pass it a SerializationSchema or a KeyedSerializationSchema, but the Kafka producer already has a serialization API.  Requiring a serializer in the Flink API precludes the use of the Kafka serializers.  For instance, they preclude the use of the Confluent KafkaAvroSerializer class that makes use of the Confluent Schema Registry.  Ideally, the serializer would be optional, so as to allow the Kafka producer serializers to handle the task.
> In addition, the KeyedSerializationSchema conflates message key extraction with key serialization.  Even if the serializer were made optional, so that the Kafka producer serializers could take over, you would still need to extract a key from the message.
> And given that the key may not be part of the message you want to write to Kafka, an upstream step may have to package the key with the message, for instance in a tuple, to make both available to the sink. That means you also need a method to extract the message to write to Kafka from the element Flink passes into the sink.
> In summary, there should be separation of extraction of the key and message from the element passed into the sink from serialization, and the serialization step should be optional.
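The separation the issue argues for might be sketched as follows. The interface and class names here are hypothetical, chosen for illustration; they are not part of the actual Flink API.

```java
/** Hypothetical interface: key/value extraction only, no serialization.
 *  Names are illustrative, not Flink API. */
interface KafkaRecordExtractor<T, K, V> {
    K extractKey(T element);   // which part of the element becomes the Kafka key
    V extractValue(T element); // which part becomes the Kafka message
}

/** Example element type: an upstream step packaged key and message together. */
class KeyedMessage {
    final String key;
    final String value;
    KeyedMessage(String key, String value) { this.key = key; this.value = value; }
}

/** Pure extraction. Serializing the extracted key and value would be left to
 *  the Kafka producer's configured serializers (e.g. Confluent's
 *  KafkaAvroSerializer backed by the Schema Registry). */
class TupleExtractor implements KafkaRecordExtractor<KeyedMessage, String, String> {
    @Override public String extractKey(KeyedMessage e) { return e.key; }
    @Override public String extractValue(KeyedMessage e) { return e.value; }
}

public class ExtractorDemo {
    public static void main(String[] args) {
        KeyedMessage m = new KeyedMessage("user-42", "clicked");
        TupleExtractor ex = new TupleExtractor();
        System.out.println(ex.extractKey(m) + " -> " + ex.extractValue(m));
        // prints: user-42 -> clicked
    }
}
```

With this split, extraction stays a Flink-side concern while serialization becomes optional and can be delegated entirely to Kafka.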



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)