You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/04/30 19:43:30 UTC

[GitHub] [flink] AHeise commented on pull request #11725: [FLINK-15670][API] Provide a Kafka Source/Sink pair as KafkaShuffle

AHeise commented on pull request #11725:
URL: https://github.com/apache/flink/pull/11725#issuecomment-622066082


   So another round concerning API changes with @pnowojski . He shares the concerns on extending `SinkFunction` and `TwoPhaseCommitFunction` for something that could be deemed an implementation detail.
   
   So far we have discussed going either the `SinkFunction` or the custom operator route. Piotr actually proposed to do both at once:
   * Instead of some generic `StreamShuffleSink`, have a very specific `KafkaShuffleSink` that extends `StreamSink` and pretty much just overrides the watermark processing.
   * Also implement the `FlinkKafkaShuffleProducer` in the way that you have. But instead of overriding the newly added watermark functions in `SinkFunction` and `TwoPhaseCommitFunction`, just provide these methods explicitly.
   * The `KafkaShuffleSink` will then call this function directly in `processWatermark`.
   
   This approach has several advantages:
   * No changes to Public API (`SinkFunction` and `TwoPhaseCommitFunction`).
   * Full control over watermark handling without forking out some general concept.
   * No code replication in operator and `SinkFunction`.
   * Little changes to your current approach (most changes concern glue code).
   
   The only downside is that functionality is a bit split on two entities, but that's also true for the current approach. With this approach, you could even have both entities in the same class (e.g., make `FlinkKafkaShuffleProducer` an inner class of `KafkaShuffleSink`).
   
   Another advantage is that all implementation work can be done in `flink-connector-kafka`, thus solving the cyclic dependency issue that you raised.
   Further, you can decrease the visibility to all `KafkaProducer` and `KafkaConsumer` changes from `protected` to `package-private`, making it more obvious that we don't see these things as public API.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org