You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/10/01 14:02:06 UTC

[GitHub] [flink] StephanEwen commented on pull request #13501: Single task result partition type

StephanEwen commented on pull request #13501:
URL: https://github.com/apache/flink/pull/13501#issuecomment-702157253


   I took a look only at the partial record handling. I like the idea of not transporting the partial record offsets at all across the networks, which means we don't have to make the deserializers aware of that.
   
   Regarding the implementation, I have the same comment as @pnowojski . Given that the partial lengths fields are not communicated across the network, I feel that this logic does not belong into buffers or `BufferBuilder`/`BufferConsumer`. Those should handle purely bytes without any knowledge of records.
   
   I would suggest to change the `buffers` queue from holding `BufferConsumer` to holding `BufferConsumerWithOffsets` or so (maybe come up with a shorter name). That way we transport the same information, but not inject it into the buffers.
   
   Some more things to watch out for in further refactoring:
   
     - The *per record methods* of *ResultPartition* and its subclasses are highly performance sensitive. They should not become virtual methods. We currently can guarantee that because there is only one class implementing these per-record methods (BufferWritingResultPartition).
       When we start having different implementations, we ideally have only one of them loaded in the JVM at any point, so that the de-virtualization works properly. That means design the classes/subclasses such that a type of job (batch, streaming, streaming with partial failover) would only ever rely on one of these partition types in a job.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org