Posted to issues@iceberg.apache.org by "stevenzwu (via GitHub)" <gi...@apache.org> on 2023/04/24 03:19:00 UTC

[GitHub] [iceberg] stevenzwu commented on issue #7393: The serialization problem caused by Flink shuffling design

stevenzwu commented on issue #7393:
URL: https://github.com/apache/iceberg/issues/7393#issuecomment-1519324779

   @huyuanfeng2018 Thanks for the experiment. `DataStatisticsOrRecord` is the only way to pass statistics to the custom partitioner. I agree with you that Kryo serialization will be slower; we will need to provide a custom type serializer for this type, which we had in our internal PoC implementation and testing.
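
   To make the idea concrete, here is a minimal sketch of what such a union-style `TypeSerializer` could look like: one tag boolean selects the branch, and serialization is delegated to the concrete statistics/record serializers instead of falling back to Kryo's generic reflection path. All class and field names below are illustrative stand-ins, not the actual PoC code, and the state-compatibility snapshot is omitted.

   ```java
   // Minimal illustrative sketch -- NOT the actual Iceberg PoC code. Assumes a
   // simplified union type where exactly one of the two fields is non-null.
   import java.io.IOException;
   import org.apache.flink.api.common.typeutils.TypeSerializer;
   import org.apache.flink.api.common.typeutils.TypeSerializerSnapshot;
   import org.apache.flink.core.memory.DataInputView;
   import org.apache.flink.core.memory.DataOutputView;

   // Hypothetical stand-in for DataStatisticsOrRecord: statistics XOR record.
   class StatsOrRecord<S, R> {
     final S statistics;
     final R record;

     StatsOrRecord(S statistics, R record) {
       this.statistics = statistics;
       this.record = record;
     }
   }

   class StatsOrRecordSerializer<S, R> extends TypeSerializer<StatsOrRecord<S, R>> {
     private final TypeSerializer<S> statsSerializer;
     private final TypeSerializer<R> recordSerializer;

     StatsOrRecordSerializer(TypeSerializer<S> stats, TypeSerializer<R> record) {
       this.statsSerializer = stats;
       this.recordSerializer = record;
     }

     @Override
     public void serialize(StatsOrRecord<S, R> value, DataOutputView target) throws IOException {
       // One tag bit selects the union branch; the branch payload follows.
       boolean isStats = value.statistics != null;
       target.writeBoolean(isStats);
       if (isStats) {
         statsSerializer.serialize(value.statistics, target);
       } else {
         recordSerializer.serialize(value.record, target);
       }
     }

     @Override
     public StatsOrRecord<S, R> deserialize(DataInputView source) throws IOException {
       return source.readBoolean()
           ? new StatsOrRecord<>(statsSerializer.deserialize(source), null)
           : new StatsOrRecord<>(null, recordSerializer.deserialize(source));
     }

     @Override
     public StatsOrRecord<S, R> deserialize(StatsOrRecord<S, R> reuse, DataInputView source)
         throws IOException {
       return deserialize(source);
     }

     @Override
     public void copy(DataInputView source, DataOutputView target) throws IOException {
       serialize(deserialize(source), target);
     }

     @Override
     public StatsOrRecord<S, R> copy(StatsOrRecord<S, R> from) {
       return from.statistics != null
           ? new StatsOrRecord<>(statsSerializer.copy(from.statistics), null)
           : new StatsOrRecord<>(null, recordSerializer.copy(from.record));
     }

     @Override
     public StatsOrRecord<S, R> copy(StatsOrRecord<S, R> from, StatsOrRecord<S, R> reuse) {
       return copy(from);
     }

     @Override
     public StatsOrRecord<S, R> createInstance() {
       return new StatsOrRecord<>(null, recordSerializer.createInstance());
     }

     @Override
     public TypeSerializer<StatsOrRecord<S, R>> duplicate() {
       return new StatsOrRecordSerializer<>(statsSerializer.duplicate(), recordSerializer.duplicate());
     }

     @Override public boolean isImmutableType() { return false; }
     @Override public int getLength() { return -1; } // variable-length elements
     @Override public boolean equals(Object obj) { return obj instanceof StatsOrRecordSerializer; }
     @Override public int hashCode() { return getClass().hashCode(); }

     @Override
     public TypeSerializerSnapshot<StatsOrRecord<S, R>> snapshotConfiguration() {
       // A real serializer must return a snapshot for state-compatibility checks;
       // omitted to keep the sketch short.
       throw new UnsupportedOperationException("snapshot omitted in sketch");
     }
   }
   ```

   With this shape, the per-element cost on the shuffle path is a single tag byte plus the delegate serializers, which is what avoids Kryo's generic reflection-based fallback.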
   
   > resulting in a performance drop of more than 4 times
   
   Can you elaborate on the 4x slowdown you observed? What was the A/B test setup?
   
   In the benchmark with our internal PoC implementation, we observed 60% more CPU overhead for a simple job reading from Kafka and writing to an Iceberg table partitioned by event time. As expected, the bulk of the overhead comes from serialization/deserialization and network I/O.
   
   cc @yegangy0718 
   

