Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/24 10:21:08 UTC

[GitHub] [hudi] danny0405 commented on issue #5107: [SUPPORT] High performance costs of AvroSerializer in Datasource writing

danny0405 commented on issue #5107:
URL: https://github.com/apache/hudi/issues/5107#issuecomment-1077466306


   > @boneanxs True, full support for `Dataset` is the long-term solution. In my experiment, optimizing how `AvroSerializer` is used could save 80% of the cost of reading the source data. But the optimization requires modifying the `AvroSerializer` source code on the Spark side.
   > 
   > @qjqqyy Yes, an `AvroSerializer` is initialized for every row (it is the variable named `converter` in the lambda).
   
   Can we copy `AvroSerializer` over to the Hudi side for now and just use that?
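
To illustrate the per-row cost discussed above, here is a minimal sketch that does not use Spark or Hudi code. `CostlySerializer` is a hypothetical stand-in for a serializer that is expensive to construct, and `perRow` / `perPartition` are made-up names. It only compares building the converter inside the per-row loop with building it once and reusing it. It is not the actual `AvroSerializer` API or the change proposed in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SerializerReuseSketch {

    // Hypothetical stand-in for a serializer that is costly to construct.
    // The counter records how many instances have been built.
    static final class CostlySerializer {
        static int constructed = 0;

        CostlySerializer() {
            constructed++;
        }

        String serialize(String row) {
            return "avro(" + row + ")";
        }
    }

    // Pattern described in the thread: a new converter is built for every row.
    static List<String> perRow(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            CostlySerializer converter = new CostlySerializer();
            out.add(converter.serialize(row));
        }
        return out;
    }

    // Alternative: build the converter once and reuse it for all rows.
    static List<String> perPartition(List<String> rows) {
        CostlySerializer converter = new CostlySerializer();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(converter.serialize(row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");

        CostlySerializer.constructed = 0;
        perRow(rows);
        System.out.println("per-row constructions: " + CostlySerializer.constructed);

        CostlySerializer.constructed = 0;
        perPartition(rows);
        System.out.println("per-partition constructions: " + CostlySerializer.constructed);
    }
}
```

Both methods return the same output; only the number of constructions differs (one per row versus one per batch). Whether the real serializer can be safely shared this way depends on its thread-safety and serializability, which this sketch does not model.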


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.