Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/25 17:49:13 UTC

[GitHub] [hudi] alexeykudinkin commented on pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

alexeykudinkin commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1194409817

   @danny0405 a few considerations we need to keep in mind here:
   
   1. RFC-46 is a stepping stone for transitioning from our current "modus operandi" with an intermediate representation (Avro) to a state where we'd be completely hybrid, relying on engine-specific containers (Dataset/RDD for Spark, for ex) as well as data representation formats (`InternalRow` for Spark, for ex). This change is a critical first step in that direction of decoupling Hudi from Avro.
   2. Given how dynamic our code-base is, we can't park this change for long. Even now, after 2 months of dev, it's going to be a humongous effort to rebase it again onto the latest changes given how much has landed in these 2 months.
   
   While I understand that we all expect radical improvements, we need to keep in mind that these will come when we reach the final state.
   
   P.S. Also, BTW, we won't see 5x improvements; it's gonna be more like up to 2x in the best case, simply b/c Hudi is pretty tight in terms of performance across the board.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org