Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/18 20:10:54 UTC

[GitHub] [hudi] alexeykudinkin commented on pull request #5129: [HUDI-3709] Fixing `ParquetWriter` impls not respecting Parquet Max File Size limit

alexeykudinkin commented on PR #5129:
URL: https://github.com/apache/hudi/pull/5129#issuecomment-1130488173

   @nsivabalan we should not be interfering with the caching at the Parquet Writer level (by manually flushing); checking the ParquetWriter for the currently accumulated buffer size is the right way to interface with it (as compared to intercepting the FileSystem writes and accounting for how many bytes were written).
   
   The issue inadvertently introduced with this approach (addressed in #5497) was that the cost of `getDataSize` was not factored in: it was assumed to be O(1), while in reality it is O(N) in the number of written blocks.
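   
   For illustration, here is a minimal sketch (not the actual Hudi implementation from #5129/#5497) of the pattern described above: enforce the max file size by polling `ParquetWriter#getDataSize()` rather than intercepting FileSystem writes, while amortizing those polls because `getDataSize` is not constant-time. The names `SizeBoundedParquetWriter`, `maxFileSizeBytes` and `recordCheckInterval` are illustrative assumptions, not identifiers from the Hudi codebase.
   
   ```java
   import java.io.IOException;
   import org.apache.parquet.hadoop.ParquetWriter;
   
   /**
    * Sketch of a size-bounded wrapper around ParquetWriter. It relies on
    * getDataSize() to learn how many bytes the writer has accumulated, but
    * amortizes that call: since getDataSize() scans the blocks written so far
    * (roughly O(N) in their number), it is only invoked every
    * recordCheckInterval records instead of on every write.
    */
   public class SizeBoundedParquetWriter<T> {
   
     private final ParquetWriter<T> writer;
     private final long maxFileSizeBytes;
     private final long recordCheckInterval;
   
     private long writtenRecords = 0;
     private long lastKnownDataSize = 0;
   
     public SizeBoundedParquetWriter(ParquetWriter<T> writer,
                                     long maxFileSizeBytes,
                                     long recordCheckInterval) {
       this.writer = writer;
       this.maxFileSizeBytes = maxFileSizeBytes;
       this.recordCheckInterval = recordCheckInterval;
     }
   
     /** True if the file is believed to still be under the size limit. */
     public boolean canWrite() {
       return lastKnownDataSize < maxFileSizeBytes;
     }
   
     public void write(T record) throws IOException {
       writer.write(record);
       writtenRecords++;
       // Poll getDataSize() only periodically: it is not O(1), so calling it
       // on every record would make the per-record cost grow with file size.
       if (writtenRecords % recordCheckInterval == 0) {
         lastKnownDataSize = writer.getDataSize();
       }
     }
   
     public void close() throws IOException {
       writer.close();
     }
   }
   ```
   
   The caller would check `canWrite()` before each `write(...)` and roll over to a new file once it returns false; the reported size then lags the true size by at most one check interval, which trades a small amount of overshoot for not paying the `getDataSize` cost per record.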


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org