Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/06 17:45:38 UTC

[GitHub] [hudi] jtmzheng commented on issue #2408: [SUPPORT] OutOfMemory on upserting into MOR dataset

jtmzheng commented on issue #2408:
URL: https://github.com/apache/hudi/issues/2408#issuecomment-774513919


   @nsivabalan I have not encountered the issue again after temporarily lowering `hoodie.commits.archival.batch`, which cleared out the large commit files being loaded for archival. I believe @umehrot2 identified the right root cause/bug in https://github.com/apache/hudi/issues/2408#issuecomment-758320870 (first one). I think these large commits were generated after I added the option `hoodie.cleaner.commits.retained: 1`, but I'm not sure (it lined up timeline-wise, and that change caused the dataset size to shrink drastically).
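
   To make the workaround concrete, here is a minimal sketch of Hudi write options reflecting the configs discussed above. The table name and values are hypothetical placeholders, not recommendations; the config keys (`hoodie.commits.archival.batch`, `hoodie.cleaner.commits.retained`) are the ones referenced in this thread.

   ```python
   # Hypothetical Hudi upsert options illustrating the workaround above:
   # lowering hoodie.commits.archival.batch so archival loads fewer commit
   # files into memory at once. All values are illustrative.
   hudi_options = {
       "hoodie.table.name": "my_table",                    # hypothetical name
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.table.type": "MERGE_ON_READ",
       # Archive fewer commits per batch to reduce memory pressure.
       "hoodie.commits.archival.batch": "5",
       # Retaining very few commits (e.g. 1) shrank the dataset drastically,
       # but may have contributed to the oversized commit files.
       "hoodie.cleaner.commits.retained": "10",
   }

   # With Spark this would typically be passed as:
   # df.write.format("hudi").options(**hudi_options).mode("append").save(path)
   ```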
   
   Some context:
   - the dataset was always indexed with 0.6.0 (no upgrade)
   - we are trying to productionize a dataset in a Hudi data lake, but it is not there yet
   - this is also our first time working with Hudi
   
   I think this issue can be closed as a support request, though it would be great to understand the different archival configs better (I couldn't find good documentation on these).
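   
   For anyone landing here later, the archival-related configs in question are sketched below as a properties fragment. Names are taken from the Hudi configuration docs; the values shown are my understanding of the defaults around 0.6.x and should be verified against the docs for your version.

   ```properties
   # Commits older than hoodie.keep.max.commits are archived down to
   # hoodie.keep.min.commits on the active timeline.
   hoodie.keep.min.commits=20
   hoodie.keep.max.commits=30
   # Number of commits archived per batch; lowering this reduces how many
   # commit files are held in memory during archival (the workaround above).
   hoodie.commits.archival.batch=10
   # How many commits the cleaner retains for older file-slice versions.
   hoodie.cleaner.commits.retained=10
   ```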
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org