Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/02 00:49:26 UTC

[GitHub] [hudi] nsivabalan commented on issue #2888: [SUPPORT] Hudi spark submit job crashes at some point after eating all memory available

nsivabalan commented on issue #2888:
URL: https://github.com/apache/hudi/issues/2888#issuecomment-830716667


   @PavelPetukhov: I see that you have set the parallelism to only 2. Can you try increasing it to 50, for example:
   
   --hoodie-conf hoodie.upsert.shuffle.parallelism=50
   --hoodie-conf hoodie.insert.shuffle.parallelism=50
   --hoodie-conf hoodie.delete.shuffle.parallelism=50
   --hoodie-conf hoodie.bulkinsert.shuffle.parallelism=50
   
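   For reference, here is a rough sketch of where these flags sit on a HoodieDeltaStreamer spark-submit invocation; the bundle jar path, base path, and table name below are placeholders, and your existing source/schema arguments would stay as they are:
   
   # sketch only: jar path, base path, and table name are placeholders
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     /path/to/hudi-utilities-bundle.jar \
     --table-type COPY_ON_WRITE \
     --target-base-path /path/to/hudi/table \
     --target-table my_table \
     --hoodie-conf hoodie.upsert.shuffle.parallelism=50 \
     --hoodie-conf hoodie.insert.shuffle.parallelism=50 \
     --hoodie-conf hoodie.delete.shuffle.parallelism=50 \
     --hoodie-conf hoodie.bulkinsert.shuffle.parallelism=50
   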
   Also, I see you are setting the operation to "BULK_INSERT". To clarify what this operation is for: it is intended only for the initial load of data into Hudi. For subsequent writes, you are expected to use "insert" or "upsert".
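   
   If the job runs through HoodieDeltaStreamer, the operation is controlled by the --op argument, so switching from bulk insert to upsert would look something like this (sketch only; other arguments unchanged):
   
   # use UPSERT (or INSERT) for ongoing ingestion instead of BULK_INSERT
   --op UPSERT
   
   If you are writing through the Spark datasource instead, the equivalent config is hoodie.datasource.write.operation=upsert.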
   
   Also, can you try the latest release, 0.8.0? It includes a fix relating to releasing memory for RDDs after each batch of ingestion, which might help resolve the issue.
   
   
   

