Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/20 04:52:01 UTC
[GitHub] [hudi] xushiyan commented on issue #2888: [SUPPORT] Hudi DeltaStreamer job in continuous mode crashes at some point after consuming all available memory
xushiyan commented on issue #2888:
URL: https://github.com/apache/hudi/issues/2888#issuecomment-947326866
@PavelPetukhov
> Note 2: it works fine without --continuous parameter
Note 3: it stores data as expected with --continuous but fails at some point
You may also want to set `--source-limit` to 1-2 GB. Before that, check each commit produced under `--continuous` and see whether the ingested data size is increasing over time. You can examine the commit files under `.hoodie/`. Also check the `checkpoint` value in each commit file to see whether it advances over time.
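A rough way to do that inspection from the command line: commit files under `.hoodie/` are JSON, and DeltaStreamer records its checkpoint in the commit's extra metadata (the exact key name can vary by Hudi version, so verify against your own files). The sketch below uses a synthetic commit file and a hypothetical checkpoint value purely to illustrate the pattern.

```shell
# Sketch: check whether DeltaStreamer's checkpoint advances across commits.
# Uses a synthetic .hoodie/ directory; point BASE at your real table instead.
BASE=$(mktemp -d)
mkdir -p "$BASE/.hoodie"

# Synthetic commit file standing in for a real one (contents are an assumption;
# real commit files also carry write stats such as totalBytesWritten).
cat > "$BASE/.hoodie/20211020045201.commit" <<'EOF'
{"extraMetadata": {"deltastreamer.checkpoint.key": "my_topic,0:42"}}
EOF

# List commits in timeline order (file names encode the commit instant).
ls "$BASE/.hoodie" | sort

# Pull the checkpoint out of each commit file; it should move forward
# from one commit to the next.
for f in "$BASE"/.hoodie/*.commit; do
  printf '%s -> ' "$(basename "$f")"
  grep -o '"deltastreamer.checkpoint.key"[^,}]*' "$f"
done
```

If the checkpoint value is identical across consecutive commits, DeltaStreamer is not making progress on the source, which is worth investigating before tuning memory.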
Some other notes: bulk_insert is not equivalent to upsert; the former does not update existing records. You'd need to choose the right operation based on your business needs. As for parallelism, 50 would still not be enough if you have, say, 100 executors; adjust it based on your Spark job size and the number of output partitions.
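Putting the suggestions together, a DeltaStreamer invocation might look like the sketch below. The jar path, table path, and Kafka topic are placeholders, and the parallelism values are illustrative only; `--source-limit`, `--op`, `--continuous`, and the `hoodie.*.shuffle.parallelism` configs are the knobs discussed above.

```shell
# Sketch of a continuous DeltaStreamer run with the tuning knobs discussed.
# Paths, topic, and numeric values are assumptions -- substitute your own.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /path/to/hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --continuous \
  --source-limit 2000000000 \
  --target-base-path hdfs:///data/my_hudi_table \
  --target-table my_hudi_table \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --hoodie-conf hoodie.deltastreamer.source.kafka.topic=my_topic \
  --hoodie-conf hoodie.upsert.shuffle.parallelism=200 \
  --hoodie-conf hoodie.insert.shuffle.parallelism=200
```

Capping `--source-limit` bounds how much data each round ingests, so memory use per commit stays predictable even when the source has a large backlog.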
Also, please try upgrading to 0.9.0.
The above are the suggestions I can gather from this thread. Closing this due to long inactivity. Please follow up here if there are any updates. Thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org