Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/20 04:52:01 UTC
[GitHub] [hudi] xushiyan commented on issue #2888: [SUPPORT] Hudi DeltaStreamer job in continuous mode crashes at some point after consuming all available memory
xushiyan commented on issue #2888:
URL: https://github.com/apache/hudi/issues/2888#issuecomment-947326866
@PavelPetukhov
> Note 2: it works fine without --continuous parameter
Note 3: it stores data as expected with --continuous but fails at some point
You may also want to set `--source-limit` to 1-2 GB. Before that, check each commit produced under `--continuous` and see whether the ingested data size is increasing over time. You can examine the commit files under `.hoodie/`. Also check the `checkpoint` value in each commit file to see whether it advances over time.
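A rough way to do that inspection from the command line: commit files under `.hoodie/` are JSON, and DeltaStreamer records its checkpoint in the commit's extra metadata (the exact key name can vary by Hudi version, so verify against your own files). The sketch below uses a synthetic commit file and a hypothetical checkpoint value purely to illustrate the pattern.

```shell
# Sketch: check whether DeltaStreamer's checkpoint advances across commits.
# Uses a synthetic .hoodie/ directory; point BASE at your real table instead.
BASE=$(mktemp -d)
mkdir -p "$BASE/.hoodie"

# Synthetic commit file standing in for a real one (contents are an assumption;
# real commit files also carry write stats such as totalBytesWritten).
cat > "$BASE/.hoodie/20211020045201.commit" <<'EOF'
{"extraMetadata": {"deltastreamer.checkpoint.key": "my_topic,0:42"}}
EOF

# List commits in timeline order (file names encode the commit instant).
ls "$BASE/.hoodie" | sort

# Pull the checkpoint out of each commit file; it should move forward
# from one commit to the next.
for f in "$BASE"/.hoodie/*.commit; do
  printf '%s -> ' "$(basename "$f")"
  grep -o '"deltastreamer.checkpoint.key"[^,}]*' "$f"
done
```

If the checkpoint value is identical across consecutive commits, DeltaStreamer is not making progress on the source, which is worth investigating before tuning memory.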
Some other notes: bulk_insert is not equivalent to upsert; the former does not update existing records. You'd need to choose the right operation based on your business needs. As for parallelism, 50 would still not be enough if you have, say, 100 executors; adjust it based on your Spark job size and the number of output partitions.
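Putting the suggestions together, a DeltaStreamer invocation might look like the sketch below. The jar path, table path, and Kafka topic are placeholders, and the parallelism values are illustrative only; `--source-limit`, `--op`, `--continuous`, and the `hoodie.*.shuffle.parallelism` configs are the knobs discussed above.

```shell
# Sketch of a continuous DeltaStreamer run with the tuning knobs discussed.
# Paths, topic, and numeric values are assumptions -- substitute your own.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /path/to/hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --op UPSERT \
  --continuous \
  --source-limit 2000000000 \
  --target-base-path hdfs:///data/my_hudi_table \
  --target-table my_hudi_table \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --hoodie-conf hoodie.deltastreamer.source.kafka.topic=my_topic \
  --hoodie-conf hoodie.upsert.shuffle.parallelism=200 \
  --hoodie-conf hoodie.insert.shuffle.parallelism=200
```

Capping `--source-limit` bounds how much data each round ingests, so memory use per commit stays predictable even when the source has a large backlog.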
Also, please try upgrading to 0.9.0.
The above are the suggestions I can gather from this thread. Closing this due to long inactivity. Please follow up here if there are any updates. Thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org