Posted to user@beam.apache.org by Jon Erdman <jo...@wyndenstark.com> on 2022/04/04 23:10:23 UTC

[QUESTION] Incremental MongoDB Write

Hi Beamers (is that a thing?),

I am relatively new to Beam and am attempting to use the Python WriteToMongoDB transform, but I ran into some undesirable behavior: the implementation seems to wait until the entire PCollection has been received before starting the actual Mongo writes. My use case involves millions of new document writes as part of a larger pipeline, so this causes a massive backup at best, and at worst memory pressure that fails the whole job.

I would like to switch to batched incremental writes, but as far as I can tell this is not possible with the current WriteToMongoDB implementation. The blocker appears to be the Reshuffle step it applies internally, which requires the entire set of elements before the downstream write executes. I tried various window and trigger configurations, but the write seems to ignore them and just use a global window regardless.

Am I missing something here? Is there some other way around this constraint? I'm nearing the point of just implementing my own Mongo writer, but I wanted to check here first in case anyone can offer alternate guidance.
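For concreteness, this is a rough sketch of the incremental writer I have in mind: a DoFn that buffers documents and flushes them with insert_many() every batch_size elements and again at the end of each bundle. The pymongo usage is an assumption on my part, and all the names here are my own, not anything from the Beam API:

```python
# Fall back to a plain base class so the batching logic can be exercised
# without a Beam installation; in a real pipeline the base is beam.DoFn.
try:
    import apache_beam as beam
    _Base = beam.DoFn
except ImportError:
    _Base = object


class IncrementalMongoWriteFn(_Base):
    """Sketch: write documents to Mongo in fixed-size batches as they arrive."""

    def __init__(self, uri, db, coll, batch_size=1000):
        self._uri = uri
        self._db = db
        self._coll = coll
        self._batch_size = batch_size
        self._batch = []

    def setup(self):
        # One client per worker instance (pymongo is assumed here).
        from pymongo import MongoClient
        self._client = MongoClient(self._uri)

    def start_bundle(self):
        self._batch = []

    def process(self, element):
        self._batch.append(element)
        if len(self._batch) >= self._batch_size:
            self._flush()

    def finish_bundle(self):
        # Flush any partial batch so nothing is lost at bundle boundaries.
        self._flush()

    def teardown(self):
        self._client.close()

    def _flush(self):
        if self._batch:
            self._client[self._db][self._coll].insert_many(self._batch)
            self._batch = []
```

It would be applied as `... | beam.ParDo(IncrementalMongoWriteFn(uri, db, coll))`, trading away WriteToMongoDB's ObjectId generation and upsert handling in exchange for writes that start as soon as bundles arrive.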

Thanks!
Jon