Posted to user@beam.apache.org by Xander Song <ia...@gmail.com> on 2020/03/05 23:09:44 UTC

Workers running out of memory

I am running a Beam batch pipeline on Dataflow using the Python SDK. When I
turn off autoscaling and specify a large number of workers (> 100), the
job succeeds. When I specify a smaller number of workers (e.g., 20),
however, the job fails. I believe the cause is that workers are running out
of memory: the worker logs show many workers reaching memory usage of
around 530-540 MB before the first exceptions are raised.

[image: Screen Shot 2020-03-05 at 2.52.14 PM.png]


I am looking for suggestions on how to debug this issue. Some options I've
been exploring are:

   1. Setting up Cloud Stackdriver with Beam. I've found a guide to setting
   up Cloud Stackdriver with the Java Beam SDK (
   https://medium.com/google-cloud/profiling-dataflow-pipelines-ddbbef07761d),
   but haven't found instructions on how to set it up with Python.
   2. I noticed in the Beam pipeline options source that there is a
   --profile_memory flag. If I specify this flag, how do I access the
   profile information? (A sketch of how I would expect to pass it is
   below this list.)
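For reference, this is how I would expect to enable that flag from Python,
assuming --profile_location controls where the output is written (the GCS
path is a placeholder, and I have not confirmed the output format):

    from apache_beam.options.pipeline_options import PipelineOptions

    # Assumption: profile_location is a GCS path where the memory profiles
    # would be written; I have not verified how to read the resulting files.
    options = PipelineOptions(
        profile_memory=True,
        profile_location='gs://my-bucket/profiles',
    )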

Any suggestions or advice are welcome. Thank you!