Posted to user@beam.apache.org by Lydian <ly...@gmail.com> on 2024/02/07 19:39:49 UTC

Monitor and limit Beam Harness Memory usage with Flink Runner

Hi,

I found that our Flink TaskManager tends to crash because the Python harness
is no longer reachable.

However, the Beam SDK harness does not seem to be a child process of the Flink
TaskManager process, so Flink's metrics monitoring cannot report the memory
used by the Beam SDK harness (either Java or Python). This keeps me from
debugging the issue further. Is there a way to monitor Beam's memory usage,
especially for those harness processes?
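One possible way to watch harness memory from outside Flink is sketched below. It assumes the SDK harness runs as an ordinary process that is visible in the /proc filesystem of the Linux host where you run the script (for example a worker-pool sidecar, or a Docker container on that host). It walks /proc, keeps the processes whose command line contains a given substring, and reads their resident set size (VmRSS). The function names and the idea of matching on the command line are illustrative assumptions, not part of Beam or Flink; check `ps -ef` on a worker to find a substring that actually identifies your harness processes.

```python
import os


def rss_by_cmdline(pattern, proc_root="/proc"):
    """Map PID -> resident set size in bytes for every process whose command
    line contains `pattern`. Linux only: relies on the /proc filesystem."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid = int(entry)
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated; join them with spaces for matching.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if pattern not in cmdline:
                continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        # The line looks like "VmRSS:\t  123456 kB".
                        result[pid] = int(line.split()[1]) * 1024
                        break
        except (OSError, ValueError):
            # The process may have exited between listdir() and open(),
            # or we may not be allowed to read its /proc entries.
            continue
    return result


def summarize(pattern):
    """One-line memory summary of the matching processes, excluding this one."""
    usage = rss_by_cmdline(pattern)
    usage.pop(os.getpid(), None)  # don't count the monitoring process itself
    total_mib = sum(usage.values()) / (1024 * 1024)
    return f"{len(usage)} process(es) matching {pattern!r}: total RSS {total_mib:.1f} MiB"
```

You could call `summarize(...)` periodically (from cron, a small loop, or a sidecar) and forward the output to whatever monitoring system you already use. Reading /proc directly avoids depending on Flink's metrics, which only cover the TaskManager JVM's own process tree.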

Also, it seems very likely that the disconnect results from an out-of-memory
(OOM) condition. If that is the case, what is the best way to limit the
harness's resource usage? I noticed there's a resource hint
<https://beam.apache.org/documentation/runtime/resource-hints/>, but the page
also mentions that not all runners honor that setting, and I couldn't find
anything about resource hints for the Flink runner. Is that the best way for
us to fix the memory usage, or is there another approach we can take to avoid
OOMs when running Python tasks on the Flink runner? Thanks
Sincerely,
Lydian Lee
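To test the OOM hypothesis from inside the Python harness itself, one option is a small watchdog thread that logs a warning once the process's peak resident memory passes a threshold you choose. This is a minimal sketch using only the standard library (`resource`, `threading`, `logging`, Unix only). `start_memory_watchdog` and its threshold are hypothetical names and values, and the sketch only detects high memory use; it does not limit it.

```python
import logging
import resource
import sys
import threading


def peak_rss_bytes():
    """Peak resident set size of the current process, in bytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def start_memory_watchdog(threshold_bytes, interval_s=30.0):
    """Start a daemon thread that checks peak RSS every `interval_s` seconds
    and logs one warning the first time it exceeds `threshold_bytes`.
    Returns a threading.Event; set it to stop the thread early."""
    stop = threading.Event()

    def _run():
        # Event.wait() returns True once `stop` is set, which ends the loop.
        while not stop.wait(interval_s):
            peak = peak_rss_bytes()
            if peak > threshold_bytes:
                logging.warning(
                    "peak RSS %.1f MiB exceeds threshold %.1f MiB",
                    peak / 2**20,
                    threshold_bytes / 2**20,
                )
                return  # peak RSS never decreases, so warn only once

    threading.Thread(target=_run, name="memory-watchdog", daemon=True).start()
    return stop
```

For example, you might start it from the `setup()` method of one of your `DoFn`s with a threshold somewhat below the memory available to the harness; if several `DoFn` instances share a process, guard the call so only one watchdog starts. This assumes the harness's Python log records reach a place you can read (such as the TaskManager logs), which is worth verifying on Beam 2.41.0 with Flink 1.14.5.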

Re: Monitor and limit Beam Harness Memory usage with Flink Runner

Posted by Lydian <ly...@gmail.com>.
We are actually on an even older version: Beam 2.41.0 and Flink 1.14.5.

Sincerely,
Lydian Lee



On Wed, Feb 7, 2024 at 12:40 PM Valentyn Tymofieiev via user <
user@beam.apache.org> wrote:

> Hi Lydian,
>
> Note that there was a memory leak in certain versions of Beam:
> <https://github.com/apache/beam/issues/28246>. Make sure you use a newer
> version. You might also find some of the debugging pointers there useful.
>
> To my knowledge, the Flink runner has not implemented support for the
> min_ram resource hint. Also, the intent of that hint is to specify a lower
> bound on memory rather than an upper bound.

Re: Monitor and limit Beam Harness Memory usage with Flink Runner

Posted by Valentyn Tymofieiev via user <us...@beam.apache.org>.
Hi Lydian,

Note that there was a memory leak in certain versions of Beam:
<https://github.com/apache/beam/issues/28246>. Make sure you use a newer
version. You might also find some of the debugging pointers there useful.

To my knowledge, the Flink runner has not implemented support for the
min_ram resource hint. Also, the intent of that hint is to specify a lower
bound on memory rather than an upper bound.
