Posted to user@beam.apache.org by "Almeida, Julius" <Ju...@intuit.com> on 2021/03/25 18:56:37 UTC

General guidance

Hi Team,

My streaming pipeline is built on Beam and runs on the Flink runner with RocksDB as the state backend.

Over time I am seeing memory spike, and after looking at a heap dump I see records in ‘__StatefulParDoGcTimerId’ that never seem to be cleaned up.

I found this Jira issue, https://issues.apache.org/jira/browse/BEAM-8212, which describes the problem I believe I am facing.

Any pointers toward a possible solution would be helpful.

Thanks,
Julius

Re: General guidance

Posted by "Almeida, Julius" <Ju...@intuit.com>.
Hi Team,

Because of the issue I faced with an older version of Beam on the Flink runner, I upgraded to Beam 2.28 with the Flink 1.12.1 runner.

I am using the RocksDB state backend.

After running my pipeline for a couple of days, I see that the state size is under control, but memory utilization has spiked to full utilization, and the pipeline fails with:

java.io.IOException: Cannot allocate memory

Configs used:
Replicas: 15
CPUs: 2
Mem: 8Gi
Offset: 4

Thanks,
Julius

Re: General guidance

Posted by "Almeida, Julius" <Ju...@intuit.com>.
Hi Jan,

I am using Beam 2.23, but let me upgrade to 2.25 and verify.

Thanks,
Julius

Re: General guidance

Posted by Jan Lukavský <je...@seznam.cz>.
Hi Julius,

Which version of Beam do you run? There has been a fix in 2.25.0 [1]
that could address what you see.

  Jan

[1] https://issues.apache.org/jira/browse/BEAM-10760

Re: General guidance

Posted by Kenneth Knowles <ke...@apache.org>.
This is a Beam issue indeed, though it is an issue with the FlinkRunner. So
I think I will BCC the Flink list.

You may be in one of the following situations:
 - These timers should not be viewed as distinct by the runner, but
deduped, per
https://issues.apache.org/jira/browse/BEAM-8212#comment-16946013
 - There is a different problem if you have an unbounded key space with
windows that never expire, since then there are unbounded numbers of truly
distinct (but irrelevant) timers. In that case it is also the runner's
responsibility to simply not set timers that can never fire.
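
The two situations above can be illustrated with a toy model. This is only a minimal sketch, not FlinkRunner code: the function `simulate`, its parameters, and the `NEVER` sentinel are hypothetical names made up for illustration, and the only identifier taken from the thread is the timer name seen in the heap dump. It counts how many timer records would be held if each incoming element sets a GC timer for its (key, window) pair.

```python
# Toy model of per-(key, window) GC timers. Illustrative only; this is not
# how the FlinkRunner stores timers internally.

GC_TIMER_ID = "__StatefulParDoGcTimerId"  # timer name taken from the heap dump
NEVER = float("inf")  # assumed stand-in for a window that never expires


def simulate(num_keys, elements_per_key, window_end, dedupe, skip_unfireable):
    """Return how many timer records are held after processing all elements."""
    timers = {} if dedupe else []
    for k in range(num_keys):
        for _ in range(elements_per_key):
            # A timer for a window that never expires can never fire,
            # so a runner could skip setting it altogether.
            if skip_unfireable and window_end == NEVER:
                continue
            slot = (f"key-{k}", window_end, GC_TIMER_ID)
            if dedupe:
                # Re-setting the same timer overwrites the existing record.
                timers[slot] = window_end
            else:
                # Every set call is stored as a new, distinct record.
                timers.append((slot, window_end))
    return len(timers)


# Situation 1: few keys, one window, many elements per key.
print(simulate(3, 1000, 60, dedupe=False, skip_unfireable=False))  # 3000
print(simulate(3, 1000, 60, dedupe=True, skip_unfireable=False))   # 3

# Situation 2: many keys, window never expires.
print(simulate(10_000, 1, NEVER, dedupe=True, skip_unfireable=False))  # 10000
print(simulate(10_000, 1, NEVER, dedupe=True, skip_unfireable=True))   # 0
```

In this model, deduplication bounds the record count by the number of (key, window) pairs, but with an unbounded key space and a window that never expires the count still grows with the number of keys unless unfireable timers are not set at all.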

Kenn
