You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Ayush Chauhan <ay...@zomato.com> on 2021/04/22 05:51:07 UTC

Debezium CDC | OOM

Hi,
I am using flink cdc to stream CDC changes in an iceberg table. When I
first run the flink job for a topic which has all the data for a table, it
get out of heap memory as flink try to load all the data during my 15mins
checkpointing interval. Right now, only solution I have is to pass *-ytm
8192 -yjm 2048m* for a table with 10M rows and then reduce it after flink
has consumed all the data. Is there a way to tell flink cdc code to trigger
checkpoint or throttle the consumption speed(I think backpressure should
have handled this)?

-- 
 Ayush Chauhan
 Software Engineer | Data Platform
 [image: mobile-icon]  +91 9990747111

-- 












This email is intended only for the person or the entity to 
whom it is addressed. If you are not the intended recipient, please delete 
this email and contact the sender.


Re: Debezium CDC | OOM

Posted by Matthias Pohl <ma...@ververica.com>.
Got it. Thanks for clarifying.

On Fri, Apr 23, 2021 at 6:36 AM Ayush Chauhan <ay...@zomato.com>
wrote:

> Hi Matthias,
>
> I am using RocksDB as a state backend. I think the iceberg sink is not
> able to propagate back pressure to the source which is resulting in OOM for
> my CDC pipeline.
> Please refer to this - https://github.com/apache/iceberg/issues/2504
>
>
>
> On Thu, Apr 22, 2021 at 8:44 PM Matthias Pohl <ma...@ververica.com>
> wrote:
>
>> Hi Ayush,
>> Which state backend have you configured [1]? Have you considered trying
>> out RocksDB [2]? RocksDB might help with persisting at least keyed state.
>>
>> Best,
>> Matthias
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend
>>
>> On Thu, Apr 22, 2021 at 7:52 AM Ayush Chauhan <ay...@zomato.com>
>> wrote:
>>
>>> Hi,
>>> I am using flink cdc to stream CDC changes in an iceberg table. When I
>>> first run the flink job for a topic which has all the data for a table, it
>>> get out of heap memory as flink try to load all the data during my 15mins
>>> checkpointing interval. Right now, only solution I have is to pass *-ytm
>>> 8192 -yjm 2048m* for a table with 10M rows and then reduce it after
>>> flink has consumed all the data. Is there a way to tell flink cdc code to
>>> trigger checkpoint or throttle the consumption speed(I think backpressure
>>> should have handled this)?
>>>
>>> --
>>>  Ayush Chauhan
>>>  Software Engineer | Data Platform
>>>  [image: mobile-icon]  +91 9990747111
>>>
>>>
>>> This email is intended only for the person or the entity to whom it is
>>> addressed. If you are not the intended recipient, please delete this email
>>> and contact the sender.
>>>
>>
>
> --
>  Ayush Chauhan
>  Software Engineer | Data Platform
>  [image: mobile-icon]  +91 9990747111
>
>
> This email is intended only for the person or the entity to whom it is
> addressed. If you are not the intended recipient, please delete this email
> and contact the sender.
>

Re: Debezium CDC | OOM

Posted by Ayush Chauhan <ay...@zomato.com>.
Hi Matthias,

I am using RocksDB as a state backend. I think the iceberg sink is not able
to propagate back pressure to the source which is resulting in OOM for my
CDC pipeline.
Please refer to this - https://github.com/apache/iceberg/issues/2504



On Thu, Apr 22, 2021 at 8:44 PM Matthias Pohl <ma...@ververica.com>
wrote:

> Hi Ayush,
> Which state backend have you configured [1]? Have you considered trying
> out RocksDB [2]? RocksDB might help with persisting at least keyed state.
>
> Best,
> Matthias
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend
> [2]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend
>
> On Thu, Apr 22, 2021 at 7:52 AM Ayush Chauhan <ay...@zomato.com>
> wrote:
>
>> Hi,
>> I am using flink cdc to stream CDC changes in an iceberg table. When I
>> first run the flink job for a topic which has all the data for a table, it
>> get out of heap memory as flink try to load all the data during my 15mins
>> checkpointing interval. Right now, only solution I have is to pass *-ytm
>> 8192 -yjm 2048m* for a table with 10M rows and then reduce it after
>> flink has consumed all the data. Is there a way to tell flink cdc code to
>> trigger checkpoint or throttle the consumption speed(I think backpressure
>> should have handled this)?
>>
>> --
>>  Ayush Chauhan
>>  Software Engineer | Data Platform
>>  [image: mobile-icon]  +91 9990747111
>>
>>
>> This email is intended only for the person or the entity to whom it is
>> addressed. If you are not the intended recipient, please delete this email
>> and contact the sender.
>>
>

-- 
 Ayush Chauhan
 Software Engineer | Data Platform
 [image: mobile-icon]  +91 9990747111

-- 












This email is intended only for the person or the entity to 
whom it is addressed. If you are not the intended recipient, please delete 
this email and contact the sender.


Re: Debezium CDC | OOM

Posted by Matthias Pohl <ma...@ververica.com>.
Hi Ayush,
Which state backend have you configured [1]? Have you considered trying out
RocksDB [2]? RocksDB might help with persisting at least keyed state.

Best,
Matthias

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#the-rocksdbstatebackend

On Thu, Apr 22, 2021 at 7:52 AM Ayush Chauhan <ay...@zomato.com>
wrote:

> Hi,
> I am using flink cdc to stream CDC changes in an iceberg table. When I
> first run the flink job for a topic which has all the data for a table, it
> get out of heap memory as flink try to load all the data during my 15mins
> checkpointing interval. Right now, only solution I have is to pass *-ytm
> 8192 -yjm 2048m* for a table with 10M rows and then reduce it after flink
> has consumed all the data. Is there a way to tell flink cdc code to trigger
> checkpoint or throttle the consumption speed(I think backpressure should
> have handled this)?
>
> --
>  Ayush Chauhan
>  Software Engineer | Data Platform
>  [image: mobile-icon]  +91 9990747111
>
>
> This email is intended only for the person or the entity to whom it is
> addressed. If you are not the intended recipient, please delete this email
> and contact the sender.
>