Posted to user@flink.apache.org by Alexandr Dmitrov <ca...@gmail.com> on 2023/03/03 12:11:52 UTC

debezium/flink-oracle-cdc-connector large memory consumption for transactions

Hello! Maybe you could help with the Debezium/Ververica connectors: I am trying
to optimize memory consumption for large changes in a single transaction using
the Ververica flink-oracle-cdc-connector.
  
Environment:  
flink - 1.15.1  
oracle-cdc - 2.3  
oracle - 19.3

  
When updating a large number of rows in a single transaction I get the
exception:
"ERROR io.debezium.connector.oracle.logminer.LogMinerHelper [] -
Mining session stopped due to the java.lang.OutOfMemoryError: Java heap
space".
  
For a table with 20 columns (four each of int, float, timestamp, date, and
string) I got the following results:
  
For TaskManager heap space=540Mi, I could get 250k rows through, but failed on
300k.  
For TaskManager heap space=960Mi, I could get 350k rows through, but failed on
400k.  
  
I use a SourceFunction created from OracleSource.builder(), with startup-mode
set to latest-offset and a self-defined deserializer, where I added log.info()
calls to check whether the problem starts after the Debezium/CDC connector has
done its work; however, the Java heap space error occurs before the flow
reaches deserialization.
  
I've tried setting the following Debezium properties, but with no luck:
  
dbzProps.setProperty("log.mining.batch.size.max", "10000");  
dbzProps.setProperty("log.mining.batch.size.default", "2000");  
dbzProps.setProperty("log.mining.batch.size.min", "100");  
dbzProps.setProperty("log.mining.view.fetch.size", "1000");  
dbzProps.setProperty("max.batch.size", "64");  
dbzProps.setProperty("max.queue.size", "256");
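For context, the setup looks roughly like the sketch below (connection details
are placeholders, and the exact builder methods should be checked against the
flink-connector-oracle-cdc 2.3 javadoc; this is how the properties above were
passed to the source, not a verified minimal reproduction). It will only
compile with the flink-connector-oracle-cdc dependency on the classpath.

```java
import java.util.Properties;

import com.ververica.cdc.connectors.oracle.OracleSource;
import com.ververica.cdc.connectors.oracle.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class OracleCdcSourceSketch {
    public static SourceFunction<String> buildSource() {
        // Debezium tuning properties from the post: they bound the LogMiner
        // batch/fetch sizes but, as observed, not the size of the in-memory
        // transaction buffer.
        Properties dbzProps = new Properties();
        dbzProps.setProperty("log.mining.batch.size.max", "10000");
        dbzProps.setProperty("log.mining.batch.size.default", "2000");
        dbzProps.setProperty("log.mining.batch.size.min", "100");
        dbzProps.setProperty("log.mining.view.fetch.size", "1000");
        dbzProps.setProperty("max.batch.size", "64");
        dbzProps.setProperty("max.queue.size", "256");

        return OracleSource.<String>builder()
                .hostname("oracle-host")          // placeholder
                .port(1521)
                .database("ORCLCDB")              // placeholder
                .schemaList("MYSCHEMA")           // placeholder
                .tableList("MYSCHEMA.MYTABLE")    // placeholder
                .username("dbzuser")
                .password("dbzpass")
                .startupOptions(StartupOptions.latest())
                .deserializer(new JsonDebeziumDeserializationSchema())
                .debeziumProperties(dbzProps)
                .build();
    }
}
```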

  
After some digging, I found the places in the code where Debezium consumes
changes from Oracle:
Loop to fetch records:
https://github.com/debezium/debezium/blob/1.6/debezium-connector-oracle/src/main/java/io/debezium/connector/oracle/logminer/LogMinerStreamingChangeEventSource.java#L154
  
Insert/update/delete records are registered in the transaction buffer:
https://github.com/debezium/debezium/blob/1.6/debezium-connector-oracle/src/main/java/io/debezium/connector/oracle/logminer/LogMinerQueryResultProcessor.java#L267
  
A commit record causes all events in the transaction buffer to be committed,
i.e. sent forward to the dispatcher, ending up in the batch handler:
https://github.com/debezium/debezium/blob/1.6/debezium-connector-oracle/src/main/java/io/debezium/connector/oracle/logminer/LogMinerQueryResultProcessor.java#L144
  
As far as I can see, the cause of the problem is that Debezium buffers all
changes in memory until the transaction commits.
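That buffering behavior can be illustrated with a simplified, hypothetical
sketch (class and method names are invented, not Debezium's actual ones): each
DML event is only appended to a per-transaction list, and nothing is released
downstream until the commit arrives, so heap usage grows linearly with the
number of uncommitted rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the in-memory transaction-buffer behavior
// described above; not Debezium's real implementation.
public class TransactionBufferSketch {
    private final Map<String, List<String>> buffer = new HashMap<>();

    // A DML event (insert/update/delete) is only registered, not emitted.
    public void registerEvent(String txId, String event) {
        buffer.computeIfAbsent(txId, k -> new ArrayList<>()).add(event);
    }

    // Only on COMMIT are the buffered events released downstream; until
    // then they all stay on the heap, however large the transaction is.
    public List<String> commit(String txId) {
        List<String> events = buffer.remove(txId);
        return events == null ? List.of() : events;
    }

    public int bufferedEventCount(String txId) {
        List<String> events = buffer.get(txId);
        return events == null ? 0 : events.size();
    }
}
```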
  
Is there a way to make the connector split the processing of a large set of
changes in a single transaction? Or is there any other way to avoid the large
memory consumption?




Re: debezium/flink-oracle-cdc-connector large memory consumption for transactions

Posted by Антон <an...@yandex.ru>.
Hello,

  

Try Infinispan -
https://debezium.io/documentation/reference/stable/connectors/oracle.html#oracle-event-buffering
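Per the linked documentation, the connector can spill the transaction buffer
to Infinispan instead of holding it on the JVM heap. A rough sketch of the
relevant Debezium properties follows; the property names and accepted values
differ between Debezium versions (the Infinispan buffer was still incubating
in 1.6, which flink-oracle-cdc 2.3 bundles), so verify every name against the
docs for the version actually in use.

```properties
# Sketch only - check your Debezium version's docs for exact names/values.
log.mining.buffer.type=infinispan_embedded
# Each cache needs an Infinispan XML configuration; file-store-backed
# caches keep large transactions off the JVM heap.
log.mining.buffer.infinispan.cache.transactions=<local-cache name="transactions">...</local-cache>
log.mining.buffer.infinispan.cache.events=<local-cache name="events">...</local-cache>
log.mining.buffer.infinispan.cache.processed-transactions=<local-cache name="processed-transactions">...</local-cache>
log.mining.buffer.infinispan.cache.schema-changes=<local-cache name="schema-changes">...</local-cache>
```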
  

15:12, 3 March 2023, Alexandr Dmitrov <ca...@gmail.com>:

> Hello! Maybe you could help with the Debezium/Ververica connectors in
> optimizing memory consumption for large changes in a single transaction
> using the Ververica flink-oracle-cdc-connector.
> [...]
