Posted to jira@kafka.apache.org by "Konstantine Karantasis (Jira)" <ji...@apache.org> on 2020/08/18 19:11:00 UTC

[jira] [Updated] (KAFKA-10327) Make flush after some count of putted records in SinkTask

     [ https://issues.apache.org/jira/browse/KAFKA-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantine Karantasis updated KAFKA-10327:
-------------------------------------------
    Labels: kip-required needs-kip  (was: )

> Make flush after some count of putted records in SinkTask
> ---------------------------------------------------------
>
>                 Key: KAFKA-10327
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10327
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 2.5.0
>            Reporter: Pavel Kuznetsov
>            Priority: Major
>              Labels: kip-required, needs-kip
>
> In the current version of Kafka Connect, all records accumulated through SinkTask.put are flushed to the target system on a time-based schedule: data is flushed and offsets are committed every offset.flush.interval.ms (default 60000 ms).
> However, you cannot control how many messages arrive from Kafka between two flushes. The in-memory buffer can grow very large in that interval, which may cause out-of-memory errors.
> I suggest adding out-of-the-box support for count-based flushing to Kafka Connect. This requires a new configuration parameter (offset.flush.count, for example). The number of records passed to SinkTask.put would be counted, and once that number exceeds the value of offset.flush.count, SinkTask.flush would be called and offsets committed.
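
The count-based trigger described above could look roughly like the following. This is only a minimal sketch, not Kafka Connect code: CountBasedFlusher is a hypothetical stand-alone helper, its flushCount argument stands in for the proposed offset.flush.count setting, and the flushAction callback stands in for SinkTask.flush plus the offset commit. The sketch flushes as soon as the buffered count reaches the threshold; the issue text says "greater than", so the exact comparison is an open design choice.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of the Kafka Connect API.
final class CountBasedFlusher<R> {
    // Stands in for the proposed offset.flush.count setting (assumed name).
    private final int flushCount;
    // Stands in for SinkTask.flush followed by an offset commit.
    private final Consumer<List<R>> flushAction;
    private final List<R> buffer = new ArrayList<>();

    CountBasedFlusher(int flushCount, Consumer<List<R>> flushAction) {
        if (flushCount <= 0) {
            throw new IllegalArgumentException("flushCount must be positive");
        }
        this.flushCount = flushCount;
        this.flushAction = flushAction;
    }

    // Mirrors the role of SinkTask.put: buffer the records, then flush
    // once the number of buffered records reaches the threshold.
    void put(Collection<R> records) {
        buffer.addAll(records);
        if (buffer.size() >= flushCount) {
            flush();
        }
    }

    // Hands a copy of the buffered records to the flush action and
    // empties the buffer. Does nothing when the buffer is empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushAction.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    // Number of records currently waiting to be flushed.
    int buffered() {
        return buffer.size();
    }
}
```

Because put receives a whole collection at a time, a single call can push the buffer past the threshold, so a flushed batch may be larger than flushCount. In an existing connector, a similar effect might be approximated by counting records in the task's own put implementation and calling SinkTaskContext.requestCommit(), without any framework change.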



--
This message was sent by Atlassian Jira
(v8.3.4#803005)