Posted to issues@ignite.apache.org by "Ilya Shishkov (Jira)" <ji...@apache.org> on 2022/11/21 15:33:00 UTC

[jira] [Updated] (IGNITE-18209) Reduce binary metadata synchronization time for CDC through Kafka

     [ https://issues.apache.org/jira/browse/IGNITE-18209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilya Shishkov updated IGNITE-18209:
-----------------------------------
    Description: 
Currently there is a bottleneck in the synchronized method {{KafkaToIgniteMetadataUpdater#updateMetadata}}:
# {{KafkaToIgniteCdcStreamer}} contains multiple {{KafkaToIgniteCdcStreamerApplier}} instances, which share a _single_ {{KafkaToIgniteMetadataUpdater}}.
# Each applier handles its corresponding partitions sequentially.
# {{META_UPDATE_MARKER}} is sent twice to each partition of the event topic: first when type mappings are updated, and again when binary types are updated.
# When the first {{KafkaToIgniteCdcStreamerApplier}} encounters {{META_UPDATE_MARKER}}, it calls {{KafkaToIgniteMetadataUpdater#updateMetadata}}, which in turn calls {{KafkaConsumer#poll}}.
# {{KafkaConsumer#poll}} returns immediately [1] when there is data in the metadata topic. If there are only a few binary types and mappings to update, a single {{KafkaToIgniteCdcStreamerApplier}} thread will consume all entries from the metadata topic.
# All other threads of all {{KafkaToIgniteCdcStreamerApplier}} instances will then call {{KafkaConsumer#poll}} on an empty metadata topic, and each call blocks until new data becomes available or the request timeout expires [1].
# Because access to {{KafkaToIgniteMetadataUpdater#updateMetadata}} is {{synchronized}}, all threads of all {{KafkaToIgniteCdcStreamerApplier}} instances form a sequence of calls. Each call blocks the remaining applier threads for the {{kafkaReqTimeout}} period (if the metadata topic remains empty).
# The last call in this chain, i.e. the update for the last partition, happens after {{(partitionsCount x 2 - 1) x kafkaReqTimeout}}. For example, with the default timeout and 16 partitions, _the last partition will be consumed after approximately 1.5 minutes_. The number of threads does not matter.
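
The arithmetic behind the estimate above can be checked with a small sketch. It is only an illustration under stated assumptions, not code from the Ignite extensions: the class {{MetaUpdateDelaySketch}} and its methods are hypothetical, and the 3-second timeout is an assumed value chosen because it matches the "approximately 1.5 minutes" figure for 16 partitions. The sketch models the serialized calls on a virtual clock instead of really blocking on Kafka.

```java
import java.time.Duration;

/** Hypothetical illustration; not part of the Ignite code base. */
final class MetaUpdateDelaySketch {
    /** Closed-form estimate from the description: (partitionsCount x 2 - 1) x kafkaReqTimeout. */
    static Duration worstCaseDelay(int partitionsCount, Duration kafkaReqTimeout) {
        return kafkaReqTimeout.multipliedBy(partitionsCount * 2L - 1);
    }

    /** The same estimate, obtained by walking through the serialized calls on a virtual clock. */
    static Duration simulateSerializedCalls(int partitionsCount, Duration kafkaReqTimeout) {
        // Two META_UPDATE_MARKER records per partition, so two updateMetadata calls per partition.
        int calls = partitionsCount * 2;
        Duration clock = Duration.ZERO;

        for (int call = 0; call < calls; call++) {
            // The first call finds data and returns at once. Every later call
            // polls an empty metadata topic and holds the lock for the full timeout.
            if (call > 0)
                clock = clock.plus(kafkaReqTimeout);
        }

        return clock;
    }

    public static void main(String[] args) {
        Duration assumedTimeout = Duration.ofSeconds(3); // Assumption, see the lead-in.

        System.out.println(worstCaseDelay(16, assumedTimeout).getSeconds() + " s");          // 93 s
        System.out.println(simulateSerializedCalls(16, assumedTimeout).getSeconds() + " s"); // 93 s
    }
}
```

Neither method takes a thread count as input, which mirrors the last point of the list: under the {{synchronized}} method the calls are serialized, so the delay depends only on the number of partitions and the timeout.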

> Reduce binary metadata synchronization time for CDC through Kafka
> -----------------------------------------------------------------
>
>                 Key: IGNITE-18209
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18209
>             Project: Ignite
>          Issue Type: Improvement
>          Components: extensions
>            Reporter: Ilya Shishkov
>            Priority: Minor
>              Labels: IEP-59, ise



--
This message was sent by Atlassian Jira
(v8.20.10#820010)