You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@ignite.apache.org by ni...@apache.org on 2022/03/09 19:57:18 UTC

[ignite] branch master updated: IGNITE-16246 CDC extensions documentation (#9874)

This is an automated email from the ASF dual-hosted git repository.

nizhikov pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ignite.git


The following commit(s) were added to refs/heads/master by this push:
     new 62669d7  IGNITE-16246 CDC extensions documentation (#9874)
62669d7 is described below

commit 62669d7773947066796dbbaa11964870215c63dc
Author: Nikolay <ni...@apache.org>
AuthorDate: Wed Mar 9 22:56:40 2022 +0300

    IGNITE-16246 CDC extensions documentation (#9874)
---
 .../change-data-capture-extensions.adoc            | 193 +++++++++++++++++++++
 docs/_docs/persistence/change-data-capture.adoc    |  11 +-
 .../images/integrations/CDC-ignite2ignite.svg      |   4 +
 .../images/integrations/CDC-ignite2kafka.svg       |   4 +
 4 files changed, 209 insertions(+), 3 deletions(-)

diff --git a/docs/_docs/persistence/change-data-capture-extensions.adoc b/docs/_docs/persistence/change-data-capture-extensions.adoc
new file mode 100644
index 0000000..c40cae1
--- /dev/null
+++ b/docs/_docs/persistence/change-data-capture-extensions.adoc
@@ -0,0 +1,193 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+= Change Data Capture Extension
+
+WARNING: CDC is an experimental feature. API or design architecture might be changed.
+
+== Overview
+link:https://github.com/apache/ignite-extensions/tree/master/modules/cdc-ext[Change Data Capture Extension] module provides two ways to set up cross cluster replication based on CDC.
+
+. link:https://github.com/apache/ignite-extensions/blob/master/modules/cdc-ext/src/main/java/org/apache/ignite/cdc/IgniteToIgniteCdcStreamer.java[Ignite2IgniteCdcStreamer] - streams changes to destination cluster using client node.
+. link:https://github.com/apache/ignite-extensions/blob/master/modules/cdc-ext/src/main/java/org/apache/ignite/cdc/kafka/IgniteToKafkaCdcStreamer.java[Ignite2KafkaCdcStreamer] combined with link:https://github.com/apache/ignite-extensions/blob/master/modules/cdc-ext/src/main/java/org/apache/ignite/cdc/kafka/KafkaToIgniteCdcStreamer.java[KafkaToIgniteCdcStreamer] streams changes to destination cluster using link:https://kafka.apache.org[Apache Kafka] as a transport.
+
+NOTE: For each cache replicated between clusters link:https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/version/CacheVersionConflictResolver.java[CacheVersionConflictResolver] should be defined.
+
+
+== Ignite to Ignite CDC streamer
+This streamer starts client node which connects to destination cluster.
+After connection is established, all changes captured by CDC will be replicated to destination cluster.
+
+NOTE: Instances of `ignite-cdc.sh` with configured streamer should be started on each server node of source cluster to capture all changes.
+
+image:../../assets/images/integrations/CDC-ignite2ignite.svg[]
+
+== Configuration
+
+[cols="20%,45%,35%",opts="header"]
+|===
+|Name |Description | Default value
+| `caches` | Set of cache names to replicate. | null
+| `destinationIgniteConfiguration` | Ignite configuration of client nodes that will connect to destination cluster to replicate changes. | null
+| `onlyPrimary` | Flag to handle changes only on primary node. | `false`
+| `maxBatchSize` | Maximum number of events to be sent to destination cluster in a single batch. | 1024
+|===
+
+== Metrics
+
+|===
+|Name |Description
+| `EventsCount` | Count of messages applied to destination cluster.
+| `LastEventTime` | Timestamp of last applied event.
+|===
+
+== CDC replication using Kafka
+
+This way to replicate changes between clusters requires setting up two applications:
+
+. `ignite-cdc.sh` with `org.apache.ignite.cdc.kafka.IgniteToKafkaCdcStreamer` that will capture changes from source cluster and write it to Kafka topic.
+. `kafka-to-ignite.sh` that will read changes from Kafka topic and then write them to destination cluster.
+
+NOTE: Instances of `ignite-cdc.sh` with configured streamer should be started on each server node of source cluster to capture all changes.
+
+image:../../assets/images/integrations/CDC-ignite2kafka.svg[]
+
+=== IgniteToKafkaCdcStreamer Configuration
+
+[cols="20%,45%,35%",opts="header"]
+|===
+|Name |Description | Default value
+| `caches` | Set of cache names to replicate. | null
+| `kafkaProperties` | Kafka producer properties. | null
+| `topic` | Name of the Kafka topic. | null
+| `kafkaParts` | Number of Kafka topic partitions. | null
+| `onlyPrimary` | Flag to handle changes only on primary node. | `false`
+| `maxBatchSize` | Maximum size of concurrently produced Kafka records. When streamer reaches this number, it waits for Kafka acknowledgements, and then commits CDC offset. | `1024`
+| `kafkaRequestTimeout` | Kafka request timeout in milliseconds.  | `3000`
+|===
+
+=== IgniteToKafkaCdcStreamer Metrics
+
+|===
+|Name |Description
+| `EventsCount` | Count of messages applied to destination cluster.
+| `LastEventTime` | Timestamp of last applied event.
+| `BytesSent` | Number of bytes send to Kafka.
+|===
+
+=== `kafka-to-ignite.sh` application
+
+This application should be started near the destination cluster.
+`kafka-to-ignite.sh` will read CDC events from Kafka topic and then apply them to destination cluster.
+
+IMPORTANT: `kafka-to-ignite.sh` implements the fail-fast approach. It just fails in case of any error. The restart procedure should be configured with the OS tools.
+
+Count of instances of the application does not corellate to the count of destination server nodes.
+It should be just enough to process source cluster load.
+Each instance of application will process configured subset of topic partitions to spread the load.
+`KafkaConsumer` for each partition will be created to ensure fair reads.
+
+==== Installation
+
+. Build `cdc-ext` module with maven:
++
+```console
+  $~/src/ignite-extensions/> mvn clean package -DskipTests
+  $~/src/ignite-extensions/> ls modules/cdc-ext/target | grep zip
+ignite-cdc-ext.zip
+```
+
+. Unpack `ignite-cdc-ext.zip` archive to `$IGNITE_HOME` folder.
+
+Now, you have additional binary `$IGNITE_HOME/bin/kafka-to-ignite.sh` and `$IGNITE_HOME/libs/optional/ignite-cdc-ext` module.
+
+NOTE: Please, enable `ignite-cdc-ext` to be able to run `kafka-to-ignite.sh`.
+
+==== Configuration
+
+Application configuration should be done using POJO classes or Spring xml file like regular Ignite node configuration.
+Kafka to ignite configuration file should contain the following beans that will be loaded during startup:
+
+. `IgniteConfiguration` bean: Configuration of the client node that will connect to the destination cluster.
+. `java.util.Properties` bean with the name `kafkaProperties`: Single Kafka consumer configuration.
+. `org.apache.ignite.cdc.kafka.KafkaToIgniteCdcStreamerConfiguration` bean: Options specific to `kafka-to-ignite.sh` application.
+
+[cols="20%,45%,35%",opts="header"]
+|===
+|Name |Description | Default value
+| `caches` | Set of cache names to replicate. | null
+| `topic` | Name of the Kafka topic. | null
+| `kafkaPartsFrom` | Lower Kafka partitions number (inclusive). | -1
+| `kafkaPartsTo` | Lower Kafka partitions number (exclusive). | -1
+| `kafkaRequestTimeout` | Kafka request timeout in milliseconds.  | `3000`
+| `maxBatchSize` | Maximum number of events to be sent to destination cluster in a single batch. | 1024
+| `threadCount` | Count of threads to proceed consumers. Each thread poll records from dedicated partitions in round-robin manner. | 16
+|===
+
+==== Logging
+
+`kakfa-to-ignite.sh` uses the same logging configuration as the Ignite node does. The only difference is that the log is written in the "kafka-ignite-streamer.log" file.
+
+== CacheVersionConflictResolver implementation
+
+It expected that CDC streamers will be configured with the `onlyPrimary=false` in most real-world deployments to ensure fault-tolerance.
+That means streamer will send the same change several times equal to `CacheConfiguration#backups` + 1.
+At the same time concurrent updates of the same key can be done in replicated clusters.
+`CacheVersionConflictResolver` used by Ignite node to selects or merge new (from update request) and existing (stored in the cluster) entry versions.
+Selected entry version will be actually stored in the cluster.
+
+NOTE: Default implementation only select correct entry and never merge.
+
+link:https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/version/CacheVersionConflictResolver.java[CacheVersionConflictResolver] should be defined for each cache replicated between clusters.
+
+Default link:https://github.com/apache/ignite-extensions/blob/master/modules/cdc-ext/src/main/java/org/apache/ignite/cdc/conflictresolve/CacheVersionConflictResolverImpl.java[implementation] is available in cdc-ext.
+
+==== Configuration
+
+[cols="20%,45%,35%",opts="header"]
+|===
+|Name |Description | Default value
+| `clusterId` | Local cluster id. Can be any value from 1 to 31. | null
+| `caches` | Set of cache names to handle with this plugin instance. | null
+| `conflictResolveField` | Value field to resolve conflict with. Optional. Field values must implement `java.lang.Comparable`. | null
+|===
+
+==== Conflict resolve algorithm
+
+Replicated changes contain some additional data. Specifically, entry version from source cluster supplied with the changed data.
+Default conflict resolve algorithm based on entry version and `conflictResolveField`.
+Conflict resolution field should contain user provided monotonically increasing value such as query id or timestamp.
+
+. Changes from the "local" cluster always win.
+. If both old and new entry from the same cluster version comparison used to determine order.
+. If `conflictResolveField` if provided then field values comparison used to determine order.
+. Conflict resolution failed. Update will be ignored.
+
+==== Configuration example
+Configuration is done via Ignite node plugin:
+
+```xml
+<property name="pluginProviders">
+    <bean class="org.apache.ignite.cdc.conflictresolve.CacheVersionConflictResolverPluginProvider">
+        <property name="clusterId" value="1" />
+        <property name="caches">
+            <util:list>
+                <bean class="java.lang.String">
+                    <constructor-arg type="String" value="queryId" />
+                </bean>
+            </util:list>
+        </property>
+    </bean>
+</property>
+```
\ No newline at end of file
diff --git a/docs/_docs/persistence/change-data-capture.adoc b/docs/_docs/persistence/change-data-capture.adoc
index a5e0fb2..7e79831 100644
--- a/docs/_docs/persistence/change-data-capture.adoc
+++ b/docs/_docs/persistence/change-data-capture.adoc
@@ -18,9 +18,9 @@
 == Overview
 Change Data Capture (link:https://en.wikipedia.org/wiki/Change_data_capture[CDC]) is a data processing pattern used to asynchronously receive entries that have been changed on the local node so that action can be taken using the changed entry.
 
-WARNING: CDC is an experimental feature whose API or design architecture might be changed.
+WARNING: CDC is an experimental feature. API or design architecture might be changed.
 
-Below are some of the CDC use cases:
+Below are some CDC use cases:
 
  * Streaming changes in Warehouse;
  * Updating search index;
@@ -129,4 +129,9 @@ IMPORTANT: `ignite-cdc.sh` implements the fail-fast approach. It just fails in c
  3. Load the saved state.
  4. Start the consumer.
  5. Infinitely wait for the newly available segment and process it.
- 6. Stop the consumer in case of a failure or a received stop signal.
\ No newline at end of file
+ 6. Stop the consumer in case of a failure or a received stop signal.
+
+== cdc-ext
+
+Ignite extensions project has link:https://github.com/apache/ignite-extensions/tree/master/modules/cdc-ext[cdc-ext] module which provides two way to setup cross cluster replication based on CDC.
+Detailed documentation can be found on link:../change-data-capture-extensions.adoc[page]
\ No newline at end of file
diff --git a/docs/assets/images/integrations/CDC-ignite2ignite.svg b/docs/assets/images/integrations/CDC-ignite2ignite.svg
new file mode 100644
index 0000000..5c41781
--- /dev/null
+++ b/docs/assets/images/integrations/CDC-ignite2ignite.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than diagrams.net -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="881px" height="461px" viewBox="-0.5 -0.5 881 461" content="&lt;mxfile host=&quot;app.diagrams.net&quot; modified=&quot;2022-03-05T10:09:02.090Z&quot; agent=&quot;5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 YaBrowser/22.1.0.2500 Yowser/2.5 Safari/537.36&quot; etag=&quot;E2NiiJo0d_kAqKrUVduG&quot; version=&quot;16.1.0&quot; type=&qu [...]
\ No newline at end of file
diff --git a/docs/assets/images/integrations/CDC-ignite2kafka.svg b/docs/assets/images/integrations/CDC-ignite2kafka.svg
new file mode 100644
index 0000000..e006cc9
--- /dev/null
+++ b/docs/assets/images/integrations/CDC-ignite2kafka.svg
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!-- Do not edit this file with editors other than diagrams.net -->
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="1191px" height="381px" viewBox="-0.5 -0.5 1191 381" content="&lt;mxfile host=&quot;app.diagrams.net&quot; modified=&quot;2022-03-05T15:37:04.505Z&quot; agent=&quot;5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 YaBrowser/22.1.0.2500 Yowser/2.5 Safari/537.36&quot; etag=&quot;n7weODi2NYc3QXqS0Kaa&quot; version=&quot;16.6.6&quot; type=& [...]
\ No newline at end of file