Posted to commits@ignite.apache.org by av...@apache.org on 2022/09/20 10:38:53 UTC

[ignite] branch master updated: IGNITE-17679 Consistency check/repair and Read Repair documentation update (#10254)

This is an automated email from the ASF dual-hosted git repository.

av pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ignite.git


The following commit(s) were added to refs/heads/master by this push:
     new dd6d8eff4cd IGNITE-17679 Consistency check/repair and Read Repair documentation update (#10254)
dd6d8eff4cd is described below

commit dd6d8eff4cd55a04f4170bd75f52fda913aa0075
Author: Anton Vinogradov <av...@apache.org>
AuthorDate: Tue Sep 20 13:38:43 2022 +0300

    IGNITE-17679 Consistency check/repair and Read Repair documentation update (#10254)
---
 docs/_data/toc.yaml                                |   2 +-
 docs/_docs/clustering/connect-client-nodes.adoc    |   2 +-
 .../ignite/snippets/BasicCacheOperations.java      |   5 +-
 docs/_docs/key-value-api/read-repair.adoc          |  99 +++++++++++
 docs/_docs/read-repair.adoc                        |  56 -------
 docs/_docs/tools/control-script.adoc               | 183 ++++++++++++---------
 .../main/java/org/apache/ignite/IgniteCache.java   |  13 +-
 .../apache/ignite/cache/ReadRepairStrategy.java    |   8 +-
 8 files changed, 219 insertions(+), 149 deletions(-)
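
Note: the `RELATIVE_MAJORITY` rule documented in this change can be illustrated with a small plain-Java sketch. It is not Ignite's implementation and is not part of the commit. The class `RelativeMajoritySketch` and the method `relativeMajority` are hypothetical names, values are modelled as plain strings, and missing (null) entries are not handled. An empty `Optional` stands in for the case where the documentation says the strategy is unable to determine the winner.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not Ignite code.
final class RelativeMajoritySketch {
    /**
     * Returns the value found more times than any other among the copies,
     * or an empty Optional when no single value has the highest count.
     */
    public static <V> Optional<V> relativeMajority(List<V> copies) {
        // Count how many copies hold each distinct value.
        Map<V, Integer> counts = new HashMap<>();
        for (V v : copies)
            counts.merge(v, 1, Integer::sum);

        V winner = null;
        int best = 0;
        boolean tie = false;

        // Track the highest count and whether it is shared by several values.
        for (Map.Entry<V, Integer> e : counts.entrySet()) {
            int cnt = e.getValue();

            if (cnt > best) {
                best = cnt;
                winner = e.getKey();
                tie = false;
            }
            else if (cnt == best)
                tie = true;
        }

        return tie ? Optional.empty() : Optional.ofNullable(winner);
    }

    public static void main(String[] args) {
        // 5 copies (4 backups): `A` twice, `X`, `Y`, `Z` once.
        System.out.println(relativeMajority(List.of("A", "A", "X", "Y", "Z")));
        // 5 copies: `A` twice, `B` twice, `X` once.
        System.out.println(relativeMajority(List.of("A", "A", "B", "B", "X")));
    }
}
```

With the copies `A, A, X, Y, Z` the sketch returns `A`; with `A, A, B, B, X` it returns an empty result, which corresponds to the two examples given in the strategy description.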

diff --git a/docs/_data/toc.yaml b/docs/_data/toc.yaml
index 05e679924b2..55feaa956a2 100644
--- a/docs/_data/toc.yaml
+++ b/docs/_data/toc.yaml
@@ -182,7 +182,7 @@
     - title: Using Cache Queries
       url: key-value-api/using-cache-queries
     - title: Read Repair
-      url: read-repair
+      url: key-value-api/read-repair
 - title: Performing Transactions
   url: key-value-api/transactions
 - title: Working with SQL
diff --git a/docs/_docs/clustering/connect-client-nodes.adoc b/docs/_docs/clustering/connect-client-nodes.adoc
index 7373ed70e69..a6bb7bf7129 100644
--- a/docs/_docs/clustering/connect-client-nodes.adoc
+++ b/docs/_docs/clustering/connect-client-nodes.adoc
@@ -76,7 +76,7 @@ There are two discovery events that are triggered on the client node when it is
 * `EVT_CLIENT_NODE_RECONNECTED`
 
 You can listen to these events and execute custom actions in response.
-Refer to the link:events/listening-to-events[Listening to events] section for a code example.
+Please refer to the link:events/listening-to-events[Listening to events] section for a code example.
 
 == Managing Slow Client Nodes
 
diff --git a/docs/_docs/code-snippets/java/src/main/java/org/apache/ignite/snippets/BasicCacheOperations.java b/docs/_docs/code-snippets/java/src/main/java/org/apache/ignite/snippets/BasicCacheOperations.java
index aeaf37b8e3b..3d445fe4538 100644
--- a/docs/_docs/code-snippets/java/src/main/java/org/apache/ignite/snippets/BasicCacheOperations.java
+++ b/docs/_docs/code-snippets/java/src/main/java/org/apache/ignite/snippets/BasicCacheOperations.java
@@ -129,9 +129,10 @@ public class BasicCacheOperations {
 
         try (Ignite ignite = Ignition.start()) {
             //tag::read-repair[]
-            IgniteCache<Object, Object> cache = ignite.cache("my_cache").withReadRepair();
+            IgniteCache<Object, Object> cache =
+                ignite.cache("my_cache").withReadRepair(ReadRepairStrategy.CHECK_ONLY);
 
-            Object value = cache.get(10);
+            Object value = cache.get(42);
             //end::read-repair[]
         }
 
diff --git a/docs/_docs/key-value-api/read-repair.adoc b/docs/_docs/key-value-api/read-repair.adoc
new file mode 100644
index 00000000000..4fbb475029c
--- /dev/null
+++ b/docs/_docs/key-value-api/read-repair.adoc
@@ -0,0 +1,99 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+= Read Repair
+
+WARNING: Experimental API
+
+[WARNING]
+====
+[discrete]
+=== Limitations
+A consistency check is incompatible with the following cache configurations:
+
+* Caches without backups.
+* Near caches.
+* Caches that use "read-through" mode.
+====
+
+
+`Read Repair` refers to a technique of repairing inconsistencies between primary and backup copies of data during normal read operations. When a specific key (or keys) is read by a user operation, Ignite checks the values for the given key in all backup copies.
+
+The `Read Repair` mode is designed to maintain consistency. However, read operations become {tilde}2 times more costly because backup copies are checked in addition to the primary copy. It is generally not advisable to use this mode all the time, but rather on a once-in-a-while basis.
+
+To enable the `Read Repair` mode, it is necessary to obtain an instance of the cache that enables Read Repair reads as follows:
+
+[source, java]
+----
+include::{javaCodeDir}/BasicCacheOperations.java[tags=read-repair, indent=0]
+----
+
+== Strategies
+In case consistency violations were found, the values across the topology will be replaced by repaired values according to the chosen strategy:
+
+[cols="1,4",opts="header"]
+|===
+| Strategy | Description
+| `LWW` a| Last write (the newest entry) wins.
+
+May cause IgniteException when the fix is impossible (unable to detect the newest entry)::
+* Nulls are found alongside non-null values for the same key. A null (missing entry) has no version, so it cannot be compared with a versioned entry.
+* Entries with the same version have different values.
+| `PRIMARY` | Value from the primary node wins.
+| `RELATIVE_MAJORITY` a| The relative majority wins: any value found more times than any other.
+Unlike an absolute majority, it also works for an even number of copies (which is typical of Ignite).
+
+May cause IgniteException when it is unable to detect values found more times than any other.
+
+For example, when 5 copies (4 backups) are given::
+* and value `A` found twice, but `X`, `Y` and `Z` only once, `A` wins,
+* but, when `A` is found twice, as well as `B`, and `X` only once, the strategy is unable to determine the winner.
+
+When 4 copies (3 backups) are given, any value found two or more times, when others are found only once, is the winner.
+| `REMOVE` | Inconsistent entries will be removed.
+| `CHECK_ONLY` | Only check will be performed.
+|===
+
+== Events
+A link:https://ignite.apache.org/releases/{version}/javadoc/org/apache/ignite/events/EventType.html#EVT_CONSISTENCY_VIOLATION[Consistency Violation Event] is recorded for each violation if the event is configured as recordable. You may listen to this event to get notified about inconsistency issues.
+
+Please refer to the link:events/listening-to-events[Working with Events] section for information on how to listen to events.
+
+== Transactional Caches
+Values will be repaired::
+* automatically for transactions that have `TransactionConcurrency.OPTIMISTIC` concurrency mode or `TransactionIsolation.READ_COMMITTED` isolation level,
+
+* at commit() phase for transactions that have `TransactionConcurrency.PESSIMISTIC` concurrency mode and isolation level other than `TransactionIsolation.READ_COMMITTED`
+
+[WARNING]
+====
+[discrete]
+=== Limitations
+This proxy usage does not guarantee "all copies check" if the value has already been cached inside the transaction.
+
+If you use an isolation level other than `READ_COMMITTED` and already have a cached value (for example, you have already read the value or performed a write), you will just get the cached value.
+====
+
+== Atomic Caches
+Values will be repaired automatically.
+
+[WARNING]
+====
+[discrete]
+=== Limitations
+Due to the nature of an atomic cache, false-positive results can be observed. For example, an attempt to check consistency during cache loading may lead to a consistency violation exception.
+
+By default, the implementation tries to check the given key three times. The number of attempts can be changed using `IgniteSystemProperties.IGNITE_NEAR_GET_MAX_REMAPS` property.
+====
+
diff --git a/docs/_docs/read-repair.adoc b/docs/_docs/read-repair.adoc
deleted file mode 100644
index 50c7595aed6..00000000000
--- a/docs/_docs/read-repair.adoc
+++ /dev/null
@@ -1,56 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-// http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-= Read Repair
-
-WARNING: Experimental API.
-
-
-"Read Repair" refers to a technique of repairing inconsistencies between primary and backup copies of data during normal read operations. When a specific key (or keys) is read by a user operation, Ignite checks the values for the given key in all backup copies.
-
-The Read Repair mode is designed to maintain consistency. However, read operations become {tilde}2 times more costly because backup copies are checked. It is generally not advisable to use this mode all the time, but rather on a once-in-a-while basis.
-
-To enable Read Repair mode, obtain an instance of the cache that enables Read Repair reads as follows:
-
-[source, java]
-----
-include::{javaCodeDir}/BasicCacheOperations.java[tags=read-repair, indent=0]
-----
-
-A consistency check is incompatible with the following cache configurations:
-
-* Caches without backups.
-* Local caches.
-* Near caches.
-* Caches that use "read-through" mode.
-
-== Transactional Caches
-
-All values across the topology are replaced with the latest version.
-
-*  Automatically for transactions that have `TransactionConcurrency.OPTIMISTIC` concurrency mode or `TransactionIsolation.READ_COMMITTED` isolation level
-*  at the commit() phase for transactions that have `TransactionConcurrency.PESSIMISTIC` concurrency mode and isolation level other than `TransactionIsolation.READ_COMMITTED`
-
-When a backup inconsistency is detected, Ignite will generate a link:https://ignite.apache.org/releases/{version}/javadoc/org/apache/ignite/events/EventType.html#EVT_CONSISTENCY_VIOLATION[consistency violation event] (if the event is enabled in the configuration). You can listen to this event to get notified about inconsistency issues. Refer to the link:events/listening-to-events[Working with Events] section for the information on how to listen to events.
-
-Read Repair does not guarantee "all copies check" in case value already cached inside the transaction.
-For example, in case you use !TransactionIsolation.READ_COMMITTED isolation mode and already read the value or performed a write, you'll gain the cached value.
-
-== Atomic Caches
-
-The consistency violation exception is thrown if differences are found.
-
-Due to the nature of the atomic cache, false-positive results can be observed. For example, an attempt to check consistency under load may lead to consistency violation exception. By default, the implementation tries to check the given key three times. The number of attempts can be changed by setting `IGNITE_NEAR_GET_MAX_REMAPS` system property.
-
-Be aware that the consistency violation event will not be fired for atomic caches.
diff --git a/docs/_docs/tools/control-script.adoc b/docs/_docs/tools/control-script.adoc
index bf3d787bccc..dae26c36c61 100644
--- a/docs/_docs/tools/control-script.adoc
+++ b/docs/_docs/tools/control-script.adoc
@@ -460,24 +460,28 @@ control.sh --cache reset_lost_partitions cacheName1,cacheName2,...
 ----
 
 
-== Consistency Check Commands
+== Consistency Check and Repair Commands
 
-`control.sh|bat` includes a set of consistency check commands that enable you to verify internal data consistency.
+`control.sh|bat` includes a set of consistency check commands that enable you to verify and repair internal data consistency.
 
 First, the commands can be used for debugging and troubleshooting purposes especially if you're in active development.
 
 Second, if there is a suspicion that a query (such as a SQL query, etc.) returns an incomplete or wrong result set, the commands can verify whether there is inconsistency in the data.
 
-Finally, the consistency check commands can be utilized as part of regular cluster health monitoring.
+Third, the consistency check commands can be utilized as a part of regular cluster health monitoring.
+
+Finally, consistency can be repaired if necessary.
 
 Let's review these usage scenarios in more detail.
 
 === Verifying Partition Checksums
 
-//Even if update counters and size are equal on the primary and backup nodes, there might be a case when the primary and backup  diverge due to some critical failure.
+Even if update counters and size are equal on the primary and backup nodes, the primary and backup might diverge due to some critical failure.
+
 The `idle_verify` command compares the hash of the primary partition with that of the backup partitions and reports any differences.
 The differences might be the result of node failure or incorrect shutdown during an update operation.
-If any inconsistency is detected, we recommend remove the incorrect partitions.
+
+If any inconsistency is detected, we recommend removing the incorrect partitions or repairing the consistency using the `--consistency repair` command.
 
 [source,shell]
 ----
@@ -507,6 +511,102 @@ Partition instances: [PartitionHashRecord [isPrimary=true, partHash=97595430, up
 All updates should be stopped when `idle_verify` calculates hashes, otherwise it may show false positive error results. It's impossible to compare big datasets in a distributed system if they are being constantly updated​.
 ====
 
+=== Repairing cache consistency
+[WARNING]
+====
+[discrete]
+===  Experimental feature
+The command may not work on some special/unique configurations or even cause a cluster/node failure.
+
+Command execution MUST be verified in a test environment, using data and configuration similar to production, before it is run in a real production environment.
+====
+
+[WARNING]
+====
+[discrete]
+===  Additional configuration required
+The command uses the special link:https://ignite.apache.org/releases/{version}/javadoc/org/apache/ignite/events/EventType.html#EVT_CONSISTENCY_VIOLATION[Consistency Violation Event] to detect consistency violations. This event must be enabled before the command is executed.
+
+Please refer to the link:events/listening-to-events#enabling-events[Enabling Events] section for details.
+====
+
+The `idle_verify` command reports the names of inconsistent cache groups and the list of their partitions.
+The `repair` command lets you check and repair (when possible) cache consistency using the link:key-value-api/read-repair[Read Repair] approach for every inconsistent partition found by `idle_verify`.
+
+The command uses special strategies to perform the repair. It is recommended to use the `CHECK_ONLY` strategy first to list inconsistent values and then choose the proper link:key-value-api/read-repair#strategies[Repair Strategy].
+
+By default, the inconsistent entries found are listed in the application log. You can change the location by configuring a dedicated logging path for the `org.apache.ignite.internal.visor.consistency` package.
+
+By default, the inconsistent entries are listed as is, but they can be masked using the link:logging#suppressing-sensitive-information[IGNITE_TO_STRING_INCLUDE_SENSITIVE] system property.
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --enable-experimental --consistency repair --cache cache-name --partition partition --strategy strategy
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --enable-experimental --consistency repair --cache cache-name --partition partition --strategy strategy
+----
+--
+Parameters:
+
+[cols="1,3",opts="header"]
+|===
+| Parameter | Description
+| `cache-name`| Cache (or cache group) name to be checked/repaired.
+| `partition`| Cache's partition to be checked/repaired.
+| `strategy`| See link:key-value-api/read-repair#strategies[Repair Strategies].
+|===
+
+Optional parameters:
+
+[cols="1,3",opts="header"]
+|===
+| Parameter | Description
+| `--parallel`| Allows performing check/repair in the fastest way, by parallel execution at all partition owners.
+|===
+
+=== Cache consistency check/repair operations status
+
+The command checks the status of `--consistency repair` commands.
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --enable-experimental --consistency status
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --enable-experimental --consistency status
+----
+--
+
+=== Partition update counters finalization
+
+The command finalizes partition update counters after a manual repair.
+Finalization closes gaps in the update counters of transactional cache partitions.
+
+[tabs]
+--
+tab:Unix[]
+[source,shell]
+----
+control.sh --enable-experimental --consistency finalize
+----
+tab:Window[]
+[source,shell]
+----
+control.bat --enable-experimental --consistency finalize
+----
+--
+
 === Validating SQL Index Consistency
 The `validate_indexes` command validates the indexes of given caches on all cluster nodes.
 
@@ -1027,79 +1127,6 @@ control.bat --property get --name 'statistics.usage.state'
 ----
 --
 
-== Cache Consistency
-
-=== Repair
-
-The command allows to perform cache consistency check and repair (when possible) using Read Repair approach.
-
-[tabs]
---
-tab:Unix[]
-[source,shell]
-----
-control.sh --enable-experimental --consistency repair --cache cache-name --partition partition --strategy strategy
-----
-tab:Window[]
-[source,shell]
-----
-control.bat --enable-experimental --consistency repair --cache cache-name --partition partition --strategy strategy
-----
---
-Parameters:
-
-[cols="1,3",opts="header"]
-|===
-| Parameter | Description
-| `cache-name`| Cache (or cache group) name to be checked/repaired.
-| `partition`| Cache's partition to be checked/repaired.
-| `strategy`| Repair strategy [LWW, PRIMARY, RELATIVE_MAJORITY, REMOVE, CHECK_ONLY].
-|===
-
-Optional parameters:
-
-[cols="1,3",opts="header"]
-|===
-| Parameter | Description
-| `--parallel`| Allows performing check/repair in the fastest way, by parallel execution at all partition owners.
-|===
-
-=== Status
-
-The command allows performing cache consistency check/repair operations status check.
-
-[tabs]
---
-tab:Unix[]
-[source,shell]
-----
-control.sh --enable-experimental --consistency status
-----
-tab:Window[]
-[source,shell]
-----
-control.bat --enable-experimental --consistency status
-----
---
-
-=== Partition update counters finalization
-
-The command allows fo finalize partition update counters after the manual repair.
-
-[tabs]
---
-tab:Unix[]
-[source,shell]
-----
-control.sh --enable-experimental --consistency finalize
-----
-tab:Window[]
-[source,shell]
-----
-control.bat --enable-experimental --consistency finalize
-----
---
-
 == Manage cache metrics collection
 
 The command provides an ability to enable, disable or show status of cache metrics collection.
diff --git a/modules/core/src/main/java/org/apache/ignite/IgniteCache.java b/modules/core/src/main/java/org/apache/ignite/IgniteCache.java
index fcde1dfd9d3..339a37add7f 100644
--- a/modules/core/src/main/java/org/apache/ignite/IgniteCache.java
+++ b/modules/core/src/main/java/org/apache/ignite/IgniteCache.java
@@ -160,22 +160,22 @@ public interface IgniteCache<K, V> extends javax.cache.Cache<K, V>, IgniteAsyncS
      *  <li>For transactional caches, values will be repaired:
      *  <ul>
      *      <li>automatically for transactions that have {@link TransactionConcurrency#OPTIMISTIC} concurrency mode
-     *          or {@link TransactionIsolation#READ_COMMITTED} isolation level</li>
+     *          or {@link TransactionIsolation#READ_COMMITTED} isolation level,</li>
      *      <li>at commit() phase for transactions that have {@link TransactionConcurrency#PESSIMISTIC} concurrency mode
      *          and isolation level other than {@link TransactionIsolation#READ_COMMITTED}</li>
      *  </ul>
      *  Warning:
      *  <p>
-     *  This proxy usage does not guarantee "all copies check" in case the value is already cached inside the transaction.
-     *  In case you use not a READ_COMMITTED isolation mode and already have a cached value, for example already read the
-     *  value or performed a write, you'll just gain the cached value.
+     *  This proxy usage does not guarantee "all copies check" in case the value has already been cached inside the transaction.
+     *  In case you don't use a READ_COMMITTED isolation mode and already have a cached value, for example have already
+     *  read the value or performed a write, you'll just get the cached value.
      *  </li>
      *  <li>For atomic caches, values will be repaired automatically.
      *  <p>
      *  Warning:
      *  <p>
-     *  Due to the nature of the atomic cache, false-positive results can be observed. For example, an attempt to check
-     *  consistency under cache loading may lead to a consistency violation exception. By default, the implementation tries
+     *  Due to the nature of an atomic cache, false-positive results can be observed. For example, an attempt to check
+     *  consistency during cache loading may lead to a consistency violation exception. By default, the implementation tries
      *  to check the given key three times. The number of attempts can be changed using
      *  {@link IgniteSystemProperties#IGNITE_NEAR_GET_MAX_REMAPS} property.
      *  </li>
@@ -183,7 +183,6 @@ public interface IgniteCache<K, V> extends javax.cache.Cache<K, V>, IgniteAsyncS
      * A consistency check is incompatible with the following cache configurations:
      * <ul>
      *     <li>Caches without backups.</li>
-     *     <li>Local caches.</li>
      *     <li>Near caches.</li>
      *     <li>Caches that use "read-through" mode.</li>
      * </ul>
diff --git a/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java b/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java
index 4d80037a7bc..293f7fb5aab 100644
--- a/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java
+++ b/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java
@@ -47,12 +47,12 @@ public enum ReadRepairStrategy {
      * <p>
      * Works for an even number of copies (which is typical of Ignite) instead of an absolute majority.
      * <p>
-     * May cause {@link IgniteException} when unable to detect value found more times than any other.
+     * May cause {@link IgniteException} when it is unable to detect values found more times than any other.
      * <p>
-     * For example, when we have 5 copies (4 backups) and value `A` found twice, but `X`,`Y` and `Z` only once, `A` wins.
-     * But, when `A` found twice, as well as `B`, and `X` only once, the strategy unable to determine the winner.
+     * For example, when 5 copies (4 backups) are given and value `A` is found twice, but `X`, `Y`, and `Z` only once, `A` wins.
+     * But, when `A` is found twice, as well as `B`, and `X` only once, the strategy is unable to determine the winner.
      * <p>
-     * When we have 4 copies (3 backups), any value found two or more times, when others are found only once, is the winner.
+     * When 4 copies (3 backups) are given, any value found two or more times, when others are found only once, is the winner.
      */
     RELATIVE_MAJORITY("RELATIVE_MAJORITY"),