Posted to issues@ignite.apache.org by "Ivan (Jira)" <ji...@apache.org> on 2020/04/26 08:22:00 UTC
[jira] [Created] (IGNITE-12950) Partitions validator must check
sizes even if update counters are different
Ivan created IGNITE-12950:
-----------------------------
Summary: Partitions validator must check sizes even if update counters are different
Key: IGNITE-12950
URL: https://issues.apache.org/jira/browse/IGNITE-12950
Project: Ignite
Issue Type: Improvement
Components: cache
Reporter: Ivan
Fix For: 2.9
We have the following method in GridDhtPartitionsStateValidator:
{code:java}
public void validatePartitionCountersAndSizes(
    GridDhtPartitionsExchangeFuture fut,
    GridDhtPartitionTopology top,
    Map<UUID, GridDhtPartitionsSingleMessage> messages
) throws IgniteCheckedException {
    final Set<UUID> ignoringNodes = new HashSet<>();

    // Ignore just joined nodes.
    for (DiscoveryEvent evt : fut.events().events()) {
        if (evt.type() == EVT_NODE_JOINED)
            ignoringNodes.add(evt.eventNode().id());
    }

    AffinityTopologyVersion topVer = fut.context().events().topologyVersion();

    // Validate update counters.
    Map<Integer, Map<UUID, Long>> result = validatePartitionsUpdateCounters(top, messages, ignoringNodes);

    if (!result.isEmpty())
        throw new IgniteCheckedException("Partitions update counters are inconsistent for " + fold(topVer, result));

    // For sizes validation ignore also nodes which are not able to send cache sizes.
    for (UUID id : messages.keySet()) {
        ClusterNode node = cctx.discovery().node(id);

        if (node != null && node.version().compareTo(SIZES_VALIDATION_AVAILABLE_SINCE) < 0)
            ignoringNodes.add(id);
    }

    if (!cctx.cache().cacheGroup(top.groupId()).mvccEnabled()) { // TODO: Remove "if" clause in IGNITE-9451.
        // Validate cache sizes.
        result = validatePartitionsSizes(top, messages, ignoringNodes);

        if (!result.isEmpty())
            throw new IgniteCheckedException("Partitions cache sizes are inconsistent for " + fold(topVer, result));
    }
}
{code}
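We should check partition sizes even if the update counters are different. Currently the method throws as soon as the counters check fails, so the sizes check is never reached. Reporting both would help when debugging problems in production.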
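A minimal sketch of the proposed behaviour, under simplifying assumptions: the class PartitionValidationSketch, its validate and fold methods, and the use of String node ids are hypothetical stand-ins for illustration, not the Ignite implementation. The two maps play the role of the results of validatePartitionsUpdateCounters and validatePartitionsSizes; the point is only that both results are inspected before a single error message is produced.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for the validator; not the Ignite code. */
final class PartitionValidationSketch {
    /**
     * @param counterMismatches Partition id -> (node id -> update counter) for inconsistent partitions.
     * @param sizeMismatches Partition id -> (node id -> partition size) for inconsistent partitions.
     * @return Combined error message, or {@code null} if both maps are empty.
     */
    static String validate(
        Map<Integer, Map<String, Long>> counterMismatches,
        Map<Integer, Map<String, Long>> sizeMismatches
    ) {
        StringBuilder err = new StringBuilder();

        if (!counterMismatches.isEmpty())
            err.append("Partitions update counters are inconsistent for ").append(fold(counterMismatches));

        // Sizes are reported regardless of the outcome of the counters check.
        if (!sizeMismatches.isEmpty()) {
            if (err.length() > 0)
                err.append(' ');

            err.append("Partitions cache sizes are inconsistent for ").append(fold(sizeMismatches));
        }

        return err.length() == 0 ? null : err.toString();
    }

    /** Formats mismatches in a form similar to the existing log message (assumed format). */
    static String fold(Map<Integer, Map<String, Long>> mismatches) {
        StringBuilder sb = new StringBuilder();

        // TreeMap copies give a deterministic order of partitions and nodes.
        for (Map.Entry<Integer, Map<String, Long>> part : new TreeMap<>(mismatches).entrySet()) {
            sb.append("Part ").append(part.getKey()).append(": [");

            for (Map.Entry<String, Long> copy : new TreeMap<>(part.getValue()).entrySet())
                sb.append(copy.getKey()).append('=').append(copy.getValue()).append(' ');

            sb.append("] ");
        }

        return sb.toString().trim();
    }
}
```

In the real method the combined message would presumably be wrapped in a single IgniteCheckedException thrown after both checks have run.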
We must also print information about all copies of a partition if it is in an inconsistent state. Currently, for a cache group with 3 backups, we can get a message like this:
{code:java}
Partition states validation has failed for group: CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey. Partitions update counters are inconsistent for Part 3415: [10.104.6.10:47500=2577263 10.104.6.12:47500=2577263 10.104.6.23:47500=2577262 10.104.6.9:47500=2577263 ] Part 4960: [10.104.6.11:47500=2560994 10.104.6.23:47500=2560993 ]
{code}
(Part 4960 contains information about only 2 copies.)
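One way to get the desired report, shown only as a sketch: regroup the per-node counters by partition first, and filter out consistent partitions afterwards, so that every reporting copy of an inconsistent partition stays in the output. The class AllCopiesReportSketch, the method inconsistentPartitions and the String node ids are hypothetical; this is not how GridDhtPartitionsStateValidator currently collects its result.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of reporting all copies of an inconsistent partition. */
final class AllCopiesReportSketch {
    /**
     * @param countersByNode Node id -> (partition id -> update counter), as reported by each node.
     * @return Partition id -> (node id -> update counter) for every partition whose copies disagree.
     *      Each entry lists all reporting copies, not only the ones that differ.
     */
    static Map<Integer, Map<String, Long>> inconsistentPartitions(Map<String, Map<Integer, Long>> countersByNode) {
        Map<Integer, Map<String, Long>> byPart = new TreeMap<>();

        // Regroup by partition so that no copy is lost.
        for (Map.Entry<String, Map<Integer, Long>> node : countersByNode.entrySet()) {
            for (Map.Entry<Integer, Long> part : node.getValue().entrySet())
                byPart.computeIfAbsent(part.getKey(), p -> new TreeMap<>()).put(node.getKey(), part.getValue());
        }

        // Keep only partitions where at least two copies report different counters.
        byPart.values().removeIf(copies -> new HashSet<>(copies.values()).size() <= 1);

        return byPart;
    }
}
```

With this shape, a partition with four reporting copies of which one lags behind would be printed with all four entries.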
--
This message was sent by Atlassian Jira
(v8.3.4#803005)