Posted to issues@ignite.apache.org by "Aleksey Plekhanov (Jira)" <ji...@apache.org> on 2020/07/20 09:17:00 UTC
[jira] [Updated] (IGNITE-12950) Partitions validator must check sizes even if update counters are different
[ https://issues.apache.org/jira/browse/IGNITE-12950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Plekhanov updated IGNITE-12950:
---------------------------------------
Fix Version/s: 2.10 (was: 2.9)
> Partitions validator must check sizes even if update counters are different
> ---------------------------------------------------------------------------
>
> Key: IGNITE-12950
> URL: https://issues.apache.org/jira/browse/IGNITE-12950
> Project: Ignite
> Issue Type: Improvement
> Components: cache
> Reporter: Ivan Mironovich
> Assignee: Ivan Mironovich
> Priority: Major
> Fix For: 2.10
>
> Original Estimate: 336h
> Time Spent: 10m
> Remaining Estimate: 335h 50m
>
> We have the following method in GridDhtPartitionsStateValidator:
> {code:java}
> public void validatePartitionCountersAndSizes(
>     GridDhtPartitionsExchangeFuture fut,
>     GridDhtPartitionTopology top,
>     Map<UUID, GridDhtPartitionsSingleMessage> messages
> ) throws IgniteCheckedException {
>     final Set<UUID> ignoringNodes = new HashSet<>();
>     // Ignore just joined nodes.
>     for (DiscoveryEvent evt : fut.events().events()) {
>         if (evt.type() == EVT_NODE_JOINED)
>             ignoringNodes.add(evt.eventNode().id());
>     }
>     AffinityTopologyVersion topVer = fut.context().events().topologyVersion();
>     // Validate update counters.
>     Map<Integer, Map<UUID, Long>> result = validatePartitionsUpdateCounters(top, messages, ignoringNodes);
>     if (!result.isEmpty())
>         throw new IgniteCheckedException("Partitions update counters are inconsistent for " + fold(topVer, result));
>     // For sizes validation ignore also nodes which are not able to send cache sizes.
>     for (UUID id : messages.keySet()) {
>         ClusterNode node = cctx.discovery().node(id);
>         if (node != null && node.version().compareTo(SIZES_VALIDATION_AVAILABLE_SINCE) < 0)
>             ignoringNodes.add(id);
>     }
>     if (!cctx.cache().cacheGroup(top.groupId()).mvccEnabled()) { // TODO: Remove "if" clause in IGNITE-9451.
>         // Validate cache sizes.
>         result = validatePartitionsSizes(top, messages, ignoringNodes);
>         if (!result.isEmpty())
>             throw new IgniteCheckedException("Partitions cache sizes are inconsistent for " + fold(topVer, result));
>     }
> }
> {code}
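> We should check partition sizes even if update counters are different. Currently the method throws as soon as the counter check fails, so the size check never runs in that case. Having both results would help when debugging problems in production.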
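One possible shape for this, as a minimal standalone sketch rather than the actual Ignite change: compute both validation results first, then build a single error from whichever of them is non-empty. The class name `ValidateBothSketch` and the method `buildError` are hypothetical, node ids are plain strings instead of `UUID`, and the maps are printed with their default `toString()` instead of the validator's `fold(...)` helper.

```java
import java.util.Collections;
import java.util.Map;

class ValidateBothSketch {
    /**
     * Builds one error message from both validation results.
     * Each argument maps partition id -> (node id -> reported value) for inconsistent partitions only.
     * Returns null when both results are empty, i.e. nothing is inconsistent.
     */
    static String buildError(
        Map<Integer, Map<String, Long>> cntrResult,
        Map<Integer, Map<String, Long>> sizeResult
    ) {
        StringBuilder sb = new StringBuilder();

        if (!cntrResult.isEmpty())
            sb.append("Partitions update counters are inconsistent for ").append(cntrResult);

        // The size result is reported even when the counter result is already non-empty.
        if (!sizeResult.isEmpty()) {
            if (sb.length() > 0)
                sb.append(' ');

            sb.append("Partitions cache sizes are inconsistent for ").append(sizeResult);
        }

        return sb.length() == 0 ? null : sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Long>> cntrs = Collections.singletonMap(7, Collections.singletonMap("nodeA", 10L));
        Map<Integer, Map<String, Long>> sizes = Collections.singletonMap(7, Collections.singletonMap("nodeA", 3L));

        System.out.println(buildError(cntrs, sizes));
    }
}
```

In the real method the caller would throw a single IgniteCheckedException with the combined message when it is not null, instead of throwing after the first failed check.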
> We must print information about all copies if a partition is in an inconsistent state. Currently, for a cache group with 3 backups, we can get a message like this:
> {code:java}
> Partition states validation has failed for group: CACHEGROUP. Partitions update counters are inconsistent for Part 3415: [10.104.6.10:47500=2577263 10.104.6.12:47500=2577263 10.104.6.23:47500=2577262 10.104.6.9:47500=2577263 ] Part 4960: [10.104.6.11:47500=2560994 10.104.6.23:47500=2560993 ]
> {code}
> (Part 4960 lists only 2 copies, although a group with 3 backups has 4 copies of each partition.)
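To illustrate the "all copies" requirement, here is a hedged standalone sketch, not the validator's real code: it first gathers every reported copy of every partition, and only then decides which partitions are inconsistent, so the report for a bad partition always contains all of its copies. The names `AllCopiesReportSketch`, `findInconsistent` and `format` are hypothetical, and nodes are identified by address strings as in the log message above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;

class AllCopiesReportSketch {
    /**
     * @param reported Node address -> (partition id -> update counter), one entry per node.
     * @return Partition id -> (node address -> counter) for partitions whose copies disagree,
     *      with every reported copy included.
     */
    static Map<Integer, Map<String, Long>> findInconsistent(Map<String, Map<Integer, Long>> reported) {
        // Pass 1: gather every copy of every partition before judging consistency.
        Map<Integer, Map<String, Long>> byPart = new TreeMap<>();

        for (Map.Entry<String, Map<Integer, Long>> node : reported.entrySet()) {
            for (Map.Entry<Integer, Long> part : node.getValue().entrySet())
                byPart.computeIfAbsent(part.getKey(), k -> new TreeMap<>()).put(node.getKey(), part.getValue());
        }

        // Pass 2: drop partitions where all copies report the same counter.
        byPart.values().removeIf(copies -> new HashSet<>(copies.values()).size() <= 1);

        return byPart;
    }

    /** Formats the result in the "Part N: [node=counter ... ]" style used in the message above. */
    static String format(Map<Integer, Map<String, Long>> inconsistent) {
        StringBuilder sb = new StringBuilder();

        for (Map.Entry<Integer, Map<String, Long>> part : inconsistent.entrySet()) {
            sb.append("Part ").append(part.getKey()).append(": [");

            for (Map.Entry<String, Long> copy : part.getValue().entrySet())
                sb.append(copy.getKey()).append('=').append(copy.getValue()).append(' ');

            sb.append("] ");
        }

        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Long>> reported = new HashMap<>();

        for (String addr : new String[] {"10.0.0.1:47500", "10.0.0.2:47500", "10.0.0.3:47500", "10.0.0.4:47500"}) {
            Map<Integer, Long> parts = new HashMap<>();
            // Made-up counters: only the second node lags behind by one update.
            parts.put(4960, addr.startsWith("10.0.0.2") ? 2560993L : 2560994L);
            reported.put(addr, parts);
        }

        System.out.println(format(findInconsistent(reported)));
    }
}
```

The addresses and counters in `main` are made-up sample values. The point of the two passes is that the full set of copies is known before any partition is classified, so no matching copy is left out of the report.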
--
This message was sent by Atlassian Jira
(v8.3.4#803005)