Posted to notifications@ignite.apache.org by GitBox <gi...@apache.org> on 2021/01/12 09:00:01 UTC

[GitHub] [ignite] xtern opened a new pull request #8648: IGNITE-13805

xtern opened a new pull request #8648:
URL: https://github.com/apache/ignite/pull/8648


   Thank you for submitting the pull request to the Apache Ignite.
   
   In order to streamline the review of the contribution 
   we ask you to ensure the following steps have been taken:
   
   ### The Contribution Checklist
   - [ ] There is a single JIRA ticket related to the pull request. 
   - [ ] The web-link to the pull request is attached to the JIRA ticket.
   - [ ] The JIRA ticket has the _Patch Available_ state.
   - [ ] The pull request body describes changes that have been made. 
   The description explains _WHAT_ and _WHY_ was made instead of _HOW_.
   - [ ] The pull request title is treated as the final commit message. 
   The following pattern must be used: `IGNITE-XXXX Change summary` where `XXXX` - number of JIRA issue.
   - [ ] A reviewer has been mentioned through the JIRA comments 
   (see [the Maintainers list](https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute#HowtoContribute-ReviewProcessandMaintainers)) 
   - [ ] The pull request has been checked by the Teamcity Bot and 
   the `green visa` attached to the JIRA ticket (see [TC.Bot: Check PR](https://mtcga.gridgain.com/prs.html))
   
   ### Notes
   - [How to Contribute](https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute)
   - [Coding abbreviation rules](https://cwiki.apache.org/confluence/display/IGNITE/Abbreviation+Rules)
   - [Coding Guidelines](https://cwiki.apache.org/confluence/display/IGNITE/Coding+Guidelines)
   - [Apache Ignite Teamcity Bot](https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+Teamcity+Bot)
   
   If you need any help, please email dev@ignite.apache.org or ask any advice on http://asf.slack.com _#ignite_ channel.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595825355



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;

Review comment:
       Added the assertion `opCtx != null` for server nodes.
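
       For readers following the thread, here is a minimal, self-contained sketch of what such a check might look like. It is not the code from the PR: `OpCtxAssertionSketch` and `RestoreContext` are hypothetical stand-ins for `SnapshotRestoreProcess` and `SnapshotRestoreContext`, and the `clientNode` flag stands in for `ctx.clientNode()`. It assumes, as in the quoted `prepare` method, that only server nodes create a restore context.

```java
import java.util.UUID;

/** Hypothetical, simplified stand-in for the restore process; not the Ignite class. */
public class OpCtxAssertionSketch {
    /** Hypothetical stand-in for SnapshotRestoreContext. */
    static final class RestoreContext {
        final UUID reqId;

        RestoreContext(UUID reqId) {
            this.reqId = reqId;
        }
    }

    /** Stand-in for ctx.clientNode(). */
    private final boolean clientNode;

    /** Restore context; stays null on client nodes. */
    private volatile RestoreContext opCtx;

    OpCtxAssertionSketch(boolean clientNode) {
        this.clientNode = clientNode;
    }

    /** Mirrors the prepare phase: only server nodes create a context. */
    void prepare(UUID reqId) {
        if (!clientNode)
            opCtx = new RestoreContext(reqId);
    }

    /** Returns true if this node takes part in the cache start phase for the request. */
    boolean cacheStart(UUID reqId) {
        RestoreContext opCtx0 = opCtx;

        if (clientNode)
            return false;

        // Evaluated only when the JVM runs with assertions enabled (-ea).
        assert opCtx0 != null : "Restore context must exist on a server node";

        return reqId.equals(opCtx0.reqId);
    }

    public static void main(String[] args) {
        UUID reqId = UUID.randomUUID();

        OpCtxAssertionSketch srv = new OpCtxAssertionSketch(false);
        srv.prepare(reqId);
        System.out.println("server participates: " + srv.cacheStart(reqId));

        OpCtxAssertionSketch client = new OpCtxAssertionSketch(true);
        client.prepare(reqId);
        System.out.println("client participates: " + client.cacheStart(reqId));
    }
}
```

       Placing the client-node check before the assertion keeps client nodes, which never create a context, from tripping it. Because Java assertions are disabled by default, the check only takes effect when the JVM is started with `-ea`.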







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600304066



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups have no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Exception> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (IgniteCheckedException e) {
+                    errHnd.accept(e);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    try {
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();
+
+        if (opCtx0 == null) {
+            finishProcess(failure);
+
+            return;
+        }
+
+        if (failure == null)

Review comment:
       We cannot rely on single-node (local) errors during the "finish" phase of the distributed process. For example, one node may observe a node failure while another does not, which would lead to different behavior on different nodes at the completion stage. The decision is therefore to track only those errors that are visible to all nodes.
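
       To illustrate the point, here is a minimal, self-contained sketch (not the code from this PR). It assumes the finish handler receives an error map with the same content on every node, as the `errs` argument of `finishPrepare` does. The class `FinishPhaseSketch`, the `Decision` enum and the `decide` method are hypothetical names used only for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch, not Ignite code: the finish-phase decision depends only on cluster-wide errors. */
public class FinishPhaseSketch {
    /** Possible outcomes of the finish phase (illustrative names). */
    public enum Decision { PROCEED, ROLLBACK }

    /**
     * Decides how to finish using only the error map that every node receives
     * with the same content. A locally observed error is deliberately ignored,
     * because other nodes may not have seen it.
     *
     * @param clusterWideErrs Errors reported through the distributed process, keyed by node ID.
     * @param locErr Error observed only on the local node (ignored on purpose).
     * @return The decision, which is identical on all nodes given the same map.
     */
    public static Decision decide(Map<UUID, Exception> clusterWideErrs, Exception locErr) {
        return clusterWideErrs.isEmpty() ? Decision.PROCEED : Decision.ROLLBACK;
    }

    public static void main(String[] args) {
        // Only the local node saw a failure: the shared map is empty, so every node proceeds.
        Decision localOnly = decide(Collections.emptyMap(), new IllegalStateException("seen locally only"));

        // A failure reported through the distributed process: every node rolls back.
        Map<UUID, Exception> errs = new HashMap<>();
        errs.put(UUID.randomUUID(), new IllegalStateException("node left"));
        Decision shared = decide(errs, null);

        System.out.println(localOnly + " " + shared);
    }
}
```

       Because the decision is a pure function of the shared error map, two nodes that observed different local events still take the same branch at the completion stage.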




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600305230



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {

Review comment:
       Done







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600306068



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups have no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Exception> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (IgniteCheckedException e) {
+                    errHnd.accept(e);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    try {
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();
+
+        if (opCtx0 == null) {
+            finishProcess(failure);
+
+            return;
+        }
+
+        if (failure == null)
+            failure = checNodeLeft(opCtx0.nodes, res.keySet());
+
+        // Context has been created - should rollback changes cluster-wide.
+        if (failure != null) {
+            opCtx0.err.compareAndSet(null, failure);
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                rollbackRestoreProc.start(reqId, reqId);
+
+            return;
+        }
+
+        Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+        for (List<StoredCacheData> storedCfgs : res.values()) {
+            if (storedCfgs == null)
+                continue;
+
+            for (StoredCacheData cacheData : storedCfgs)
+                globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+        }
+
+        opCtx0.cfgs = globalCfgs;
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            cacheStartProc.start(reqId, reqId);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null) {
+            return new GridFinishedFuture<>(new IgniteIllegalStateException("Context has not been created on server " +
+                "node during prepare operation [reqId=" + reqId + ", nodeId=" + ctx.localNodeId() + ']'));
+        }
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting restored caches " +
+                "[reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName +
+                ", caches=" + F.viewReadOnly(ccfgs, c -> c.config().getName()) + ']');
+        }
+
+        return ctx.cache().dynamicStartCachesByStoredConf(ccfgs, true, true, false, null, true, opCtx0.nodes);

Review comment:
       Comment added:
   ```
   // We set the topology node IDs required to successfully start the cache. If any of the required nodes
   // leaves the cluster during cache startup, the whole procedure will be rolled back.
   ```
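
For readers following the thread, the behavior described in that comment can be sketched roughly as follows. This is a minimal, self-contained illustration and not Ignite code: the class `RequiredNodesTracker` and its methods are hypothetical names introduced here. Only the first-error-wins `compareAndSet(null, ...)` pattern is taken from `onNodeLeft` in the diff above; how `dynamicStartCachesByStoredConf` uses the node IDs internally is not shown in this hunk and is not modeled.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only; this class is not part of Apache Ignite. */
public class RequiredNodesTracker {
    /** IDs of the nodes that must stay in the topology for the operation to succeed. */
    private final Set<UUID> requiredNodes;

    /** First error recorded for the operation; later errors do not overwrite it. */
    private final AtomicReference<Exception> err = new AtomicReference<>();

    public RequiredNodesTracker(Set<UUID> requiredNodes) {
        // Defensive copy so later changes to the caller's set do not affect the check.
        this.requiredNodes = new HashSet<>(requiredNodes);
    }

    /** Called when a node leaves; records an error only if the node was required. */
    public void onNodeLeft(UUID leftNodeId) {
        if (requiredNodes.contains(leftNodeId)) {
            err.compareAndSet(null, new IllegalStateException(
                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
        }
    }

    /** @return {@code true} if a required node has left and the operation should be rolled back. */
    public boolean shouldRollback() {
        return err.get() != null;
    }

    /** @return The first recorded error, or {@code null} if there is none. */
    public Exception error() {
        return err.get();
    }
}
```

In this sketch a node that is not in the required set can leave without any effect, while the first required node to leave fixes the error that a later rollback step would report.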




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600304927



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups have no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Exception> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (IgniteCheckedException e) {
+                    errHnd.accept(e);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    try {
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();
+
+        if (opCtx0 == null) {
+            finishProcess(failure);
+
+            return;
+        }
+
+        if (failure == null)
+            failure = checNodeLeft(opCtx0.nodes, res.keySet());

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614265311



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param expCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {
+        checkTopologyChange(false);
+    }
+
+    /**
+     * @param stopNode {@code True} to check node fail, {@code False} to check node join.
+     * @throws Exception if failed.
+     */
+    private void checkTopologyChange(boolean stopNode) throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(4, keysCnt);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(3));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, dfltCacheCfg.getName());
+
+        if (stopNode) {
+            IgniteInternalFuture<?> fut0 = runAsync(() -> stopGrid(3, true));
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> fut.get(TIMEOUT),
+                ClusterTopologyCheckedException.class,
+                "Required node has left the cluster"
+            );
+
+            ensureCacheDirEmpty(3, dfltCacheCfg);
+
+            fut0.get(TIMEOUT);
+
+            awaitPartitionMapExchange();
+
+            dfltCacheCfg = null;
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> startGrid(3),
+                IgniteSpiException.class,
+                "to add the node to cluster - remove directories with the caches"
+            );
+
+            return;
+        }
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> startGrid(4),
+            IgniteSpiException.class,
+            "Joining node during caches restore is not allowed"
+        );
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterStateChangeActiveReadonlyOnPrepare() throws Exception {
+        checkClusterStateChange(ClusterState.ACTIVE_READ_ONLY, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
+            IgniteException.class, "Failed to perform start cache operation (cluster is in read-only mode)");
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterStateChangeActiveReadonlyOnCacheStart() throws Exception {
+        checkClusterStateChange(ClusterState.ACTIVE_READ_ONLY, RESTORE_CACHE_GROUP_SNAPSHOT_START, null, null);
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterDeactivateOnPrepare() throws Exception {
+        checkClusterStateChange(ClusterState.INACTIVE, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
+            IgniteException.class, "The cluster has been deactivated.");
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterDeactivateOnCacheStart() throws Exception {
+        checkClusterStateChange(ClusterState.INACTIVE, RESTORE_CACHE_GROUP_SNAPSHOT_START, null, null);
+    }
+
+    /**
+     * @param state Cluster state.
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param exCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @throws Exception if failed.
+     */
+    private void checkClusterStateChange(

Review comment:
       It is not very clear why parameterization is better than the current approach. 
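
For comparison, a minimal stdlib-only sketch of what a table-driven layout of these four tests might look like. The class `StateChangeCases`, the `Case` type and the `label` helper are hypothetical names used only for illustration. The states, phases and messages are copied from the test methods in the diff above. A `null` message is assumed to mean that the restore is expected to complete without an error.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, stdlib-only model of a table-driven layout for the four state-change tests. */
public class StateChangeCases {
    /** One table row: target cluster state, blocked phase, expected error text (null = no error expected). */
    static final class Case {
        final String state;
        final String phase;
        final String expMsg;

        Case(String state, String phase, String expMsg) {
            this.state = state;
            this.phase = phase;
            this.expMsg = expMsg;
        }
    }

    /** The same four combinations that the named test methods pass to checkClusterStateChange. */
    static List<Case> cases() {
        List<Case> res = new ArrayList<>();

        res.add(new Case("ACTIVE_READ_ONLY", "PREPARE",
            "Failed to perform start cache operation (cluster is in read-only mode)"));
        res.add(new Case("ACTIVE_READ_ONLY", "START", null));
        res.add(new Case("INACTIVE", "PREPARE", "The cluster has been deactivated."));
        res.add(new Case("INACTIVE", "START", null));

        return res;
    }

    /** Label a runner could report for a row; it takes over the role of the test method name. */
    static String label(Case c) {
        return c.state + " on " + c.phase + (c.expMsg == null ? ": no error" : ": error expected");
    }

    public static void main(String[] args) {
        for (Case c : cases())
            System.out.println(label(c));
    }
}
```

The trade-off is visible in `label`: with a table, the descriptive method names (for example `testClusterDeactivateOnPrepare`) are replaced by generated labels, while the four-argument helper call stays the same in both layouts.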




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614262376



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {

Review comment:
       Added test `testClusterSnapshotRestoreOnSmallerTopology`







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r607800953



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
##########
@@ -869,6 +895,37 @@ else if (f1.error() instanceof IgniteSnapshotVerifyException)
         return res;
     }
 
+    /**
+     * @param name Snapshot name.
+     * @return Future with snapshot metadata obtained from nodes.
+     */
+    IgniteInternalFuture<Map<ClusterNode, List<SnapshotMetadata>>> collectSnapshotMetadata(String name) {
+        GridKernalContext kctx0 = cctx.kernalContext();
+
+        kctx0.security().authorize(ADMIN_SNAPSHOT);
+
+        Collection<ClusterNode> bltNodes = F.view(cctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            (node) -> CU.baselineNode(node, kctx0.state().clusterState()));
+
+        kctx0.task().setThreadContext(TC_SKIP_AUTH, true);
+        kctx0.task().setThreadContext(TC_SUBGRID, bltNodes);
+
+        return kctx0.task().execute(SnapshotMetadataCollectorTask.class, name);
+    }
+
+    /**
+     * @param metas Nodes snapshot metadata.
+     * @return Future with the verification results.
+     */
+    IgniteInternalFuture<IdleVerifyResultV2> runSnapshotVerfification(Map<ClusterNode, List<SnapshotMetadata>> metas) {

Review comment:
       `runSnapshotVerfification` > `runSnapshotVerification`







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622443964



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java
##########
@@ -5126,6 +5126,12 @@ public void onNodeLeft(final ClusterNode node) {
 
                             if (crd0 == null)
                                 finishState = new FinishState(null, initialVersion(), null);
+
+                            if (dynamicCacheStartExchange() &&
+                                exchActions.cacheStartRequiredAliveNodes().contains(node.id())) {
+                                exchangeGlobalExceptions.put(cctx.localNodeId(), new ClusterTopologyCheckedException(

Review comment:
       I think we should also set the `exchangeLocE` param.
   Let's add a dedicated test, unrelated to the snapshot procedure, for starting caches with required alive nodes.
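
A rough stdlib-only model of the suggestion, not the Ignite implementation. `RequiredNodesModel`, `globalErrs` and `locErr` are hypothetical stand-ins for the exchange future, `exchangeGlobalExceptions` and `exchangeLocE`. The exception type and message are illustrative, and the assumption that the local error keeps the first failure is mine.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of an exchange that fails when a node required for cache start leaves. */
public class RequiredNodesModel {
    /** ID of the local node that reports the failure. */
    private final UUID locNodeId;

    /** Nodes that must stay alive until the caches are started. */
    private final Set<UUID> requiredNodes;

    /** Stand-in for exchangeGlobalExceptions: errors keyed by the reporting node. */
    final Map<UUID, Exception> globalErrs = new ConcurrentHashMap<>();

    /** Stand-in for exchangeLocE: the first error recorded on the local node. */
    final AtomicReference<Exception> locErr = new AtomicReference<>();

    RequiredNodesModel(UUID locNodeId, Set<UUID> requiredNodes) {
        this.locNodeId = locNodeId;
        this.requiredNodes = Collections.unmodifiableSet(new HashSet<>(requiredNodes));
    }

    /** Records the failure in both places, but only if the left node was required. */
    void onNodeLeft(UUID leftNodeId) {
        if (!requiredNodes.contains(leftNodeId))
            return;

        Exception err = new IllegalStateException("Required node left [nodeId=" + leftNodeId + ']');

        globalErrs.put(locNodeId, err);
        locErr.compareAndSet(null, err);
    }

    public static void main(String[] args) {
        UUID loc = UUID.randomUUID();
        UUID required = UUID.randomUUID();

        RequiredNodesModel exch = new RequiredNodesModel(loc, Collections.singleton(required));

        // A node outside the required set leaves: nothing is recorded.
        exch.onNodeLeft(UUID.randomUUID());
        System.out.println("errors after unrelated node left: " + exch.globalErrs.size());

        // A required node leaves: both the global map and the local error are set.
        exch.onNodeLeft(required);
        System.out.println("local error set: " + (exch.locErr.get() != null));
    }
}
```

A dedicated test could follow the two steps in `main`: first check that an unrelated node leaving does not fail the cache start, then check that a required node leaving does.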







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r607144542



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,832 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestoreRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** Future to be completed when the cache restore process is complete (this future will be returned to the user). */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /** Stopped flag. */
+    private volatile boolean stopped;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        try {
+            if (ctx.clientNode())
+                throw new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation.");
+
+            DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+            if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+                throw new IgniteException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster.");
+
+            if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+                throw new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            synchronized (this) {
+                if (isRestoring() && fut == null)
+                    throw new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed.");
+
+                fut = new GridFutureAdapter<>();
+            }
+        } catch (IgniteException e) {
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+
+        ctx.cache().context().snapshotMgr().collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    finishProcess(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    finishProcess(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                ctx.cache().context().snapshotMgr().runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            finishProcess(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestoreRequest req = new SnapshotRestoreRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return isRestoring(null, null);
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(@Nullable String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        if (cacheName == null)
+            return true;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        String details = opCtx0 == null ? "" : " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']';
+
+        if (err != null)
+            log.error("Failed to restore snapshot cache group" + details, err);
+        else if (log.isInfoEnabled())
+            log.info("Successfully restored cache group(s) from the snapshot" + details);
+
+        opCtx = null;
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (fut0 != null) {
+                fut = null;
+
+                ctx.getSystemExecutorService().submit(() -> fut0.onDone(null, err));
+            }
+        }
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void stop() {
+        interrupt(new NodeStoppingException("Node is stopping."), true);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void deactivate() {
+        interrupt(new IgniteCheckedException("The cluster has been deactivated."), false);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     * @param stop Stop flag.
+     */
+    private void interrupt(Exception reason, boolean stop) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return;
+
+        opCtx0.err.compareAndSet(null, reason);
+
+        IgniteFuture<?> stopFut;
+
+        synchronized (this) {
+            stopFut = opCtx0.stopFut;
+
+            if (stop)
+                stopped = true;

Review comment:
       Let's rename `stopped` > `canceled`

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,832 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestoreRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** Future to be completed when the cache restore process is complete (this future will be returned to the user). */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /** Stopped flag. */
+    private volatile boolean stopped;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        try {
+            if (ctx.clientNode())
+                throw new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation.");
+
+            DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+            if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+                throw new IgniteException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster.");
+
+            if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+                throw new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            synchronized (this) {
+                if (isRestoring() && fut == null)
+                    throw new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed.");
+
+                fut = new GridFutureAdapter<>();
+            }
+        } catch (IgniteException e) {
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+
+        ctx.cache().context().snapshotMgr().collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    finishProcess(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    finishProcess(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) were not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                ctx.cache().context().snapshotMgr().runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            finishProcess(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestoreRequest req = new SnapshotRestoreRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return isRestoring(null, null);
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(@Nullable String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        if (cacheName == null)
+            return true;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        String details = opCtx0 == null ? "" : " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']';
+
+        if (err != null)
+            log.error("Failed to restore snapshot cache group" + details, err);
+        else if (log.isInfoEnabled())
+            log.info("Successfully restored cache group(s) from the snapshot" + details);
+
+        opCtx = null;
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (fut0 != null) {
+                fut = null;
+
+                ctx.getSystemExecutorService().submit(() -> fut0.onDone(null, err));
+            }
+        }
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void stop() {
+        interrupt(new NodeStoppingException("Node is stopping."), true);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void deactivate() {
+        interrupt(new IgniteCheckedException("The cluster has been deactivated."), false);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     * @param stop Stop flag.
+     */
+    private void interrupt(Exception reason, boolean stop) {

Review comment:
       Let's rename `stop` to `cancel`.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,832 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestoreRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** Future to be completed when the cache restore process is complete (this future will be returned to the user). */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /** Stopped flag. */
+    private volatile boolean stopped;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        try {
+            if (ctx.clientNode())
+                throw new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation.");
+
+            DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+            if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+                throw new IgniteException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster.");
+
+            if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+                throw new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            synchronized (this) {
+                if (isRestoring() && fut == null)
+                    throw new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed.");
+
+                fut = new GridFutureAdapter<>();
+            }
+        } catch (IgniteException e) {
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+
+        ctx.cache().context().snapshotMgr().collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    finishProcess(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    finishProcess(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) were not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                ctx.cache().context().snapshotMgr().runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            finishProcess(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestoreRequest req = new SnapshotRestoreRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return isRestoring(null, null);
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(@Nullable String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        if (cacheName == null)
+            return true;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        String details = opCtx0 == null ? "" : " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']';
+
+        if (err != null)
+            log.error("Failed to restore snapshot cache group" + details, err);
+        else if (log.isInfoEnabled())
+            log.info("Successfully restored cache group(s) from the snapshot" + details);
+
+        opCtx = null;
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (fut0 != null) {
+                fut = null;
+
+                ctx.getSystemExecutorService().submit(() -> fut0.onDone(null, err));
+            }
+        }
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void stop() {
+        interrupt(new NodeStoppingException("Node is stopping."), true);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void deactivate() {
+        interrupt(new IgniteCheckedException("The cluster has been deactivated."), false);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     * @param stop Stop flag.
+     */
+    private void interrupt(Exception reason, boolean stop) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return;
+
+        opCtx0.err.compareAndSet(null, reason);
+
+        IgniteFuture<?> stopFut;
+
+        synchronized (this) {
+            stopFut = opCtx0.stopFut;
+
+            if (stop)
+                stopped = true;
+        }
+
+        if (stopFut != null && stopFut.isDone())
+            stopFut.get();
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before performing the restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestoreRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups have no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            Consumer<Throwable> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            synchronized (this) {
+                if (stopped || ctx.isStopping())
+                    throw new NodeStoppingException("Node is stopping.");
+
+                opCtx0.stopFut = new IgniteFutureImpl<>(retFut.chain(f -> null));
+            }
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, ctx.localNodeId().equals(req.updateMetaNodeId()), stopChecker, errHnd)
+                .thenAccept(res -> {
+                    Throwable err = opCtx.err.get();
+
+                    if (err != null) {
+                        log.error("Unable to restore cache group(s) from the snapshot " +
+                            "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                        retFut.onDone(err);
+                    }
+                    else
+                        retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+                });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Throwable> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (Throwable t) {
+                    errHnd.accept(t);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    try {
+                        if (Thread.currentThread().isInterrupted())
+                            throw new IgniteInterruptedCheckedException("Thread has been interrupted.");
+
+                        File target = new File(cacheDir, snpFile.getName());
+
+                        if (log.isDebugEnabled()) {
+                            log.debug("Copying file from the snapshot " +
+                                "[snapshot=" + snpName +
+                                ", src=" + snpFile +
+                                ", target=" + target + "]");
+                        }
+
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IgniteInterruptedCheckedException | IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestoreRequest req) throws IgniteCheckedException {
+        if (opCtx != null) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();
+
+        if (opCtx0 == null) {
+            finishProcess(failure);
+
+            return;
+        }
+
+        if (failure == null)
+            failure = checkNodeLeft(opCtx0.nodes, res.keySet());
+
+        // Context has been created - should rollback changes cluster-wide.
+        if (failure != null) {
+            opCtx0.err.compareAndSet(null, failure);
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                rollbackRestoreProc.start(reqId, reqId);
+
+            return;
+        }
+
+        Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+        for (List<StoredCacheData> storedCfgs : res.values()) {
+            if (storedCfgs == null)
+                continue;
+
+            for (StoredCacheData cacheData : storedCfgs)
+                globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+        }
+
+        opCtx0.cfgs = globalCfgs;
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            cacheStartProc.start(reqId, reqId);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting restored caches " +
+                "[reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName +
+                ", caches=" + F.viewReadOnly(ccfgs, c -> c.config().getName()) + ']');
+        }
+
+        // We set the topology node IDs required to successfully start the cache, if any of the required nodes leave
+        // the cluster during the cache startup, the whole procedure will be rolled back.
+        return ctx.cache().dynamicStartCachesByStoredConf(ccfgs, true, true, false, null, true, opCtx0.nodes);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishCacheStart(UUID reqId, Map<UUID, Boolean> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = errs.values().stream().findFirst().
+            orElse(checkNodeLeft(opCtx0.nodes, res.keySet()));
+
+        if (failure == null) {
+            finishProcess();
+
+            return;
+        }
+
+        opCtx0.err.compareAndSet(null, failure);
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            rollbackRestoreProc.start(reqId, reqId);
+    }
+
+    /**
+     * @param reqNodes Set of required topology nodes.
+     * @param respNodes Set of responding topology nodes.
+     * @return Error, if no response was received from the required topology node.
+     */
+    private Exception checkNodeLeft(Set<UUID> reqNodes, Set<UUID> respNodes) {
+        if (!respNodes.containsAll(reqNodes)) {
+            Set<UUID> leftNodes = new HashSet<>(reqNodes);
+
+            leftNodes.removeAll(respNodes);
+
+            return new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodes + ']');
+        }
+
+        return null;
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> rollback(UUID reqId) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (F.isEmpty(opCtx0.dirs))
+            return new GridFinishedFuture<>();
+
+        GridFutureAdapter<Boolean> retFut = new GridFutureAdapter<>();
+
+        synchronized (this) {
+            if (stopped)
+                return new GridFinishedFuture<>(new NodeStoppingException("Node is stopping."));
+
+            opCtx0.stopFut = new IgniteFutureImpl<>(retFut.chain(f -> null));
+        }
+
+        try {
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                if (log.isInfoEnabled()) {
+                    log.info("Removing restored cache directories [reqId=" + opCtx0.reqId +
+                        ", snapshot=" + opCtx0.snpName + ", dirs=" + opCtx0.dirs + ']');
+                }
+
+                IgniteCheckedException ex = null;
+
+                for (File cacheDir : opCtx0.dirs) {
+                    if (!cacheDir.exists())
+                        continue;
+
+                    if (!U.delete(cacheDir))
+                        ex = new IgniteCheckedException("Unable to remove directory " + cacheDir);
+                }
+
+                if (ex != null)
+                    retFut.onDone(ex);
+                else
+                    retFut.onDone(true);
+            });
+        } catch (RejectedExecutionException e) {
+            retFut.onDone(e);
+        }
+
+        return retFut;
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishRollback(UUID reqId, Map<UUID, Boolean> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        for (Map.Entry<UUID, Exception> entry : errs.entrySet()) {

Review comment:
        This will flood the cluster log, since every node will print the errors of all nodes. Let's print the total number of exceptions and only the exception related to the current node.
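
        A minimal sketch of the suggestion above, assuming the errors arrive as a map keyed by node ID, as in `finishRollback`. The class `RollbackErrorSummary` and its method `summarize` are hypothetical names used only for illustration and are not part of the PR; the message format is an assumption as well.

```java
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper (not part of the PR): condenses per-node errors into a single log line. */
final class RollbackErrorSummary {
    private RollbackErrorSummary() {
        // No instances.
    }

    /**
     * @param locNodeId ID of the node that writes the log message.
     * @param errs Errors reported by cluster nodes, keyed by node ID.
     * @return One-line summary, or {@code null} if there is nothing to report.
     */
    static String summarize(UUID locNodeId, Map<UUID, Exception> errs) {
        if (errs.isEmpty())
            return null;

        // Report only the total number of failures across the cluster.
        StringBuilder sb = new StringBuilder("Rollback finished with errors [total=").append(errs.size());

        // Add details only for the error that belongs to the local node, if any.
        Exception locErr = errs.get(locNodeId);

        if (locErr != null)
            sb.append(", localErr=").append(locErr.getMessage());

        return sb.append(']').toString();
    }
}
```

        With this shape each node writes at most one line per rollback, and the details of a remote failure appear only in the log of the node where it happened.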




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r623196474



##########
File path: modules/indexing/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreWithIndexingTest.java
##########
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryBasicNameMapper;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.SqlFieldsQuery;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.query.GridQueryProcessor;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.G;
+import org.junit.Test;
+
+/**
+ * Cluster snapshot restore tests verifying SQL and indexing.
+ */
+public class IgniteClusterSnapshotRestoreWithIndexingTest extends IgniteClusterSnapshotRestoreBaseTest {
+    /** Type name used for binary and SQL. */
+    private static final String TYPE_NAME = IndexedObject.class.getName();
+
+    /** Number of cache keys to pre-create at node start. */
+    private static final int CACHE_KEYS_RANGE = 10_000;
+
+    /** Cache value builder. */
+    private Function<Integer, Object> valBuilder = new BinaryValueBuilder(TYPE_NAME);
+
+    /** {@inheritDoc} */
+    @Override protected <K, V> CacheConfiguration<K, V> txCacheConfig(CacheConfiguration<K, V> ccfg) {
+        return super.txCacheConfig(ccfg).setSqlIndexMaxInlineSize(255).setSqlSchema("PUBLIC")
+            .setQueryEntities(Collections.singletonList(new QueryEntity()
+                .setKeyType(Integer.class.getName())
+                .setValueType(TYPE_NAME)
+                .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+                .setIndexes(Collections.singletonList(new QueryIndex("id")))));
+    }
+
+    /** {@inheritDoc} */
+    @Override protected Function<Integer, Object> valueBuilder() {
+        return valBuilder;
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        valBuilder = new IndexedValueBuilder();
+
+        IgniteEx client = startGridsWithSnapshot(2, CACHE_KEYS_RANGE, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = client.cache(DEFAULT_CACHE_NAME);
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(DEFAULT_CACHE_NAME).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());

Review comment:
       Done







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600307157



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestorePrepareRequest.java
##########
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Set;
+import java.util.UUID;
+import org.apache.ignite.internal.util.typedef.internal.S;
+
+/**
+ * Request to prepare cache group restore from the snapshot.
+ */
+public class SnapshotRestorePrepareRequest implements Serializable {

Review comment:
       Done







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r623196674



##########
File path: modules/indexing/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreWithIndexingTest.java
##########
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryBasicNameMapper;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.SqlFieldsQuery;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.query.GridQueryProcessor;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.G;
+import org.junit.Test;
+
+/**
+ * Cluster snapshot restore tests verifying SQL and indexing.
+ */
+public class IgniteClusterSnapshotRestoreWithIndexingTest extends IgniteClusterSnapshotRestoreBaseTest {
+    /** Type name used for binary and SQL. */
+    private static final String TYPE_NAME = IndexedObject.class.getName();
+
+    /** Number of cache keys to pre-create at node start. */
+    private static final int CACHE_KEYS_RANGE = 10_000;
+
+    /** Cache value builder. */
+    private Function<Integer, Object> valBuilder = new BinaryValueBuilder(TYPE_NAME);
+
+    /** {@inheritDoc} */
+    @Override protected <K, V> CacheConfiguration<K, V> txCacheConfig(CacheConfiguration<K, V> ccfg) {
+        return super.txCacheConfig(ccfg).setSqlIndexMaxInlineSize(255).setSqlSchema("PUBLIC")
+            .setQueryEntities(Collections.singletonList(new QueryEntity()
+                .setKeyType(Integer.class.getName())
+                .setValueType(TYPE_NAME)
+                .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+                .setIndexes(Collections.singletonList(new QueryIndex("id")))));
+    }
+
+    /** {@inheritDoc} */
+    @Override protected Function<Integer, Object> valueBuilder() {
+        return valBuilder;
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        valBuilder = new IndexedValueBuilder();
+
+        IgniteEx client = startGridsWithSnapshot(2, CACHE_KEYS_RANGE, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = client.cache(DEFAULT_CACHE_NAME);
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, CACHE_KEYS_RANGE);

Review comment:
       Done







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r609769889



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Let's add a test verifying that a snapshot cannot be created during the restore procedure.







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622977705



##########
File path: modules/core/src/main/java/org/apache/ignite/IgniteSnapshot.java
##########
@@ -48,4 +50,13 @@
      * @return Future which will be completed when cancel operation finished.
      */
     public IgniteFuture<Void> cancelSnapshot(String name);
+
+    /**
+     * Restore cache group(s) from the snapshot.

Review comment:
       The description has been changed to:
   ```
   Restore cache group(s) from the snapshot.
   <p>
   <b>NOTE:</b> Cache groups to be restored from the snapshot must not present in the cluster, if they present, they must be destroyed by the user before starting this operation.
   ```







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r613238948



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();

Review comment:
       JUnit creates a new instance of the class for each test run, so this variable will always have its default value.
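
       A small sketch of that lifecycle, written without a JUnit dependency so it can run on its own. `PerTestInstanceDemo` and `SampleTest` are hypothetical names; the loop only mimics the default behaviour of creating a fresh test-class instance before each test method, and is not how the runner is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Mimics the per-test-instance lifecycle (illustration only, no JUnit involved). */
final class PerTestInstanceDemo {
    /** Hypothetical test class: the first "test" overwrites the field, the second leaves it alone. */
    static class SampleTest {
        /** Instance field with a default value, similar to valBuilder in the test under review. */
        String valBuilder = "default";

        void testA() {
            valBuilder = "custom";
        }

        void testB() {
            // Does not modify the field.
        }
    }

    private PerTestInstanceDemo() {
        // No instances.
    }

    /** @return Field values observed at the start of each simulated test, in execution order. */
    static List<String> demo() {
        List<Consumer<SampleTest>> tests = new ArrayList<>();

        tests.add(SampleTest::testA);
        tests.add(SampleTest::testB);

        List<String> seen = new ArrayList<>();

        for (Consumer<SampleTest> test : tests) {
            // A fresh instance per test: changes made by a previous test are not visible here.
            SampleTest inst = new SampleTest();

            seen.add(inst.valBuilder);

            test.accept(inst);
        }

        return seen;
    }
}
```

       Both simulated tests start with the default value, even though `testA` overwrites the field, which is why reassigning the field inside one test does not leak into the others.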







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622376249



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,916 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.ClusterSnapshotFuture;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Temporary cache directory prefix. */
+    public static final String TMP_CACHE_DIR_PREFIX = ".tmp.snp.restore.";

Review comment:
       I think it's better to use `_` instead of `.` in the cache directory prefix. I also think that a dot should be at the first place of the temporary directory name, because we are not hiding temporary directories.
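
To make the naming question in the comment above concrete, here is a minimal sketch. Only the dotted prefix value is taken from the quoted diff; the underscore variant, the class, and the helper names are hypothetical and are not part of the PR. It relies on the Unix convention that a name starting with a dot is treated as hidden.

```java
/** Hypothetical sketch; only DOT_PREFIX mirrors a value from the quoted diff. */
class TmpDirPrefixSketch {
    /** Prefix as it appears in the diff under review. */
    static final String DOT_PREFIX = ".tmp.snp.restore.";

    /** Underscore-separated alternative (an assumption, not taken from the PR). */
    static final String UNDERSCORE_PREFIX = "_tmp_snp_restore_";

    /** Builds the temporary directory name for a cache directory. */
    static String tmpDirName(String prefix, String cacheDirName) {
        return prefix + cacheDirName;
    }

    /** Unix convention: a name that starts with a dot is treated as hidden. */
    static boolean hiddenByConvention(String dirName) {
        return dirName.startsWith(".");
    }

    public static void main(String[] args) {
        String dotted = tmpDirName(DOT_PREFIX, "cache-default");
        String underscored = tmpDirName(UNDERSCORE_PREFIX, "cache-default");

        // Prints: .tmp.snp.restore.cache-default hidden=true
        System.out.println(dotted + " hidden=" + hiddenByConvention(dotted));

        // Prints: _tmp_snp_restore_cache-default hidden=false
        System.out.println(underscored + " hidden=" + hiddenByConvention(underscored));
    }
}
```

Whether the temporary directory should be hidden at all is the design choice under discussion; the sketch only shows how the prefix decides it.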




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595827978



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active."));
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes))

Review comment:
       Removed `allNodesInBaselineAndAlive`.
   (It seemed to me that the `node left` callback is sometimes delivered later than such a direct check, which is why I had kept it.)
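
The timing concern in the comment above can be illustrated with a small, self-contained sketch. It is not Ignite code: the class, the `alive` set standing in for the discovery view, and the helper names are all hypothetical. It only shows why an error recorded by a callback can lag behind a direct topology check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; none of these names are Ignite APIs. */
class NodeLeftCheckSketch {
    /** Stand-in for the restore context: required nodes plus the first recorded error. */
    static final class RestoreContext {
        final Set<UUID> nodes;
        final AtomicReference<Throwable> err = new AtomicReference<>();

        RestoreContext(Set<UUID> nodes) {
            this.nodes = nodes;
        }
    }

    /** Callback path: the failure is recorded only once the event is delivered. */
    static void onNodeLeft(RestoreContext ctx, UUID leftNodeId) {
        if (ctx.nodes.contains(leftNodeId))
            ctx.err.compareAndSet(null, new IllegalStateException("Node left: " + leftNodeId));
    }

    /** Direct path: compares the required nodes with the current view of alive nodes. */
    static boolean allNodesAlive(Set<UUID> alive, RestoreContext ctx) {
        return alive.containsAll(ctx.nodes);
    }

    public static void main(String[] args) {
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();

        Set<UUID> alive = new HashSet<>(Arrays.asList(a, b));
        RestoreContext ctx = new RestoreContext(new HashSet<>(Arrays.asList(a, b)));

        // Node b leaves, but the callback has not been delivered yet.
        alive.remove(b);

        System.out.println("error recorded: " + (ctx.err.get() != null)); // false
        System.out.println("all nodes alive: " + allNodesAlive(alive, ctx)); // false

        // The callback arrives later and records the error.
        onNodeLeft(ctx, b);

        System.out.println("error recorded: " + (ctx.err.get() != null)); // true
    }
}
```

Between the node leaving and the callback being delivered, only the direct check sees the failure, which is the window the comment describes.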







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r596021298



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        // Register the operation future only after the preliminary checks have passed. Otherwise an early
+        // return leaves an incomplete future behind and every subsequent restore request is rejected.
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        }
+        catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Roll back the changes made by the restore process in the cache group directories of the given context.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active."));
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes))
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<Boolean> retFut = new GridFutureAdapter<>();
+
+        try {
+            Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+            // Ensure that shared cache groups have no conflicts before starting caches.
+            for (StoredCacheData cfg : ccfgs) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting restored caches " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName +
+                    ", caches=" + F.viewReadOnly(ccfgs, c -> c.config().getName()) + ']');
+            }
+
+            ctx.cache().dynamicStartCachesByStoredConf(ccfgs, true, true, false, null, true, opCtx0.nodes).listen(

Review comment:
       Done
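
For readers following the interruption handling in the hunk above (`stop`, `onNodeLeft`, and the `stopChecker` polled inside `restore`), here is a minimal standalone sketch of that pattern. It is not Ignite code: the class `StopCheckerSketch`, the method `copyAll`, and the file names are hypothetical and exist only for illustration. It assumes the first recorded failure should win and that the copy loop checks the flag before each file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical, simplified model of the stop-checker pattern (not part of the PR). */
public class StopCheckerSketch {
    /** First recorded interruption reason; stays null while the operation is healthy. */
    private final AtomicReference<Throwable> err = new AtomicReference<>();

    /** Records the reason only if none was recorded before, so the first failure wins. */
    public boolean stop(Throwable reason) {
        return err.compareAndSet(null, reason);
    }

    /** @return The recorded interruption reason, or null if there is none. */
    public Throwable error() {
        return err.get();
    }

    /**
     * "Copies" files one by one, polling the stop flag before each file.
     *
     * @param files Names of the files to process.
     * @param onCopied Callback invoked after each processed file (it may call stop).
     * @return Names of the files processed before the operation was stopped.
     */
    public List<String> copyAll(List<String> files, Consumer<String> onCopied) {
        BooleanSupplier stopChecker = () -> err.get() != null;
        List<String> copied = new ArrayList<>();

        for (String file : files) {
            if (stopChecker.getAsBoolean())
                break; // Interrupted: the caller is expected to roll back what was copied.

            copied.add(file);
            onCopied.accept(file);
        }

        return copied;
    }

    public static void main(String[] args) {
        StopCheckerSketch op = new StopCheckerSketch();

        // Simulate a node leaving right after the second file has been copied.
        List<String> copied = op.copyAll(
            Arrays.asList("part-0.bin", "part-1.bin", "part-2.bin"),
            f -> {
                if (f.equals("part-1.bin"))
                    op.stop(new IllegalStateException("Node left"));
            });

        System.out.println("Copied: " + copied);
        System.out.println("Later stop accepted: " + op.stop(new RuntimeException("Shutdown")));
        System.out.println("Recorded reason: " + op.error().getMessage());
    }
}
```

The point of `compareAndSet(null, reason)` in this sketch is that a later stop request cannot overwrite the original cause, so the error reported to the caller is the one that actually interrupted the work.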





[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r596062466



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        // Register the operation future only after the preliminary checks have passed. Otherwise an early
+        // return leaves an incomplete future behind and every subsequent restore request is rejected.
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622751052



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.OpenOption;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.IntSupplier;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.CacheMode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.FILE_SUFFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess.TMP_CACHE_DIR_PREFIX;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends IgniteClusterSnapshotRestoreBaseTest {
+    /** Type name used for binary and SQL. */
+    private static final String TYPE_NAME = "CustomType";
+
+    /** Cache 1 name. */
+    private static final String CACHE1 = "cache1";
+
+    /** Cache 2 name. */
+    private static final String CACHE2 = "cache2";
+
+    /** Default shared cache group name. */
+    private static final String SHARED_GRP = "shared";
+
+    /** Cache value builder. */
+    private Function<Integer, Object> valBuilder = String::valueOf;
+
+    /** {@inheritDoc} */
+    @Override protected Function<Integer, Object> valueBuilder() {
+        return valBuilder;
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreAllGroups() throws Exception {
+        CacheConfiguration<Integer, Object> cacheCfg1 =
+            txCacheConfig(new CacheConfiguration<Integer, Object>(CACHE1)).setGroupName(SHARED_GRP);
+
+        CacheConfiguration<Integer, Object> cacheCfg2 =
+            txCacheConfig(new CacheConfiguration<Integer, Object>(CACHE2)).setGroupName(SHARED_GRP);
+
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder,
+            dfltCacheCfg.setBackups(0), cacheCfg1, cacheCfg2);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cache(CACHE1).destroy();
+        ignite.cache(CACHE2).destroy();
+        ignite.cache(DEFAULT_CACHE_NAME).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Restore all cache groups.
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, null).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(DEFAULT_CACHE_NAME), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(CACHE1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(CACHE2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testStartClusterSnapshotRestoreMultipleThreadsSameNode() throws Exception {
+        checkStartClusterSnapshotRestoreMultithreaded(() -> 0);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testStartClusterSnapshotRestoreMultipleThreadsDiffNode() throws Exception {
+        AtomicInteger nodeIdx = new AtomicInteger();
+
+        checkStartClusterSnapshotRestoreMultithreaded(nodeIdx::getAndIncrement);
+    }
+
+    /**
+     * @param nodeIdxSupplier Ignite node index supplier.
+     */
+    public void checkStartClusterSnapshotRestoreMultithreaded(IntSupplier nodeIdxSupplier) throws Exception {
+        Ignite ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        CountDownLatch startLatch = new CountDownLatch(1);
+        AtomicInteger successCnt = new AtomicInteger();
+
+        IgniteInternalFuture<Long> fut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                startLatch.await(TIMEOUT, TimeUnit.MILLISECONDS);
+
+                grid(nodeIdxSupplier.getAsInt()).snapshot().restoreSnapshot(
+                    SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+                successCnt.incrementAndGet();
+            }
+            catch (Exception ignore) {
+                // Expected exception.

Review comment:
       Two exceptions are possible here: the first reports that another restore process has already started; the second (rare) reports that the cache already exists, which can happen if the second process is delayed.
   Do you suggest checking for these exceptions manually?
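
   For illustration, a minimal sketch of what manual checking could look like. It does not use Ignite at all: `fakeRestore`, `runRace` and `isExpectedRejection` are hypothetical names, and the restore call is simulated with an `AtomicBoolean`. The two message fragments are copied from the `SnapshotRestoreProcess` diff quoted in this thread; the real test may need to match on exception types instead (that is an assumption, not something the PR confirms).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class ExpectedRejectionSketch {
    /** Message fragment used when another restore is already running (from the quoted diff). */
    static final String IN_PROGRESS = "The previous snapshot restore operation was not completed.";

    /** Message fragment used when the cache already exists (from the quoted diff). */
    static final String CACHE_EXISTS = "should be destroyed manually before perform restore operation.";

    /** Walks the cause chain and accepts only the two rejections a losing thread may see. */
    static boolean isExpectedRejection(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            String msg = t.getMessage();

            if (msg != null && (msg.contains(IN_PROGRESS) || msg.contains(CACHE_EXISTS)))
                return true;
        }

        return false;
    }

    /** Hypothetical stand-in for restoreSnapshot(...): the first caller wins, the rest are rejected. */
    static void fakeRestore(AtomicBoolean started) {
        if (!started.compareAndSet(false, true))
            throw new IllegalStateException("Cache group restore operation was rejected. " + IN_PROGRESS);
    }

    /** Runs the race and returns {successCount, expectedRejectionCount}; fails on anything else. */
    static int[] runRace(int threads) {
        AtomicBoolean started = new AtomicBoolean();
        AtomicInteger successCnt = new AtomicInteger();
        AtomicInteger rejectedCnt = new AtomicInteger();
        AtomicReference<Throwable> unexpected = new AtomicReference<>();
        CountDownLatch startLatch = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    startLatch.await();
                    fakeRestore(started);
                    successCnt.incrementAndGet();
                }
                catch (Exception e) {
                    // Count expected rejections; remember the first unexpected failure.
                    if (isExpectedRejection(e))
                        rejectedCnt.incrementAndGet();
                    else
                        unexpected.compareAndSet(null, e);
                }
            });

            workers[i].start();
        }

        startLatch.countDown();

        try {
            for (Thread w : workers)
                w.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }

        if (unexpected.get() != null)
            throw new AssertionError("Unexpected failure", unexpected.get());

        return new int[] {successCnt.get(), rejectedCnt.get()};
    }

    public static void main(String[] args) {
        int[] res = runRace(4);

        System.out.println("success=" + res[0] + ", rejected=" + res[1]);
    }
}
```

   The point of the design is that the `catch` block no longer swallows everything: a failure that is neither of the two known rejections fails the run instead of being silently ignored.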







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r596021465



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {

Review comment:
       Done
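
The cancellation mechanism visible in the quoted hunk (`onNodeLeft` and `stop` record an error with `compareAndSet`, and `restore` polls it through a `BooleanSupplier`) can be sketched in isolation. This is a minimal approximation under stated assumptions, not the code from the pull request: `StopCheckerSketch`, `OpContext`, `copyFiles` and the `interruptAt` parameter are hypothetical names, and file copying is replaced by a counter.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

class StopCheckerSketch {
    /** Hypothetical stand-in for the restore context: holds only the shared error slot. */
    static final class OpContext {
        final AtomicReference<Throwable> err = new AtomicReference<>();
    }

    /**
     * Pretends to copy {@code total} files and returns how many were copied.
     * Before step {@code interruptAt} an interruption is simulated, the way a
     * node-left callback might record one from another thread.
     */
    static int copyFiles(OpContext ctx, int total, int interruptAt) {
        // The worker never throws on interruption; it only polls the shared slot.
        BooleanSupplier stopChecker = () -> ctx.err.get() != null;

        int copied = 0;

        for (int i = 0; i < total; i++) {
            if (i == interruptAt)
                ctx.err.compareAndSet(null, new IllegalStateException("node left"));

            if (stopChecker.getAsBoolean())
                break;

            copied++;
        }

        return copied;
    }

    public static void main(String[] args) {
        OpContext ctx = new OpContext();

        System.out.println("copied=" + copyFiles(ctx, 5, 2));

        // A later reason does not replace the first one: compareAndSet(null, ...) fails.
        ctx.err.compareAndSet(null, new RuntimeException("stop requested"));

        System.out.println("reason=" + ctx.err.get().getMessage());
    }
}
```

Because every writer uses `compareAndSet(null, ...)`, the first recorded reason is the one that reaches the result future, and the worker decides for itself where it is safe to stop.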







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600306664



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;

Review comment:
       Changed, but from my point of view there is no difference.
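
One difference visible between the two quoted revisions is the branch layout of the group check (`grpName == null` tested first in `isCacheRestoring`, `grpName != null` tested first in `isRestoring`). The thread does not say whether this is the change the comment refers to. The sketch below compares the two layouts on a few inputs. It is a standalone approximation: `id` stands in for `CU.cacheId`, the directory list is replaced by plain group names, the preceding `cacheCfgs.containsKey` check is omitted, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class BranchLayoutSketch {
    /** Stand-in for CU.cacheId; an assumption for this sketch, not Ignite's implementation. */
    static int id(String name) {
        return name.hashCode();
    }

    /** Layout of the earlier quoted revision: the null case is tested first. */
    static boolean nullFirst(String cacheName, String grpName, List<String> locGrpNames) {
        for (String locGrpName : locGrpNames) {
            if (grpName == null) {
                if (id(locGrpName) == id(cacheName))
                    return true;
            }
            else {
                if (cacheName.equals(locGrpName))
                    return true;

                if (id(locGrpName) == id(grpName))
                    return true;
            }
        }

        return false;
    }

    /** Layout of the later quoted revision: the non-null case is tested first. */
    static boolean nonNullFirst(String cacheName, String grpName, List<String> locGrpNames) {
        for (String locGrpName : locGrpNames) {
            if (grpName != null) {
                if (cacheName.equals(locGrpName))
                    return true;

                if (id(locGrpName) == id(grpName))
                    return true;
            }
            else if (id(locGrpName) == id(cacheName))
                return true;
        }

        return false;
    }

    public static void main(String[] args) {
        List<String> groups = Arrays.asList("grpA", "grpB");

        String[][] cases = {{"grpA", null}, {"cache1", "grpB"}, {"cache1", null}, {"cache1", "other"}};

        for (String[] c : cases) {
            System.out.println(c[0] + "/" + c[1] + ": " +
                nullFirst(c[0], c[1], groups) + " vs " + nonNullFirst(c[0], c[1], groups));
        }
    }
}
```

Both methods execute the same comparisons for every input; only the order in which the `null` test is written differs.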







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r587500781



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {

Review comment:
       I think we can simplify the name to `SnapshotRestoreProcess`, since restoring a snapshot always assumes that we are restoring some of the cache groups.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        IgniteInternalFuture<Void> fut0 = fut;
+
+        if (!fut0.isDone()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        if (ctx.cache().context().snapshotMgr().isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+            throw new IgniteException("Not all nodes in the cluster support a snapshot restore operation.");
+
+        Collection<ClusterNode> bltNodes = F.viewReadOnly(ctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            node -> node, (node) -> CU.baselineNode(node, ctx.state().clusterState()));
+
+        Set<UUID> bltNodeIds = new HashSet<>(F.viewReadOnly(bltNodes, F.node2id()));
+
+        fut = new GridFutureAdapter<>();

Review comment:
       I don't think this is safe. Two different threads may both assign this value, since both of them can pass your `fut0.isDone()` check.
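
One way to close this race is to claim the slot with a compare-and-set rather than a check followed by a plain assignment. The snippet below is a minimal sketch only: `RestoreGuard` and `tryStart` are hypothetical names, and `CompletableFuture` stands in for `GridFutureAdapter` so that the example is self-contained. It is not the code from this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the handling of the {@code fut} field; not Ignite code. */
final class RestoreGuard {
    /** Starts out completed, so the first start attempt is accepted. */
    private final AtomicReference<CompletableFuture<Void>> futRef =
        new AtomicReference<>(CompletableFuture.completedFuture(null));

    /**
     * Tries to claim the operation slot.
     *
     * @return New future owned by the caller, or {@code null} if another operation is in progress.
     */
    CompletableFuture<Void> tryStart() {
        CompletableFuture<Void> prev = futRef.get();

        if (!prev.isDone())
            return null;

        CompletableFuture<Void> next = new CompletableFuture<>();

        // Only one thread can replace the completed future it observed; the loser is rejected.
        return futRef.compareAndSet(prev, next) ? next : null;
    }
}
```

If two threads both observe a completed future, only one `compareAndSet` succeeds; the other gets `null` and can return the "previous operation was not completed" error.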

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();

Review comment:
       You can initialize this value directly as FinishedFuture, right?

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
##########
@@ -1022,6 +1056,91 @@ private SnapshotMetadata readSnapshotMetadata(File smf) {
         }
     }
 
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> restoreCacheGroups(String snpName, Collection<String> grpNames) {
+        return restoreCacheGrpProc.start(snpName, grpNames);
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param grpName Cache group name.
+     * @param snpCacheDir Cache group directory in snapshot.
+     * @param stopChecker Node stop or process interrupt checker.
+     * @param newFiles A list to keep track of the files created, the list updates during the restore process.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restoreCacheGroupFiles(

Review comment:
       Let's move this method to the `SnapshotRestoreCacheGroupProcess` class. Is there any reason to keep it here?

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreVerificatioTask.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.compute.ComputeJob;
+import org.apache.ignite.compute.ComputeJobAdapter;
+import org.apache.ignite.compute.ComputeJobResult;
+import org.apache.ignite.compute.ComputeJobResultPolicy;
+import org.apache.ignite.compute.ComputeTaskAdapter;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.resources.IgniteInstanceResource;
+import org.jetbrains.annotations.NotNull;
+
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+
+/**
+ * Verification task for restoring a cache group from a snapshot.
+ */
+public class SnapshotRestoreVerificatioTask extends
+    ComputeTaskAdapter<SnapshotRestoreVerificationArg, SnapshotRestoreVerificationResult> {

Review comment:
       I don't think you need a `SnapshotRestoreVerificationArg` here; you can pass `snpName` and `grpNames` directly into the class constructor. The same applies when you create the jobs for execution.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreVerificatioTask.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.compute.ComputeJob;
+import org.apache.ignite.compute.ComputeJobAdapter;
+import org.apache.ignite.compute.ComputeJobResult;
+import org.apache.ignite.compute.ComputeJobResultPolicy;
+import org.apache.ignite.compute.ComputeTaskAdapter;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.resources.IgniteInstanceResource;
+import org.jetbrains.annotations.NotNull;
+
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+
+/**
+ * Verification task for restoring a cache group from a snapshot.
+ */
+public class SnapshotRestoreVerificatioTask extends
+    ComputeTaskAdapter<SnapshotRestoreVerificationArg, SnapshotRestoreVerificationResult> {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** {@inheritDoc} */
+    @Override public @NotNull Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> subgrid,
+        SnapshotRestoreVerificationArg arg) throws IgniteException {
+        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();
+
+        for (ClusterNode node : subgrid)
+            jobs.put(new SnapshotRestoreVerificationJob(arg), node);
+
+        return jobs;
+    }
+
+    /** {@inheritDoc} */
+    @Override public SnapshotRestoreVerificationResult reduce(List<ComputeJobResult> results) throws IgniteException {
+        SnapshotRestoreVerificationResult firstRes = null;
+
+        for (ComputeJobResult jobRes : results) {
+            SnapshotRestoreVerificationResult res = jobRes.getData();
+
+            if (res == null)
+                continue;
+
+            if (firstRes == null) {
+                firstRes = res;
+
+                continue;
+            }
+
+            if (firstRes.configs().size() != res.configs().size()) {
+                throw new IgniteException("Count of cache configs mismatch [" +
+                    "node1=" + firstRes.localNodeId() + ", cnt1=" + firstRes.configs().size() +
+                    ", node2=" + res.localNodeId() + ", cnt2=" + res.configs().size() + ']');
+            }
+        }
+
+        return firstRes;
+    }
+
+    /** {@inheritDoc} */
+    @Override public ComputeJobResultPolicy result(ComputeJobResult res, List<ComputeJobResult> rcvd) {
+        IgniteException e = res.getException();
+
+        // Don't failover this job, if topology changed - user should restart operation.
+        if (e != null)
+            throw e;
+
+        return super.result(res, rcvd);
+    }
+
+    /** */
+    private static class SnapshotRestoreVerificationJob extends ComputeJobAdapter {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Auto-injected grid instance. */
+        @IgniteInstanceResource
+        private transient IgniteEx ignite;
+
+        /** Job argument. */
+        private final SnapshotRestoreVerificationArg arg;
+
+        /**
+         * @param arg Job argument.
+         */
+        public SnapshotRestoreVerificationJob(SnapshotRestoreVerificationArg arg) {
+            this.arg = arg;
+        }
+
+        /** {@inheritDoc} */
+        @Override public Object execute() throws IgniteException {
+            assert !ignite.context().clientNode();
+
+            try {
+                return resolveRestoredConfigs();
+            }
+            catch (BinaryObjectException e) {
+                throw new IgniteException("Incompatible binary types found: " + e.getMessage());
+            } catch (IOException | IgniteCheckedException e) {
+                throw F.wrap(e);
+            }
+        }
+
+        /**
+         * Collect cache configurations and verify binary compatibility of specified cache groups.
+         *
+         * @return List of stored cache configurations with local node ID.
+         * @throws IgniteCheckedException If the snapshot is incompatible.
+         * @throws IOException In case of I/O errors while reading the memory page size
+         */
+        private SnapshotRestoreVerificationResult resolveRestoredConfigs() throws IgniteCheckedException, IOException {

Review comment:
       You can inline this method.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreVerificatioTask.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.compute.ComputeJob;
+import org.apache.ignite.compute.ComputeJobAdapter;
+import org.apache.ignite.compute.ComputeJobResult;
+import org.apache.ignite.compute.ComputeJobResultPolicy;
+import org.apache.ignite.compute.ComputeTaskAdapter;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.resources.IgniteInstanceResource;
+import org.jetbrains.annotations.NotNull;
+
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+
+/**
+ * Verification task for restoring a cache group from a snapshot.
+ */
+public class SnapshotRestoreVerificatioTask extends
+    ComputeTaskAdapter<SnapshotRestoreVerificationArg, SnapshotRestoreVerificationResult> {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** {@inheritDoc} */
+    @Override public @NotNull Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> subgrid,
+        SnapshotRestoreVerificationArg arg) throws IgniteException {
+        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();
+
+        for (ClusterNode node : subgrid)
+            jobs.put(new SnapshotRestoreVerificationJob(arg), node);
+
+        return jobs;
+    }
+
+    /** {@inheritDoc} */
+    @Override public SnapshotRestoreVerificationResult reduce(List<ComputeJobResult> results) throws IgniteException {
+        SnapshotRestoreVerificationResult firstRes = null;
+
+        for (ComputeJobResult jobRes : results) {
+            SnapshotRestoreVerificationResult res = jobRes.getData();
+
+            if (res == null)
+                continue;
+
+            if (firstRes == null) {
+                firstRes = res;
+
+                continue;
+            }
+
+            if (firstRes.configs().size() != res.configs().size()) {
+                throw new IgniteException("Count of cache configs mismatch [" +
+                    "node1=" + firstRes.localNodeId() + ", cnt1=" + firstRes.configs().size() +
+                    ", node2=" + res.localNodeId() + ", cnt2=" + res.configs().size() + ']');
+            }
+        }
+
+        return firstRes;
+    }
+
+    /** {@inheritDoc} */
+    @Override public ComputeJobResultPolicy result(ComputeJobResult res, List<ComputeJobResult> rcvd) {
+        IgniteException e = res.getException();
+
+        // Don't failover this job, if topology changed - user should restart operation.
+        if (e != null)
+            throw e;
+
+        return super.result(res, rcvd);
+    }
+
+    /** */
+    private static class SnapshotRestoreVerificationJob extends ComputeJobAdapter {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Auto-injected grid instance. */
+        @IgniteInstanceResource
+        private transient IgniteEx ignite;
+
+        /** Job argument. */
+        private final SnapshotRestoreVerificationArg arg;
+
+        /**
+         * @param arg Job argument.
+         */
+        public SnapshotRestoreVerificationJob(SnapshotRestoreVerificationArg arg) {
+            this.arg = arg;
+        }
+
+        /** {@inheritDoc} */
+        @Override public Object execute() throws IgniteException {
+            assert !ignite.context().clientNode();
+
+            try {
+                return resolveRestoredConfigs();
+            }
+            catch (BinaryObjectException e) {
+                throw new IgniteException("Incompatible binary types found: " + e.getMessage());
+            } catch (IOException | IgniteCheckedException e) {
+                throw F.wrap(e);
+            }
+        }
+
+        /**
+         * Collect cache configurations and verify binary compatibility of specified cache groups.
+         *
+         * @return List of stored cache configurations with local node ID.
+         * @throws IgniteCheckedException If the snapshot is incompatible.
+         * @throws IOException In case of I/O errors while reading the memory page size
+         */
+        private SnapshotRestoreVerificationResult resolveRestoredConfigs() throws IgniteCheckedException, IOException {
+            Map<String, StoredCacheData> cacheCfgs = new HashMap<>();
+            GridCacheSharedContext<?, ?> cctx = ignite.context().cache().context();
+            String folderName = ignite.context().pdsFolderResolver().resolveFolders().folderName();
+
+            // Collect cache configuration(s) and verify cache groups page size.
+            for (File cacheDir : cctx.snapshotMgr().snapshotCacheDirectories(arg.snapshotName(), folderName)) {
+                String grpName = FilePageStoreManager.cacheGroupName(cacheDir);
+
+                if (!arg.groups().contains(grpName))
+                    continue;
+
+                ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(cacheDir, cacheCfgs);
+
+                List<File> parts = FilePageStoreManager.cachePartitionFiles(cacheDir);
+
+                if (F.isEmpty(parts))
+                    continue;
+
+                int pageSize = ((GridCacheDatabaseSharedManager)cctx.database())
+                    .resolvePageSizeFromPartitionFile(parts.get(0).toPath());
+
+                if (pageSize != cctx.database().pageSize()) {
+                    throw new IgniteCheckedException("Incompatible memory page size " +
+                        "[snapshotPageSize=" + pageSize +
+                        ", nodePageSize=" + cctx.database().pageSize() +
+                        ", group=" + grpName +
+                        ", snapshot=" + arg.snapshotName() + ']');
+                }
+            }
+
+            if (cacheCfgs.isEmpty())
+                return null;
+
+            File binDir = binaryWorkDir(
+                cctx.snapshotMgr().snapshotLocalDir(arg.snapshotName()).getAbsolutePath(),
+                folderName);
+
+            ignite.context().cacheObjects().checkMetadata(binDir);
+
+            return new SnapshotRestoreVerificationResult(new ArrayList<>(cacheCfgs.values()), ignite.localNode().id());

Review comment:
       You can return `Map<UUID, List<StoredCacheData>>`. Do you need a dedicated class-wrapper for this?
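
A rough sketch of what the reduce step could look like if each job returned a single-entry `Map<UUID, List<StoredCacheData>>` keyed by the local node ID. `VerificationReduce` is a hypothetical helper, the type parameter `C` stands in for `StoredCacheData`, and merging all entries (rather than returning the first result) is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper; the type parameter C stands in for StoredCacheData. */
final class VerificationReduce {
    /** Merges per-node results and checks that every node reports the same number of configs. */
    static <C> Map<UUID, List<C>> reduce(List<Map<UUID, List<C>>> jobResults) {
        Map<UUID, List<C>> merged = new HashMap<>();
        UUID firstNode = null;
        int firstCnt = -1;

        for (Map<UUID, List<C>> res : jobResults) {
            // A node without matching cache groups reports nothing.
            if (res == null)
                continue;

            for (Map.Entry<UUID, List<C>> e : res.entrySet()) {
                int cnt = e.getValue().size();

                if (firstNode == null) {
                    firstNode = e.getKey();
                    firstCnt = cnt;
                }
                else if (cnt != firstCnt) {
                    throw new IllegalStateException("Count of cache configs mismatch [node1=" + firstNode +
                        ", cnt1=" + firstCnt + ", node2=" + e.getKey() + ", cnt2=" + cnt + ']');
                }

                merged.put(e.getKey(), e.getValue());
            }
        }

        return merged;
    }
}
```

The node ID travels as the map key, so the mismatch message can still name both nodes without a dedicated result class.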

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreVerificatioTask.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.compute.ComputeJob;
+import org.apache.ignite.compute.ComputeJobAdapter;
+import org.apache.ignite.compute.ComputeJobResult;
+import org.apache.ignite.compute.ComputeJobResultPolicy;
+import org.apache.ignite.compute.ComputeTaskAdapter;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.resources.IgniteInstanceResource;
+import org.jetbrains.annotations.NotNull;
+
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+
+/**
+ * Verification task for restoring a cache group from a snapshot.
+ */
+public class SnapshotRestoreVerificatioTask extends

Review comment:
       SnapshotRestoreVerificatioTask > SnapshotRestoreVerificationTask

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreEmptyResponse.java
##########
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Serializable;
+
+/**
+ * Snapshot restore operation single node response.
+ */
+public class SnapshotRestoreEmptyResponse implements Serializable {

Review comment:
       Would it be enough to return a simple `Boolean` object instead, in all the usages?

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        IgniteInternalFuture<Void> fut0 = fut;
+
+        if (!fut0.isDone()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        if (ctx.cache().context().snapshotMgr().isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+            throw new IgniteException("Not all nodes in the cluster support a snapshot restore operation.");
+
+        Collection<ClusterNode> bltNodes = F.viewReadOnly(ctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            node -> node, (node) -> CU.baselineNode(node, ctx.state().clusterState()));
+
+        Set<UUID> bltNodeIds = new HashSet<>(F.viewReadOnly(bltNodes, F.node2id()));
+
+        fut = new GridFutureAdapter<>();
+
+        ((ClusterGroupAdapter)ctx.cluster().get().forNodeIds(bltNodeIds)).compute().executeAsync(
+            new SnapshotRestoreVerificatioTask(), new SnapshotRestoreVerificationArg(snpName, cacheGrpNames)).listen(
+            f -> {
+                try {
+                    SnapshotRestoreVerificationResult res = f.get();
+
+                    Set<String> foundGrps = res == null ? Collections.emptySet() : res.configs().stream()
+                        .map(v -> v.config().getGroupName() != null ? v.config().getGroupName() : v.config().getName())
+                        .collect(Collectors.toSet());
+
+                    if (!foundGrps.containsAll(cacheGrpNames)) {
+                        Set<String> missedGroups = new HashSet<>(cacheGrpNames);
+
+                        missedGroups.removeAll(foundGrps);
+
+                        fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                            "snapshot [groups=" + missedGroups + ", snapshot=" + snpName + ']'));
+
+                        return;
+                    }
+
+                    SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(
+                        UUID.randomUUID(), snpName, bltNodeIds, res.configs(), res.localNodeId());
+
+                    prepareRestoreProc.start(req.requestId(), req);
+                } catch (Throwable t) {
+                    fut.onDone(new IgniteException(OP_REJECT_MSG + t.getMessage(), t));
+                }
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if the cache group restore process is currently running.
+     *
+     * @return {@code True} if cache group restore process is currently running.
+     */
+    public boolean inProgress(@Nullable String cacheName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        return !staleProcess(fut, opCtx0) && (cacheName == null || opCtx0.containsCache(cacheName));
+    }
+
+    /**
+     * @param fut The future of cache snapshot restore operation.
+     * @param opCtx Snapshot restore operation context.
+     * @return {@code True} if the future completed or not initiated.
+     */
+    public boolean staleProcess(IgniteInternalFuture<Void> fut, SnapshotRestoreContext opCtx) {
+        return fut.isDone() || opCtx == null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes().contains(leftNodeId)) {
+            opCtx0.interrupt(new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.interrupt(reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IllegalStateException If cache with the specified name already exists.
+     */
+    private void ensureCacheAbsent(String name) throws IllegalStateException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreEmptyResponse> prepare(SnapshotRestorePrepareRequest req) {
+        if (!req.nodes().contains(ctx.localNodeId()))
+            return new GridFinishedFuture<>();
+
+        if (inProgress(null)) {
+            return new GridFinishedFuture<>(
+                new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        // Skip creating future on initiator.
+        if (fut.isDone())
+            fut = new GridFutureAdapter<>();
+
+        opCtx = new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), req.configs());
+
+        fut.listen(f -> opCtx = null);
+
+        if (!allNodesInBaselineAndAlive(req.nodes()))
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        GridFutureAdapter<SnapshotRestoreEmptyResponse> retFut = new GridFutureAdapter<>();
+
+        try {
+            for (String grpName : opCtx0.groups())
+                ensureCacheAbsent(grpName);
+
+            for (StoredCacheData cfg : opCtx0.configs()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (!ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx0.snapshotName()).exists())
+                return new GridFinishedFuture<>();
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+
+            ctx.getSystemExecutorService().submit(() -> {

Review comment:
       I think the `IgniteSnapshotManager#snpRunner` executor service should be used here instead, since the snapshot creation and restore procedures can't be run in parallel.
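
To illustrate the reasoning, here is a small sketch using only `java.util.concurrent`. It assumes the shared runner behaves like a single-threaded executor, which is an assumption for this example and not something the diff above states about `snpRunner`. The class `SingleRunnerSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class SingleRunnerSketch {
    /** Submits the given number of tasks and returns the highest number seen running at the same time. */
    static int maxConcurrency(ExecutorService runner, int tasks) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        List<Future<?>> futs = new ArrayList<>();

        for (int i = 0; i < tasks; i++) {
            futs.add(runner.submit(() -> {
                // Record how many tasks are active right now.
                int now = running.incrementAndGet();

                max.accumulateAndGet(now, Math::max);

                try {
                    Thread.sleep(20); // Stand-in for snapshot creation or restore work.
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }

                running.decrementAndGet();
            }));
        }

        try {
            for (Future<?> f : futs)
                f.get(); // Wait for every task to finish.
        }
        catch (Exception e) {
            throw new RuntimeException(e);
        }

        return max.get();
    }
}
```

When every snapshot-related task goes through one single-threaded executor, tasks are serialized by construction, so creation and restore work submitted to it cannot overlap. A general-purpose pool with several threads gives no such guarantee.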

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreVerificatioTask.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.compute.ComputeJob;
+import org.apache.ignite.compute.ComputeJobAdapter;
+import org.apache.ignite.compute.ComputeJobResult;
+import org.apache.ignite.compute.ComputeJobResultPolicy;
+import org.apache.ignite.compute.ComputeTaskAdapter;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.resources.IgniteInstanceResource;
+import org.jetbrains.annotations.NotNull;
+
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+
+/**
+ * Verification task for restoring a cache group from a snapshot.
+ */
+public class SnapshotRestoreVerificatioTask extends
+    ComputeTaskAdapter<SnapshotRestoreVerificationArg, SnapshotRestoreVerificationResult> {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** {@inheritDoc} */
+    @Override public @NotNull Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> subgrid,
+        SnapshotRestoreVerificationArg arg) throws IgniteException {
+        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();
+
+        for (ClusterNode node : subgrid)
+            jobs.put(new SnapshotRestoreVerificationJob(arg), node);
+
+        return jobs;
+    }
+
+    /** {@inheritDoc} */
+    @Override public SnapshotRestoreVerificationResult reduce(List<ComputeJobResult> results) throws IgniteException {
+        SnapshotRestoreVerificationResult firstRes = null;
+
+        for (ComputeJobResult jobRes : results) {
+            SnapshotRestoreVerificationResult res = jobRes.getData();
+
+            if (res == null)
+                continue;
+
+            if (firstRes == null) {
+                firstRes = res;
+
+                continue;
+            }
+
+            if (firstRes.configs().size() != res.configs().size()) {
+                throw new IgniteException("Count of cache configs mismatch [" +
+                    "node1=" + firstRes.localNodeId() + ", cnt1=" + firstRes.configs().size() +
+                    ", node2=" + res.localNodeId() + ", cnt2=" + res.configs().size() + ']');
+            }
+        }
+
+        return firstRes;
+    }
+
+    /** {@inheritDoc} */
+    @Override public ComputeJobResultPolicy result(ComputeJobResult res, List<ComputeJobResult> rcvd) {
+        IgniteException e = res.getException();
+
+        // Don't failover this job, if topology changed - user should restart operation.
+        if (e != null)
+            throw e;
+
+        return super.result(res, rcvd);

Review comment:
       Should we use `ComputeJobResultPolicy#FAILOVER` or `ComputeJobResultPolicy#REDUCE` here? Maybe I'm missing something, but from my point of view it is enough to receive the configuration from the first node and proceed with it. Right?
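
One possible reading of this suggestion, sketched in plain Java. The `Policy` enum below is a local stand-in that only mirrors the constant names of Ignite's `ComputeJobResultPolicy`; the class `FirstResultReduceSketch` and the choice to fail over on errors are assumptions made for this example, not the PR's behavior.

```java
import java.util.List;

class FirstResultReduceSketch {
    /** Local stand-in mirroring the constant names of Ignite's ComputeJobResultPolicy. */
    enum Policy { WAIT, REDUCE, FAILOVER }

    /** Decides what to do after one job result arrives: reduce on the first usable result. */
    static Policy result(List<String> jobRes, Exception err) {
        if (err != null)
            return Policy.FAILOVER; // Assumption: retry the job elsewhere instead of rethrowing.

        // A null result means this node had nothing to report, so keep waiting.
        return jobRes != null ? Policy.REDUCE : Policy.WAIT;
    }

    /** Feeds results in arrival order and returns the first one accepted for reduce, or null. */
    static List<String> collect(List<List<String>> arrivals) {
        for (List<String> res : arrivals) {
            if (result(res, null) == Policy.REDUCE)
                return res;
        }

        return null;
    }
}
```

With this policy the task would stop waiting as soon as one node returns configurations, which also means the cross-node count comparison in `reduce` would no longer see every node's result.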

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        IgniteInternalFuture<Void> fut0 = fut;
+
+        if (!fut0.isDone()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        if (ctx.cache().context().snapshotMgr().isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+            throw new IgniteException("Not all nodes in the cluster support a snapshot restore operation.");
+
+        Collection<ClusterNode> bltNodes = F.viewReadOnly(ctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            node -> node, (node) -> CU.baselineNode(node, ctx.state().clusterState()));
+
+        Set<UUID> bltNodeIds = new HashSet<>(F.viewReadOnly(bltNodes, F.node2id()));
+
+        fut = new GridFutureAdapter<>();
+
+        ((ClusterGroupAdapter)ctx.cluster().get().forNodeIds(bltNodeIds)).compute().executeAsync(
+            new SnapshotRestoreVerificatioTask(), new SnapshotRestoreVerificationArg(snpName, cacheGrpNames)).listen(
+            f -> {
+                try {
+                    SnapshotRestoreVerificationResult res = f.get();
+
+                    Set<String> foundGrps = res == null ? Collections.emptySet() : res.configs().stream()
+                        .map(v -> v.config().getGroupName() != null ? v.config().getGroupName() : v.config().getName())
+                        .collect(Collectors.toSet());
+
+                    if (!foundGrps.containsAll(cacheGrpNames)) {
+                        Set<String> missedGroups = new HashSet<>(cacheGrpNames);
+
+                        missedGroups.removeAll(foundGrps);
+
+                        fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                            "snapshot [groups=" + missedGroups + ", snapshot=" + snpName + ']'));
+
+                        return;
+                    }
+
+                    SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(
+                        UUID.randomUUID(), snpName, bltNodeIds, res.configs(), res.localNodeId());
+
+                    prepareRestoreProc.start(req.requestId(), req);
+                } catch (Throwable t) {
+                    fut.onDone(new IgniteException(OP_REJECT_MSG + t.getMessage(), t));
+                }
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if the cache group restore process is currently running.
+     *
+     * @return {@code True} if cache group restore process is currently running.
+     */
+    public boolean inProgress(@Nullable String cacheName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        return !staleProcess(fut, opCtx0) && (cacheName == null || opCtx0.containsCache(cacheName));
+    }
+
+    /**
+     * @param fut The future of cache snapshot restore operation.
+     * @param opCtx Snapshot restore operation context.
+     * @return {@code True} if the future completed or not initiated.
+     */
+    public boolean staleProcess(IgniteInternalFuture<Void> fut, SnapshotRestoreContext opCtx) {
+        return fut.isDone() || opCtx == null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes().contains(leftNodeId)) {
+            opCtx0.interrupt(new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.interrupt(reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IllegalStateException If cache with the specified name already exists.
+     */
+    private void ensureCacheAbsent(String name) throws IllegalStateException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreEmptyResponse> prepare(SnapshotRestorePrepareRequest req) {
+        if (!req.nodes().contains(ctx.localNodeId()))
+            return new GridFinishedFuture<>();
+
+        if (inProgress(null)) {
+            return new GridFinishedFuture<>(
+                new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        // Skip creating future on initiator.
+        if (fut.isDone())
+            fut = new GridFutureAdapter<>();
+
+        opCtx = new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), req.configs());
+
+        fut.listen(f -> opCtx = null);
+
+        if (!allNodesInBaselineAndAlive(req.nodes()))
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        GridFutureAdapter<SnapshotRestoreEmptyResponse> retFut = new GridFutureAdapter<>();
+
+        try {
+            for (String grpName : opCtx0.groups())
+                ensureCacheAbsent(grpName);
+
+            for (StoredCacheData cfg : opCtx0.configs()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (!ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx0.snapshotName()).exists())
+                return new GridFinishedFuture<>();
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+
+            ctx.getSystemExecutorService().submit(() -> {
+                try {
+                    opCtx0.restore(updateMeta);
+
+                    if (!opCtx0.interrupted()) {
+                        retFut.onDone();
+
+                        return;
+                    }
+
+                    log.error("Snapshot restore process has been interrupted " +
+                        "[groups=" + opCtx0.groups() + ", snapshot=" + opCtx0.snapshotName() + ']', opCtx0.error());
+
+                    opCtx0.rollback();
+
+                    retFut.onDone(opCtx0.error());
+
+                }
+                catch (Throwable t) {
+                    retFut.onDone(t);
+                }
+            });
+
+            return retFut;
+        } catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, SnapshotRestoreEmptyResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0.isDone() || !reqId.equals(opCtx.requestId()))
+            return;
+
+        Exception failure = F.first(errs.values());
+
+        if (failure != null) {
+            opCtx.rollback();
+
+            fut0.onDone(failure);
+
+            return;
+        }
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            cacheStartProc.start(reqId, new SnapshotRestoreCacheStartRequest(reqId));
+    }
+
+    /**
+     * @param req Request to start restored cache groups.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreEmptyResponse> cacheStart(SnapshotRestoreCacheStartRequest req) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut, opCtx0))
+            return new GridFinishedFuture<>();
+
+        if (!req.requestId().equals(opCtx0.requestId()))
+            return new GridFinishedFuture<>(new IgniteException("Unknown snapshot restore operation was rejected."));
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (opCtx0.interrupted())
+            return new GridFinishedFuture<>(opCtx0.error());
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes()))
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<SnapshotRestoreEmptyResponse> retFut = new GridFutureAdapter<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting restored caches " +
+                "[snapshot=" + opCtx0.snapshotName() +
+                ", caches=" + F.viewReadOnly(opCtx0.configs(), c -> c.config().getName()) + ']');
+        }
+
+        ctx.cache().dynamicStartCachesByStoredConf(opCtx.configs(), true, true, false, null, true, opCtx0.nodes()).listen(
+            f -> {
+                if (f.error() != null) {
+                    log.error("Unable to start restored caches.", f.error());
+
+                    retFut.onDone(f.error());
+                }
+                else
+                    retFut.onDone();
+            }
+        );
+
+        return retFut;
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishCacheStart(UUID reqId, Map<UUID, SnapshotRestoreEmptyResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut0, opCtx0) || !reqId.equals(opCtx0.requestId()))
+            return;
+
+        Exception failure = F.first(errs.values());
+
+        if (failure == null && !res.keySet().containsAll(opCtx0.nodes())) {
+            Set<UUID> leftNodes = new HashSet<>(opCtx0.nodes());
+
+            leftNodes.removeAll(res.keySet());
+
+            failure = new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster [nodeId=" + leftNodes + ']');
+        }
+
+        if (failure != null) {
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                rollbackRestoreProc.start(reqId, new SnapshotRestoreRollbackRequest(reqId, failure));
+
+            return;
+        }
+
+        fut0.onDone();
+    }
+
+    /**
+     * @param req Request to rollback cache group restore process.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreRollbackResponse> rollback(SnapshotRestoreRollbackRequest req) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut, opCtx0) || !req.requestId().equals(opCtx0.requestId()))
+            return new GridFinishedFuture<>();
+
+        if (!opCtx0.nodes().contains(ctx.localNodeId()))
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled())
+            log.info("Performing rollback routine for restored cache groups [groups=" + opCtx0.groups() + ']');
+
+        opCtx0.rollback();
+
+        return new GridFinishedFuture<>(new SnapshotRestoreRollbackResponse(req.error()));
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishRollback(UUID reqId, Map<UUID, SnapshotRestoreRollbackResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut0, opCtx0) || !reqId.equals(opCtx0.requestId()))
+            return;
+
+        SnapshotRestoreRollbackResponse resp = F.first(F.viewReadOnly(res.values(), v -> v, Objects::nonNull));
+
+        fut0.onDone(resp.error());
+    }
+
+    /**
+     * @param nodeIds Set of required baseline node IDs.
+     * @return {@code True} if all of the specified nodes are present in the baseline and alive.
+     */
+    private boolean allNodesInBaselineAndAlive(Set<UUID> nodeIds) {
+        for (UUID nodeId : nodeIds) {
+            ClusterNode node = ctx.discovery().node(nodeId);
+
+            if (node == null || !CU.baselineNode(node, ctx.state().clusterState()) || !ctx.discovery().alive(node))
+                return false;
+        }
+
+        return true;
+    }
+
+    /**
+     * Cache group restore from snapshot operation context.
+     */
+    private class SnapshotRestoreContext {
+        /** Request ID. */
+        private final UUID reqId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Baseline node IDs that must be alive to complete the operation. */
+        private final Set<UUID> reqNodes;
+
+        /** List of processed cache IDs. */
+        private final Set<Integer> cacheIds = new HashSet<>();
+
+        /** Cache configurations. */
+        private final List<StoredCacheData> ccfgs;
+
+        /** Restored cache groups. */
+        private final Map<String, List<File>> grps = new ConcurrentHashMap<>();
+
+        /** The exception that led to the interruption of the process. */
+        private final AtomicReference<Throwable> errRef = new AtomicReference<>();
+
+        /**
+         * @param reqId Request ID.
+         * @param snpName Snapshot name.
+         * @param reqNodes Baseline node IDs that must be alive to complete the operation.
+         * @param cfgs Stored cache configurations.
+         */
+        protected SnapshotRestoreContext(UUID reqId, String snpName, Set<UUID> reqNodes, List<StoredCacheData> cfgs) {
+            ccfgs = new ArrayList<>(cfgs);
+
+            for (StoredCacheData cacheData : cfgs) {
+                String cacheName = cacheData.config().getName();
+
+                cacheIds.add(CU.cacheId(cacheName));
+
+                boolean shared = cacheData.config().getGroupName() != null;
+
+                grps.computeIfAbsent(shared ? cacheData.config().getGroupName() : cacheName, v -> new ArrayList<>());
+
+                if (shared)
+                    cacheIds.add(CU.cacheId(cacheData.config().getGroupName()));
+            }
+
+            this.reqId = reqId;
+            this.reqNodes = new HashSet<>(reqNodes);
+            this.snpName = snpName;
+        }
+
+        /** @return Request ID. */
+        protected UUID requestId() {
+            return reqId;
+        }
+
+        /** @return Baseline node IDs that must be alive to complete the operation. */
+        protected Set<UUID> nodes() {
+            return Collections.unmodifiableSet(reqNodes);
+        }
+
+        /** @return Snapshot name. */
+        protected String snapshotName() {
+            return snpName;
+        }
+
+        /**
+         * @return List of cache group names to restore from the snapshot.
+         */
+        protected Set<String> groups() {
+            return grps.keySet();
+        }
+
+        /**
+         * @param name Cache name.
+         * @return {@code True} if the cache with the specified name is currently being restored.
+         */
+        protected boolean containsCache(String name) {
+            return cacheIds.contains(CU.cacheId(name));
+        }
+
+        /** @return Cache configurations. */
+        protected Collection<StoredCacheData> configs() {
+            return ccfgs;
+        }
+
+        /**
+         * @param err Error.
+         * @return {@code True} if operation has been interrupted by this call.
+         */
+        protected boolean interrupt(Exception err) {
+            return errRef.compareAndSet(null, err);
+        }
+
+        /**
+         * @return Interrupted flag.
+         */
+        protected boolean interrupted() {
+            return error() != null;
+        }
+
+        /**
+         * @return Error if operation was interrupted, otherwise {@code null}.
+         */
+        protected @Nullable Throwable error() {
+            return errRef.get();
+        }
+
+        /**
+         * Restore specified cache groups from the local snapshot directory.
+         *
+         * @param updateMetadata Update binary metadata flag.
+         * @throws IgniteCheckedException If failed.
+         */
+        protected void restore(boolean updateMetadata) throws IgniteCheckedException {
+            if (interrupted())
+                return;
+
+            IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+            String folderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+            if (updateMetadata) {
+                File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), folderName);
+
+                if (!binDir.exists()) {
+                    throw new IgniteCheckedException("Unable to update cluster metadata from snapshot, " +
+                        "directory doesn't exists [snapshot=" + snpName + ", dir=" + binDir + ']');
+                }
+
+                ctx.cacheObjects().updateMetadata(binDir, this::interrupted);
+            }
+
+            for (File grpDir : snapshotMgr.snapshotCacheDirectories(snpName, folderName)) {
+                String grpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+                if (!groups().contains(grpName))
+                    continue;
+
+                snapshotMgr.restoreCacheGroupFiles(snpName, grpName, grpDir, this::interrupted, grps.get(grpName));

Review comment:
       Is it possible to interrupt the copy procedure before all partitions of the cache groups have been processed? I think we should submit `tasks-to-copy-partition` to the executor, one per partition, independently.
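
A minimal sketch of the per-partition idea, not the PR's implementation: each partition file becomes its own task, and every task checks the stop flag before it starts copying. `PartitionCopySketch`, `copyAll` and the use of `CompletableFuture` with a plain `Executor` are assumptions made for illustration; the real code would presumably use the snapshot manager's own executor and Ignite futures.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: one copy task per partition file. */
final class PartitionCopySketch {
    /**
     * Submits one task per source file. Each future completes with {@code true} if the file
     * was copied and {@code false} if the task was skipped because the operation was interrupted.
     */
    static List<CompletableFuture<Boolean>> copyAll(
        List<Path> srcFiles,
        Path targetDir,
        Executor exec,
        BooleanSupplier stopChecker
    ) {
        List<CompletableFuture<Boolean>> futs = new ArrayList<>();

        for (Path src : srcFiles) {
            futs.add(CompletableFuture.supplyAsync(() -> {
                // The flag is checked per task, so queued partitions are skipped after an interrupt.
                if (stopChecker.getAsBoolean())
                    return false;

                try {
                    Files.copy(src, targetDir.resolve(src.getFileName()));

                    return true;
                }
                catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, exec));
        }

        return futs;
    }
}
```

With this shape an interrupt takes effect between partitions even when several groups are queued. A copy that is already running still finishes its current file.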

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();

Review comment:
       I suggest splitting the usage of this future:
   - on the node that initiates the restore procedure, `fut` is non-null until the procedure completes;
   - on the other nodes that participate in the restore procedure, this field is `null`.
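
A rough sketch of how that split could look, under the assumption that `CompletableFuture` stands in for `GridFutureAdapter`. The class and method names (`RestoreFutureSketch`, `start`, `finish`, `inProgress`) are hypothetical and do not exist in the PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder: the future exists only on the initiating node while a restore is running. */
final class RestoreFutureSketch {
    /** Holds {@code null} on participating nodes and on the initiator when no operation is in progress. */
    private final AtomicReference<CompletableFuture<Void>> futRef = new AtomicReference<>();

    /** Called only on the node that initiates the restore. */
    CompletableFuture<Void> start() {
        CompletableFuture<Void> fut = new CompletableFuture<>();

        if (!futRef.compareAndSet(null, fut))
            throw new IllegalStateException("Restore operation is already in progress.");

        return fut;
    }

    /**
     * Called on every node when the distributed process finishes.
     *
     * @return {@code true} if this node was the initiator and its future was completed.
     */
    boolean finish(Throwable err) {
        CompletableFuture<Void> fut = futRef.getAndSet(null);

        if (fut == null)
            return false; // Not the initiator: nothing to complete.

        if (err == null)
            fut.complete(null);
        else
            fut.completeExceptionally(err);

        return true;
    }

    /** @return {@code true} if this node initiated an operation that has not finished yet. */
    boolean inProgress() {
        return futRef.get() != null;
    }
}
```

A null check then replaces the `isDone()` check on the initiator, and participating nodes never touch a future they do not own.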

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreVerificatioTask.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.compute.ComputeJob;
+import org.apache.ignite.compute.ComputeJobAdapter;
+import org.apache.ignite.compute.ComputeJobResult;
+import org.apache.ignite.compute.ComputeJobResultPolicy;
+import org.apache.ignite.compute.ComputeTaskAdapter;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.resources.IgniteInstanceResource;
+import org.jetbrains.annotations.NotNull;
+
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+
+/**
+ * Verification task for restoring a cache group from a snapshot.
+ */
+public class SnapshotRestoreVerificatioTask extends
+    ComputeTaskAdapter<SnapshotRestoreVerificationArg, SnapshotRestoreVerificationResult> {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** {@inheritDoc} */
+    @Override public @NotNull Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> subgrid,
+        SnapshotRestoreVerificationArg arg) throws IgniteException {
+        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();
+
+        for (ClusterNode node : subgrid)
+            jobs.put(new SnapshotRestoreVerificationJob(arg), node);
+
+        return jobs;
+    }
+
+    /** {@inheritDoc} */
+    @Override public SnapshotRestoreVerificationResult reduce(List<ComputeJobResult> results) throws IgniteException {
+        SnapshotRestoreVerificationResult firstRes = null;
+
+        for (ComputeJobResult jobRes : results) {
+            SnapshotRestoreVerificationResult res = jobRes.getData();
+
+            if (res == null)
+                continue;
+
+            if (firstRes == null) {
+                firstRes = res;
+
+                continue;
+            }
+
+            if (firstRes.configs().size() != res.configs().size()) {
+                throw new IgniteException("Count of cache configs mismatch [" +
+                    "node1=" + firstRes.localNodeId() + ", cnt1=" + firstRes.configs().size() +
+                    ", node2=" + res.localNodeId() + ", cnt2=" + res.configs().size() + ']');
+            }
+        }
+
+        return firstRes;
+    }
+
+    /** {@inheritDoc} */
+    @Override public ComputeJobResultPolicy result(ComputeJobResult res, List<ComputeJobResult> rcvd) {
+        IgniteException e = res.getException();
+
+        // Don't failover this job, if topology changed - user should restart operation.
+        if (e != null)
+            throw e;
+
+        return super.result(res, rcvd);
+    }
+
+    /** */
+    private static class SnapshotRestoreVerificationJob extends ComputeJobAdapter {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Auto-injected grid instance. */
+        @IgniteInstanceResource
+        private transient IgniteEx ignite;
+
+        /** Job argument. */
+        private final SnapshotRestoreVerificationArg arg;
+
+        /**
+         * @param arg Job argument.
+         */
+        public SnapshotRestoreVerificationJob(SnapshotRestoreVerificationArg arg) {
+            this.arg = arg;
+        }
+
+        /** {@inheritDoc} */
+        @Override public Object execute() throws IgniteException {
+            assert !ignite.context().clientNode();
+
+            try {
+                return resolveRestoredConfigs();
+            }
+            catch (BinaryObjectException e) {
+                throw new IgniteException("Incompatible binary types found: " + e.getMessage());

Review comment:
       I don't think we need to catch `BinaryObjectException` at all.
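
One possible reading of this suggestion, shown with stand-in exception types rather than the Ignite classes: rethrowing with only the message drops the original exception type and cause, while letting an unchecked exception propagate keeps both. `GridException` and `BinaryTypeException` below are hypothetical placeholders.

```java
/** Hypothetical comparison of rewrapping an unchecked exception versus letting it propagate. */
final class RethrowSketch {
    /** Stand-in for a base unchecked exception. */
    static class GridException extends RuntimeException {
        GridException(String msg) {
            super(msg);
        }
    }

    /** Stand-in for a more specific unchecked exception. */
    static class BinaryTypeException extends GridException {
        BinaryTypeException(String msg) {
            super(msg);
        }
    }

    /** Simulates a verification step that fails. */
    static void verify() {
        throw new BinaryTypeException("field type mismatch");
    }

    /** Pattern from the diff: a new exception built from the message only, no cause attached. */
    static void wrapped() {
        try {
            verify();
        }
        catch (BinaryTypeException e) {
            throw new GridException("Incompatible binary types found: " + e.getMessage());
        }
    }

    /** Suggested alternative: no catch block, the original exception reaches the caller. */
    static void propagated() {
        verify();
    }
}
```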

##########
File path: modules/core/src/main/java/org/apache/ignite/IgniteSnapshot.java
##########
@@ -48,4 +49,13 @@
      * @return Future which will be completed when cancel operation finished.
      */
     public IgniteFuture<Void> cancelSnapshot(String name);
+
+    /**
+     * Restore cache group(s) from the snapshot.
+     *
+     * @param snapshotName Snapshot name.
+     * @param cacheGroupNames Cache groups to be restored.
+     * @return Future which will be completed when restore operation finished.
+     */
+    public IgniteFuture<Void> restoreCacheGroups(String snapshotName, Collection<String> cacheGroupNames);

Review comment:
       Let's simply rename this method to `restoreSnapshot`.
   Also rename the parameter: `snapshotName` > `name`.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteSnapshotManager.java
##########
@@ -1022,6 +1056,91 @@ private SnapshotMetadata readSnapshotMetadata(File smf) {
         }
     }
 
+    /** {@inheritDoc} */
+    @Override public IgniteFuture<Void> restoreCacheGroups(String snpName, Collection<String> grpNames) {
+        return restoreCacheGrpProc.start(snpName, grpNames);
+    }
+
+    /**
+     * @param snpName Snapshot name.
+     * @param grpName Cache group name.
+     * @param snpCacheDir Cache group directory in snapshot.
+     * @param stopChecker Node stop or process interrupt checker.
+     * @param newFiles A list to keep track of the files created, the list updates during the restore process.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restoreCacheGroupFiles(
+        String snpName,
+        String grpName,
+        File snpCacheDir,
+        BooleanSupplier stopChecker,
+        List<File> newFiles
+    ) throws IgniteCheckedException {
+        File cacheDir = U.resolveWorkDirectory(cctx.kernalContext().config().getWorkDirectory(),
+            Paths.get(databaseRelativePath(pdsSettings.folderName()), snpCacheDir.getName()).toString(), false);
+
+        if (!cacheDir.exists()) {
+            newFiles.add(cacheDir);
+
+            cacheDir.mkdir();
+        }
+        else
+            if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+        try {
+            if (log.isInfoEnabled())
+                log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                if (stopChecker.getAsBoolean())
+                    return;
+
+                File target = new File(cacheDir, snpFile.getName());
+
+                if (log.isDebugEnabled()) {
+                    log.debug("Copying file from the snapshot " +
+                        "[snapshot=" + snpName +
+                        ", grp=" + grpName +
+                        ", src=" + snpFile +
+                        ", target=" + target + "]");
+                }
+
+                newFiles.add(target);
+
+                Files.copy(snpFile.toPath(), target.toPath());
+            }
+        }
+        catch (IOException e) {
+            throw new IgniteCheckedException("Unable to copy file [snapshot=" + snpName + ", grp=" + grpName + ']', e);
+        }
+    }
+
+    /**
+     * @param files Collection of files to delete.
+     */
+    protected void rollbackRestoreOperation(Collection<File> files) {

Review comment:
       Let's move it to `SnapshotRestoreCacheGroupProcess`.
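
The method body is not shown in this hunk, so the following is only a sketch of what a relocated rollback helper might do, assuming it deletes the files tracked in `newFiles`. Since `restoreCacheGroupFiles` adds the directory before the files inside it, deleting in reverse order empties the directory first. `RollbackSketch` and `rollback` are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rollback helper that could live next to the restore context. */
final class RollbackSketch {
    /**
     * Deletes tracked files in reverse creation order.
     *
     * @param created Files and directories in the order they were created.
     * @return Files that still exist because they could not be deleted.
     */
    static List<File> rollback(List<File> created) {
        List<File> failed = new ArrayList<>();

        for (int i = created.size() - 1; i >= 0; i--) {
            File f = created.get(i);

            // A directory can only be deleted once the files tracked after it are gone.
            if (f.exists() && !f.delete())
                failed.add(f);
        }

        return failed;
    }
}
```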
   

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreVerificatioTask.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.compute.ComputeJob;
+import org.apache.ignite.compute.ComputeJobAdapter;
+import org.apache.ignite.compute.ComputeJobResult;
+import org.apache.ignite.compute.ComputeJobResultPolicy;
+import org.apache.ignite.compute.ComputeTaskAdapter;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.resources.IgniteInstanceResource;
+import org.jetbrains.annotations.NotNull;
+
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+
+/**
+ * Verification task for restoring a cache group from a snapshot.
+ */
+public class SnapshotRestoreVerificatioTask extends
+    ComputeTaskAdapter<SnapshotRestoreVerificationArg, SnapshotRestoreVerificationResult> {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** {@inheritDoc} */
+    @Override public @NotNull Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> subgrid,
+        SnapshotRestoreVerificationArg arg) throws IgniteException {
+        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();
+
+        for (ClusterNode node : subgrid)
+            jobs.put(new SnapshotRestoreVerificationJob(arg), node);
+
+        return jobs;
+    }
+
+    /** {@inheritDoc} */
+    @Override public SnapshotRestoreVerificationResult reduce(List<ComputeJobResult> results) throws IgniteException {
+        SnapshotRestoreVerificationResult firstRes = null;
+
+        for (ComputeJobResult jobRes : results) {
+            SnapshotRestoreVerificationResult res = jobRes.getData();
+
+            if (res == null)
+                continue;
+
+            if (firstRes == null) {
+                firstRes = res;
+
+                continue;
+            }
+
+            if (firstRes.configs().size() != res.configs().size()) {
+                throw new IgniteException("Count of cache configs mismatch [" +
+                    "node1=" + firstRes.localNodeId() + ", cnt1=" + firstRes.configs().size() +
+                    ", node2=" + res.localNodeId() + ", cnt2=" + res.configs().size() + ']');
+            }
+        }
+
+        return firstRes;
+    }
+
+    /** {@inheritDoc} */
+    @Override public ComputeJobResultPolicy result(ComputeJobResult res, List<ComputeJobResult> rcvd) {
+        IgniteException e = res.getException();
+
+        // Don't failover this job, if topology changed - user should restart operation.
+        if (e != null)
+            throw e;
+
+        return super.result(res, rcvd);
+    }
+
+    /** */
+    private static class SnapshotRestoreVerificationJob extends ComputeJobAdapter {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Auto-injected grid instance. */
+        @IgniteInstanceResource
+        private transient IgniteEx ignite;
+
+        /** Job argument. */
+        private final SnapshotRestoreVerificationArg arg;
+
+        /**
+         * @param arg Job argument.
+         */
+        public SnapshotRestoreVerificationJob(SnapshotRestoreVerificationArg arg) {
+            this.arg = arg;
+        }
+
+        /** {@inheritDoc} */
+        @Override public Object execute() throws IgniteException {
+            assert !ignite.context().clientNode();
+
+            try {
+                return resolveRestoredConfigs();
+            }
+            catch (BinaryObjectException e) {
+                throw new IgniteException("Incompatible binary types found: " + e.getMessage());
+            } catch (IOException | IgniteCheckedException e) {
+                throw F.wrap(e);
+            }
+        }
+
+        /**
+         * Collect cache configurations and verify binary compatibility of specified cache groups.
+         *
+         * @return List of stored cache configurations with local node ID.
+         * @throws IgniteCheckedException If the snapshot is incompatible.
+         * @throws IOException In case of I/O errors while reading the memory page size
+         */
+        private SnapshotRestoreVerificationResult resolveRestoredConfigs() throws IgniteCheckedException, IOException {
+            Map<String, StoredCacheData> cacheCfgs = new HashMap<>();
+            GridCacheSharedContext<?, ?> cctx = ignite.context().cache().context();
+            String folderName = ignite.context().pdsFolderResolver().resolveFolders().folderName();
+
+            // Collect cache configuration(s) and verify cache groups page size.
+            for (File cacheDir : cctx.snapshotMgr().snapshotCacheDirectories(arg.snapshotName(), folderName)) {
+                String grpName = FilePageStoreManager.cacheGroupName(cacheDir);
+
+                if (!arg.groups().contains(grpName))
+                    continue;
+
+                ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(cacheDir, cacheCfgs);
+
+                List<File> parts = FilePageStoreManager.cachePartitionFiles(cacheDir);
+
+                if (F.isEmpty(parts))
+                    continue;
+
+                int pageSize = ((GridCacheDatabaseSharedManager)cctx.database())

Review comment:
       Let's use `readSnapshotMetadatas()` and `SnapshotMetadata#pageSize` here. WDYT?

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreVerificatioTask.java
##########
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.compute.ComputeJob;
+import org.apache.ignite.compute.ComputeJobAdapter;
+import org.apache.ignite.compute.ComputeJobResult;
+import org.apache.ignite.compute.ComputeJobResultPolicy;
+import org.apache.ignite.compute.ComputeTaskAdapter;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.resources.IgniteInstanceResource;
+import org.jetbrains.annotations.NotNull;
+
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+
+/**
+ * Verification task for restoring a cache group from a snapshot.
+ */
+public class SnapshotRestoreVerificatioTask extends
+    ComputeTaskAdapter<SnapshotRestoreVerificationArg, SnapshotRestoreVerificationResult> {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** {@inheritDoc} */
+    @Override public @NotNull Map<? extends ComputeJob, ClusterNode> map(List<ClusterNode> subgrid,
+        SnapshotRestoreVerificationArg arg) throws IgniteException {
+        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();
+
+        for (ClusterNode node : subgrid)
+            jobs.put(new SnapshotRestoreVerificationJob(arg), node);
+
+        return jobs;
+    }
+
+    /** {@inheritDoc} */
+    @Override public SnapshotRestoreVerificationResult reduce(List<ComputeJobResult> results) throws IgniteException {
+        SnapshotRestoreVerificationResult firstRes = null;
+
+        for (ComputeJobResult jobRes : results) {
+            SnapshotRestoreVerificationResult res = jobRes.getData();
+
+            if (res == null)
+                continue;
+
+            if (firstRes == null) {
+                firstRes = res;
+
+                continue;
+            }
+
+            if (firstRes.configs().size() != res.configs().size()) {
+                throw new IgniteException("Count of cache configs mismatch [" +
+                    "node1=" + firstRes.localNodeId() + ", cnt1=" + firstRes.configs().size() +
+                    ", node2=" + res.localNodeId() + ", cnt2=" + res.configs().size() + ']');
+            }
+        }
+
+        return firstRes;
+    }
+
+    /** {@inheritDoc} */
+    @Override public ComputeJobResultPolicy result(ComputeJobResult res, List<ComputeJobResult> rcvd) {
+        IgniteException e = res.getException();
+
+        // Don't failover this job, if topology changed - user should restart operation.
+        if (e != null)
+            throw e;
+
+        return super.result(res, rcvd);
+    }
+
+    /** */
+    private static class SnapshotRestoreVerificationJob extends ComputeJobAdapter {
+        /** Serial version uid. */
+        private static final long serialVersionUID = 0L;
+
+        /** Auto-injected grid instance. */
+        @IgniteInstanceResource
+        private transient IgniteEx ignite;
+
+        /** Job argument. */
+        private final SnapshotRestoreVerificationArg arg;
+
+        /**
+         * @param arg Job argument.
+         */
+        public SnapshotRestoreVerificationJob(SnapshotRestoreVerificationArg arg) {
+            this.arg = arg;
+        }
+
+        /** {@inheritDoc} */
+        @Override public Object execute() throws IgniteException {
+            assert !ignite.context().clientNode();
+
+            try {
+                return resolveRestoredConfigs();
+            }
+            catch (BinaryObjectException e) {
+                throw new IgniteException("Incompatible binary types found: " + e.getMessage());
+            } catch (IOException | IgniteCheckedException e) {
+                throw F.wrap(e);

Review comment:
       AFAIK, `F.wrap()` is intended for closures inside future callbacks. Wrapping the exception with `IgniteException` seems to be enough here.
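
Something along these lines is what I have in mind. This is a self-contained sketch, not the patch itself: the nested `IgniteException` is a stand-in for `org.apache.ignite.IgniteException`, and `resolve()` is a hypothetical placeholder for `resolveRestoredConfigs()`. The original exception is passed as the cause so the stack trace is not lost.

```java
import java.io.IOException;

final class WrapSketch {
    /** Stand-in for org.apache.ignite.IgniteException (an unchecked exception). */
    public static class IgniteException extends RuntimeException {
        private static final long serialVersionUID = 0L;

        public IgniteException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /** Hypothetical placeholder for resolveRestoredConfigs(). */
    static Object resolve(boolean fail) throws IOException {
        if (fail)
            throw new IOException("disk error");

        return "ok";
    }

    /** Wraps the checked exception directly and keeps it as the cause. */
    public static Object execute(boolean fail) {
        try {
            return resolve(fail);
        }
        catch (IOException e) {
            throw new IgniteException("Failed to resolve restored configs: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(false));

        try {
            execute(true);
        }
        catch (IgniteException e) {
            System.out.println("Cause: " + e.getCause());
        }
    }
}
```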

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheStartRequest.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Serializable;
+import java.util.UUID;
+import org.apache.ignite.internal.util.typedef.internal.S;
+
+/**
+ * Request to start restored cache group.
+ */
+public class SnapshotRestoreCacheStartRequest implements Serializable {

Review comment:
       Can you use the `UUID` directly instead of creating the `SnapshotRestoreCacheStartRequest` class?
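
To illustrate why the wrapper looks redundant: `java.util.UUID` already implements `Serializable`, so it can be sent as the request as is. The sketch below assumes the only requirement on the request type is that it is `Serializable`, and uses plain Java serialization as a stand-in for whatever marshaller the process actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.UUID;

final class UuidRequestSketch {
    /** Serializes the request ID and reads it back, as a message transport would. */
    public static UUID roundTrip(UUID reqId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();

            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                // Compiles without a wrapper class because UUID implements Serializable.
                Serializable req = reqId;

                out.writeObject(req);
            }

            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (UUID)in.readObject();
            }
        }
        catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Unexpected serialization failure.", e);
        }
    }

    public static void main(String[] args) {
        UUID reqId = UUID.randomUUID();

        System.out.println("Same request ID after round trip: " + reqId.equals(roundTrip(reqId)));
    }
}
```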

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        IgniteInternalFuture<Void> fut0 = fut;
+
+        if (!fut0.isDone()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        if (ctx.cache().context().snapshotMgr().isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+            throw new IgniteException("Not all nodes in the cluster support a snapshot restore operation.");
+
+        Collection<ClusterNode> bltNodes = F.viewReadOnly(ctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            node -> node, (node) -> CU.baselineNode(node, ctx.state().clusterState()));
+
+        Set<UUID> bltNodeIds = new HashSet<>(F.viewReadOnly(bltNodes, F.node2id()));
+
+        fut = new GridFutureAdapter<>();
+
+        ((ClusterGroupAdapter)ctx.cluster().get().forNodeIds(bltNodeIds)).compute().executeAsync(
+            new SnapshotRestoreVerificatioTask(), new SnapshotRestoreVerificationArg(snpName, cacheGrpNames)).listen(
+            f -> {
+                try {
+                    SnapshotRestoreVerificationResult res = f.get();
+
+                    Set<String> foundGrps = res == null ? Collections.emptySet() : res.configs().stream()
+                        .map(v -> v.config().getGroupName() != null ? v.config().getGroupName() : v.config().getName())
+                        .collect(Collectors.toSet());
+
+                    if (!foundGrps.containsAll(cacheGrpNames)) {
+                        Set<String> missedGroups = new HashSet<>(cacheGrpNames);
+
+                        missedGroups.removeAll(foundGrps);
+
+                        fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                            "snapshot [groups=" + missedGroups + ", snapshot=" + snpName + ']'));
+
+                        return;
+                    }
+
+                    SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(
+                        UUID.randomUUID(), snpName, bltNodeIds, res.configs(), res.localNodeId());
+
+                    prepareRestoreProc.start(req.requestId(), req);
+                } catch (Throwable t) {
+                    fut.onDone(new IgniteException(OP_REJECT_MSG + t.getMessage(), t));
+                }
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if the cache group restore process is currently running.
+     *
+     * @return {@code True} if cache group restore process is currently running.
+     */
+    public boolean inProgress(@Nullable String cacheName) {
+        SnapshotRestoreContext opCtx0 = opCtx;

Review comment:
       I suggest taking the `opCtx` class property into account. If it is not null and contains the `cacheName`, then return `true`.
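
A minimal sketch of that check, under a few assumptions: `RestoreContext` is a hypothetical stand-in for `SnapshotRestoreContext`, `begin()` and `end()` are made-up helpers that only set and clear the field, and a `null` cache name is taken to mean "any cache". The volatile field is read once into a local variable so that a concurrent reset to `null` cannot cause an NPE between the null check and the lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class InProgressSketch {
    /** Hypothetical stand-in for SnapshotRestoreContext; only the cache names are modelled. */
    public static final class RestoreContext {
        private final Set<String> caches;

        public RestoreContext(Set<String> caches) {
            this.caches = caches;
        }

        public boolean containsCache(String name) {
            return caches.contains(name);
        }
    }

    /** Context of the running operation, or null if nothing is being restored. */
    private volatile RestoreContext opCtx;

    /** Made-up helper: marks the restore of the given caches as started. */
    public void begin(Set<String> caches) {
        opCtx = new RestoreContext(caches);
    }

    /** Made-up helper: marks the restore as finished. */
    public void end() {
        opCtx = null;
    }

    /** Returns true if a restore is running and (when a name is given) it covers that cache. */
    public boolean inProgress(String cacheName) {
        RestoreContext opCtx0 = opCtx;

        return opCtx0 != null && (cacheName == null || opCtx0.containsCache(cacheName));
    }

    public static void main(String[] args) {
        InProgressSketch proc = new InProgressSketch();

        proc.begin(new HashSet<>(Arrays.asList("cache1", "cache2")));

        System.out.println(proc.inProgress("cache1"));
        System.out.println(proc.inProgress("other"));

        proc.end();

        System.out.println(proc.inProgress(null));
    }
}
```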

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        IgniteInternalFuture<Void> fut0 = fut;
+
+        if (!fut0.isDone()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        if (ctx.cache().context().snapshotMgr().isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+            throw new IgniteException("Not all nodes in the cluster support a snapshot restore operation.");
+
+        Collection<ClusterNode> bltNodes = F.viewReadOnly(ctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            node -> node, (node) -> CU.baselineNode(node, ctx.state().clusterState()));
+
+        Set<UUID> bltNodeIds = new HashSet<>(F.viewReadOnly(bltNodes, F.node2id()));
+
+        fut = new GridFutureAdapter<>();
+
+        ((ClusterGroupAdapter)ctx.cluster().get().forNodeIds(bltNodeIds)).compute().executeAsync(
+            new SnapshotRestoreVerificatioTask(), new SnapshotRestoreVerificationArg(snpName, cacheGrpNames)).listen(
+            f -> {
+                try {
+                    SnapshotRestoreVerificationResult res = f.get();
+
+                    Set<String> foundGrps = res == null ? Collections.emptySet() : res.configs().stream()
+                        .map(v -> v.config().getGroupName() != null ? v.config().getGroupName() : v.config().getName())
+                        .collect(Collectors.toSet());
+
+                    if (!foundGrps.containsAll(cacheGrpNames)) {
+                        Set<String> missedGroups = new HashSet<>(cacheGrpNames);
+
+                        missedGroups.removeAll(foundGrps);
+
+                        fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                            "snapshot [groups=" + missedGroups + ", snapshot=" + snpName + ']'));
+
+                        return;
+                    }
+
+                    SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(
+                        UUID.randomUUID(), snpName, bltNodeIds, res.configs(), res.localNodeId());
+
+                    prepareRestoreProc.start(req.requestId(), req);
+                } catch (Throwable t) {
+                    fut.onDone(new IgniteException(OP_REJECT_MSG + t.getMessage(), t));
+                }
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if the cache group restore process is currently running.
+     *
+     * @return {@code True} if cache group restore process is currently running.
+     */
+    public boolean inProgress(@Nullable String cacheName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        return !staleProcess(fut, opCtx0) && (cacheName == null || opCtx0.containsCache(cacheName));
+    }
+
+    /**
+     * @param fut The future of cache snapshot restore operation.
+     * @param opCtx Snapshot restore operation context.
+     * @return {@code True} if the future completed or not initiated.
+     */
+    public boolean staleProcess(IgniteInternalFuture<Void> fut, SnapshotRestoreContext opCtx) {
+        return fut.isDone() || opCtx == null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes().contains(leftNodeId)) {
+            opCtx0.interrupt(new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.interrupt(reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IllegalStateException If cache with the specified name already exists.
+     */
+    private void ensureCacheAbsent(String name) throws IllegalStateException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreEmptyResponse> prepare(SnapshotRestorePrepareRequest req) {
+        if (!req.nodes().contains(ctx.localNodeId()))
+            return new GridFinishedFuture<>();
+
+        if (inProgress(null)) {
+            return new GridFinishedFuture<>(
+                new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        // Skip creating future on initiator.
+        if (fut.isDone())
+            fut = new GridFutureAdapter<>();
+
+        opCtx = new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), req.configs());
+
+        fut.listen(f -> opCtx = null);
+
+        if (!allNodesInBaselineAndAlive(req.nodes()))
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        GridFutureAdapter<SnapshotRestoreEmptyResponse> retFut = new GridFutureAdapter<>();
+
+        try {
+            for (String grpName : opCtx0.groups())
+                ensureCacheAbsent(grpName);
+
+            for (StoredCacheData cfg : opCtx0.configs()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (!ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx0.snapshotName()).exists())
+                return new GridFinishedFuture<>();
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+
+            ctx.getSystemExecutorService().submit(() -> {
+                try {
+                    opCtx0.restore(updateMeta);
+
+                    if (!opCtx0.interrupted()) {
+                        retFut.onDone();
+
+                        return;
+                    }
+
+                    log.error("Snapshot restore process has been interrupted " +
+                        "[groups=" + opCtx0.groups() + ", snapshot=" + opCtx0.snapshotName() + ']', opCtx0.error());
+
+                    opCtx0.rollback();
+
+                    retFut.onDone(opCtx0.error());
+
+                }
+                catch (Throwable t) {
+                    retFut.onDone(t);
+                }
+            });
+
+            return retFut;
+        } catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, SnapshotRestoreEmptyResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0.isDone() || !reqId.equals(opCtx.requestId()))
+            return;
+
+        Exception failure = F.first(errs.values());
+
+        if (failure != null) {
+            opCtx.rollback();
+
+            fut0.onDone(failure);
+
+            return;
+        }
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            cacheStartProc.start(reqId, new SnapshotRestoreCacheStartRequest(reqId));
+    }
+
+    /**
+     * @param req Request to start restored cache groups.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreEmptyResponse> cacheStart(SnapshotRestoreCacheStartRequest req) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut, opCtx0))
+            return new GridFinishedFuture<>();
+
+        if (!req.requestId().equals(opCtx0.requestId()))
+            return new GridFinishedFuture<>(new IgniteException("Unknown snapshot restore operation was rejected."));
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (opCtx0.interrupted())
+            return new GridFinishedFuture<>(opCtx0.error());
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes()))
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<SnapshotRestoreEmptyResponse> retFut = new GridFutureAdapter<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting restored caches " +
+                "[snapshot=" + opCtx0.snapshotName() +
+                ", caches=" + F.viewReadOnly(opCtx0.configs(), c -> c.config().getName()) + ']');
+        }
+
+        ctx.cache().dynamicStartCachesByStoredConf(opCtx.configs(), true, true, false, null, true, opCtx0.nodes()).listen(
+            f -> {
+                if (f.error() != null) {
+                    log.error("Unable to start restored caches.", f.error());
+
+                    retFut.onDone(f.error());
+                }
+                else
+                    retFut.onDone();
+            }
+        );
+
+        return retFut;
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishCacheStart(UUID reqId, Map<UUID, SnapshotRestoreEmptyResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut0, opCtx0) || !reqId.equals(opCtx0.requestId()))
+            return;
+
+        Exception failure = F.first(errs.values());
+
+        if (failure == null && !res.keySet().containsAll(opCtx0.nodes())) {
+            Set<UUID> leftNodes = new HashSet<>(opCtx0.nodes());
+
+            leftNodes.removeAll(res.keySet());
+
+            failure = new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster [nodeId=" + leftNodes + ']');
+        }
+
+        if (failure != null) {
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                rollbackRestoreProc.start(reqId, new SnapshotRestoreRollbackRequest(reqId, failure));
+
+            return;
+        }
+
+        fut0.onDone();
+    }
+
+    /**
+     * @param req Request to rollback cache group restore process.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreRollbackResponse> rollback(SnapshotRestoreRollbackRequest req) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut, opCtx0) || !req.requestId().equals(opCtx0.requestId()))
+            return new GridFinishedFuture<>();
+
+        if (!opCtx0.nodes().contains(ctx.localNodeId()))
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled())
+            log.info("Performing rollback routine for restored cache groups [groups=" + opCtx0.groups() + ']');
+
+        opCtx0.rollback();
+
+        return new GridFinishedFuture<>(new SnapshotRestoreRollbackResponse(req.error()));
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishRollback(UUID reqId, Map<UUID, SnapshotRestoreRollbackResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut0, opCtx0) || !reqId.equals(opCtx0.requestId()))
+            return;
+
+        SnapshotRestoreRollbackResponse resp = F.first(F.viewReadOnly(res.values(), v -> v, Objects::nonNull));
+
+        fut0.onDone(resp.error());
+    }
+
+    /**
+     * @param nodeIds Set of required baseline node IDs.
+     * @return {@code True} if all of the specified nodes present in baseline and alive.
+     */
+    private boolean allNodesInBaselineAndAlive(Set<UUID> nodeIds) {
+        for (UUID nodeId : nodeIds) {
+            ClusterNode node = ctx.discovery().node(nodeId);
+
+            if (node == null || !CU.baselineNode(node, ctx.state().clusterState()) || !ctx.discovery().alive(node))
+                return false;
+        }
+
+        return true;
+    }
+
+    /**
+     * Cache group restore from snapshot operation context.
+     */
+    private class SnapshotRestoreContext {
+        /** Request ID. */
+        private final UUID reqId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Baseline node IDs that must be alive to complete the operation. */
+        private final Set<UUID> reqNodes;
+
+        /** List of processed cache IDs. */
+        private final Set<Integer> cacheIds = new HashSet<>();
+
+        /** Cache configurations. */
+        private final List<StoredCacheData> ccfgs;
+
+        /** Restored cache groups. */
+        private final Map<String, List<File>> grps = new ConcurrentHashMap<>();
+
+        /** The exception that led to the interruption of the process. */
+        private final AtomicReference<Throwable> errRef = new AtomicReference<>();
+
+        /**
+         * @param reqId Request ID.
+         * @param snpName Snapshot name.
+         * @param reqNodes Baseline node IDs that must be alive to complete the operation.
+         * @param cfgs Stored cache configurations.
+         */
+        protected SnapshotRestoreContext(UUID reqId, String snpName, Set<UUID> reqNodes, List<StoredCacheData> cfgs) {
+            ccfgs = new ArrayList<>(cfgs);
+
+            for (StoredCacheData cacheData : cfgs) {
+                String cacheName = cacheData.config().getName();
+
+                cacheIds.add(CU.cacheId(cacheName));
+
+                boolean shared = cacheData.config().getGroupName() != null;
+
+                grps.computeIfAbsent(shared ? cacheData.config().getGroupName() : cacheName, v -> new ArrayList<>());
+
+                if (shared)
+                    cacheIds.add(CU.cacheId(cacheData.config().getGroupName()));
+            }
+
+            this.reqId = reqId;
+            this.reqNodes = new HashSet<>(reqNodes);
+            this.snpName = snpName;
+        }
+
+        /** @return Request ID. */
+        protected UUID requestId() {
+            return reqId;
+        }
+
+        /** @return Baseline node IDs that must be alive to complete the operation. */
+        protected Set<UUID> nodes() {
+            return Collections.unmodifiableSet(reqNodes);
+        }
+
+        /** @return Snapshot name. */
+        protected String snapshotName() {
+            return snpName;
+        }
+
+        /**
+         * @return List of cache group names to restore from the snapshot.
+         */
+        protected Set<String> groups() {
+            return grps.keySet();
+        }
+
+        /**
+         * @param name Cache name.
+         * @return {@code True} if the cache with the specified name is currently being restored.
+         */
+        protected boolean containsCache(String name) {
+            return cacheIds.contains(CU.cacheId(name));
+        }
+
+        /** @return Cache configurations. */
+        protected Collection<StoredCacheData> configs() {
+            return ccfgs;
+        }
+
+        /**
+         * @param err Error.
+         * @return {@code True} if operation has been interrupted by this call.
+         */
+        protected boolean interrupt(Exception err) {
+            return errRef.compareAndSet(null, err);
+        }
+
+        /**
+         * @return Interrupted flag.
+         */
+        protected boolean interrupted() {
+            return error() != null;
+        }
+
+        /**
+         * @return Error if operation was interrupted, otherwise {@code null}.
+         */
+        protected @Nullable Throwable error() {
+            return errRef.get();
+        }
+
+        /**
+         * Restore specified cache groups from the local snapshot directory.
+         *
+         * @param updateMetadata Update binary metadata flag.
+         * @throws IgniteCheckedException If failed.
+         */
+        protected void restore(boolean updateMetadata) throws IgniteCheckedException {

Review comment:
       The `restore` method should live outside the context class, since it performs actions on data. The context should only store the state of the restore process.
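
       A minimal sketch of that separation, not the PR's code: `RestoreState` and `RestoreRunner` are hypothetical names, and the "restore" here only records what would be copied instead of touching the file system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical state-only holder: getters and the interrupt flag, no I/O. */
final class RestoreState {
    private final String snpName;
    private final List<String> grps;
    private final AtomicReference<Throwable> errRef = new AtomicReference<>();

    RestoreState(String snpName, List<String> grps) {
        this.snpName = snpName;
        this.grps = Collections.unmodifiableList(new ArrayList<>(grps));
    }

    String snapshotName() { return snpName; }

    List<String> groups() { return grps; }

    /** @return true if this call interrupted the operation. */
    boolean interrupt(Throwable err) { return errRef.compareAndSet(null, err); }

    Throwable error() { return errRef.get(); }
}

/** Hypothetical owner of the actions; stands in for the process class. */
final class RestoreRunner {
    /** Walks the groups and stops as soon as the state reports an interruption. */
    List<String> restore(RestoreState st) {
        List<String> restored = new ArrayList<>();

        for (String grp : st.groups()) {
            if (st.error() != null)
                break;

            // A real implementation would copy partition files here (assumption).
            restored.add(st.snapshotName() + '/' + grp);
        }

        return restored;
    }
}
```

       With this split the state object stays trivially testable, and the code that acts on data can be moved or run on another thread without touching it.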

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        IgniteInternalFuture<Void> fut0 = fut;
+
+        if (!fut0.isDone()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        if (ctx.cache().context().snapshotMgr().isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+            throw new IgniteException("Not all nodes in the cluster support a snapshot restore operation.");
+
+        Collection<ClusterNode> bltNodes = F.viewReadOnly(ctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            node -> node, (node) -> CU.baselineNode(node, ctx.state().clusterState()));
+
+        Set<UUID> bltNodeIds = new HashSet<>(F.viewReadOnly(bltNodes, F.node2id()));
+
+        fut = new GridFutureAdapter<>();
+
+        ((ClusterGroupAdapter)ctx.cluster().get().forNodeIds(bltNodeIds)).compute().executeAsync(
+            new SnapshotRestoreVerificatioTask(), new SnapshotRestoreVerificationArg(snpName, cacheGrpNames)).listen(
+            f -> {
+                try {
+                    SnapshotRestoreVerificationResult res = f.get();
+
+                    Set<String> foundGrps = res == null ? Collections.emptySet() : res.configs().stream()
+                        .map(v -> v.config().getGroupName() != null ? v.config().getGroupName() : v.config().getName())
+                        .collect(Collectors.toSet());
+
+                    if (!foundGrps.containsAll(cacheGrpNames)) {
+                        Set<String> missedGroups = new HashSet<>(cacheGrpNames);
+
+                        missedGroups.removeAll(foundGrps);
+
+                        fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                            "snapshot [groups=" + missedGroups + ", snapshot=" + snpName + ']'));
+
+                        return;
+                    }
+
+                    SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(
+                        UUID.randomUUID(), snpName, bltNodeIds, res.configs(), res.localNodeId());
+
+                    prepareRestoreProc.start(req.requestId(), req);
+                } catch (Throwable t) {
+                    fut.onDone(new IgniteException(OP_REJECT_MSG + t.getMessage(), t));
+                }
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if the cache group restore process is currently running.
+     *
+     * @return {@code True} if cache group restore process is currently running.
+     */
+    public boolean inProgress(@Nullable String cacheName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        return !staleProcess(fut, opCtx0) && (cacheName == null || opCtx0.containsCache(cacheName));
+    }
+
+    /**
+     * @param fut The future of cache snapshot restore operation.
+     * @param opCtx Snapshot restore operation context.
+     * @return {@code True} if the future completed or not initiated.
+     */
+    public boolean staleProcess(IgniteInternalFuture<Void> fut, SnapshotRestoreContext opCtx) {
+        return fut.isDone() || opCtx == null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes().contains(leftNodeId)) {
+            opCtx0.interrupt(new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.interrupt(reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IllegalStateException If cache with the specified name already exists.
+     */
+    private void ensureCacheAbsent(String name) throws IllegalStateException {

Review comment:
       I don't think we need to declare `IllegalStateException` in the method signature; the `throws` clause can be removed. The method is small enough that we can see everything that is going on.
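
       For illustration, a self-contained sketch of the same check without the `throws` clause. `CacheRegistry` is a hypothetical stand-in for the cache descriptor lookup; `IllegalStateException` is unchecked, so documenting it in the Javadoc is sufficient.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the cache descriptor lookup used in the PR. */
final class CacheRegistry {
    private final Set<String> names = new HashSet<>();

    void register(String name) {
        names.add(name);
    }

    /**
     * Ensures that a cache with the specified name does not exist.
     *
     * @param name Cache name.
     * @throws IllegalStateException If the cache already exists (Javadoc only, no throws clause).
     */
    void ensureCacheAbsent(String name) {
        if (names.contains(name)) {
            throw new IllegalStateException("Cache \"" + name +
                "\" should be destroyed manually before performing the restore operation.");
        }
    }
}
```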

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        IgniteInternalFuture<Void> fut0 = fut;
+
+        if (!fut0.isDone()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        if (ctx.cache().context().snapshotMgr().isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+            throw new IgniteException("Not all nodes in the cluster support a snapshot restore operation.");
+
+        Collection<ClusterNode> bltNodes = F.viewReadOnly(ctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            node -> node, (node) -> CU.baselineNode(node, ctx.state().clusterState()));
+
+        Set<UUID> bltNodeIds = new HashSet<>(F.viewReadOnly(bltNodes, F.node2id()));
+
+        fut = new GridFutureAdapter<>();
+
+        ((ClusterGroupAdapter)ctx.cluster().get().forNodeIds(bltNodeIds)).compute().executeAsync(
+            new SnapshotRestoreVerificatioTask(), new SnapshotRestoreVerificationArg(snpName, cacheGrpNames)).listen(
+            f -> {
+                try {
+                    SnapshotRestoreVerificationResult res = f.get();
+
+                    Set<String> foundGrps = res == null ? Collections.emptySet() : res.configs().stream()
+                        .map(v -> v.config().getGroupName() != null ? v.config().getGroupName() : v.config().getName())
+                        .collect(Collectors.toSet());
+
+                    if (!foundGrps.containsAll(cacheGrpNames)) {
+                        Set<String> missedGroups = new HashSet<>(cacheGrpNames);
+
+                        missedGroups.removeAll(foundGrps);
+
+                        fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                            "snapshot [groups=" + missedGroups + ", snapshot=" + snpName + ']'));
+
+                        return;
+                    }
+
+                    SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(
+                        UUID.randomUUID(), snpName, bltNodeIds, res.configs(), res.localNodeId());
+
+                    prepareRestoreProc.start(req.requestId(), req);
+                } catch (Throwable t) {
+                    fut.onDone(new IgniteException(OP_REJECT_MSG + t.getMessage(), t));
+                }
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if the cache group restore process is currently running.
+     *
+     * @return {@code True} if cache group restore process is currently running.
+     */
+    public boolean inProgress(@Nullable String cacheName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        return !staleProcess(fut, opCtx0) && (cacheName == null || opCtx0.containsCache(cacheName));
+    }
+
+    /**
+     * @param fut The future of cache snapshot restore operation.
+     * @param opCtx Snapshot restore operation context.
+     * @return {@code True} if the future completed or not initiated.
+     */
+    public boolean staleProcess(IgniteInternalFuture<Void> fut, SnapshotRestoreContext opCtx) {
+        return fut.isDone() || opCtx == null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes().contains(leftNodeId)) {
+            opCtx0.interrupt(new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.interrupt(reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IllegalStateException If cache with the specified name already exists.
+     */
+    private void ensureCacheAbsent(String name) throws IllegalStateException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreEmptyResponse> prepare(SnapshotRestorePrepareRequest req) {
+        if (!req.nodes().contains(ctx.localNodeId()))
+            return new GridFinishedFuture<>();
+
+        if (inProgress(null)) {
+            return new GridFinishedFuture<>(
+                new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        // Skip creating future on initiator.
+        if (fut.isDone())
+            fut = new GridFutureAdapter<>();
+
+        opCtx = new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), req.configs());
+
+        fut.listen(f -> opCtx = null);
+
+        if (!allNodesInBaselineAndAlive(req.nodes()))
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        GridFutureAdapter<SnapshotRestoreEmptyResponse> retFut = new GridFutureAdapter<>();
+
+        try {
+            for (String grpName : opCtx0.groups())
+                ensureCacheAbsent(grpName);
+
+            for (StoredCacheData cfg : opCtx0.configs()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (!ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx0.snapshotName()).exists())
+                return new GridFinishedFuture<>();
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+
+            ctx.getSystemExecutorService().submit(() -> {
+                try {
+                    opCtx0.restore(updateMeta);
+
+                    if (!opCtx0.interrupted()) {
+                        retFut.onDone();
+
+                        return;
+                    }
+
+                    log.error("Snapshot restore process has been interrupted " +
+                        "[groups=" + opCtx0.groups() + ", snapshot=" + opCtx0.snapshotName() + ']', opCtx0.error());
+
+                    opCtx0.rollback();
+
+                    retFut.onDone(opCtx0.error());
+
+                }
+                catch (Throwable t) {
+                    retFut.onDone(t);
+                }
+            });
+
+            return retFut;
+        } catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, SnapshotRestoreEmptyResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0.isDone() || !reqId.equals(opCtx.requestId()))
+            return;
+
+        Exception failure = F.first(errs.values());
+
+        if (failure != null) {
+            opCtx.rollback();

Review comment:
       Deleting a large number of files may be a heavy operation. We should either use an atomic operation, such as deleting the whole directory at once, or perform the deletion on an executor service.
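
For illustration only, a minimal sketch of the executor-service option. `RollbackSketch` and `deleteAsync` are hypothetical names, not part of this PR or of the Ignite API, and the sketch assumes the restored files live under a single directory. The caller only schedules the cleanup, and the file walk runs on the supplied executor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.stream.Stream;

/** Hypothetical helper (not part of the PR): moves file cleanup off the calling thread. */
public class RollbackSketch {
    /**
     * Recursively deletes {@code dir} on the given executor.
     *
     * @param dir Directory to delete.
     * @param exec Executor that performs the file operations.
     * @return Future completed when the directory is gone, or exceptionally on an I/O failure.
     */
    public static CompletableFuture<Void> deleteAsync(Path dir, Executor exec) {
        return CompletableFuture.runAsync(() -> {
            if (!Files.exists(dir))
                return;

            // Reverse order puts children before their parent directories.
            try (Stream<Path> paths = Files.walk(dir)) {
                paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    }
                    catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
            catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, exec);
    }
}
```

With something like this, a callback such as `finishPrepare` could complete the user-facing future from a listener on the returned future instead of blocking on file deletion.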

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreCacheGroupProcess.java
##########
@@ -0,0 +1,647 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.cluster.ClusterGroupAdapter;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreCacheGroupProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, SnapshotRestoreEmptyResponse> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<SnapshotRestoreCacheStartRequest, SnapshotRestoreEmptyResponse> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut = new GridFutureAdapter<>();
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreCacheGroupProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+
+        fut.onDone();
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(new UnsupportedOperationException("Client and daemon nodes can not " +
+                "perform this operation."));
+        }
+
+        IgniteInternalFuture<Void> fut0 = fut;
+
+        if (!fut0.isDone()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "The baseline topology is not configured for cluster."));
+        }
+
+        if (ctx.cache().context().snapshotMgr().isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG +
+                "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+            throw new IgniteException("Not all nodes in the cluster support a snapshot restore operation.");
+
+        Collection<ClusterNode> bltNodes = F.viewReadOnly(ctx.discovery().serverNodes(AffinityTopologyVersion.NONE),
+            node -> node, (node) -> CU.baselineNode(node, ctx.state().clusterState()));
+
+        Set<UUID> bltNodeIds = new HashSet<>(F.viewReadOnly(bltNodes, F.node2id()));
+
+        fut = new GridFutureAdapter<>();
+
+        ((ClusterGroupAdapter)ctx.cluster().get().forNodeIds(bltNodeIds)).compute().executeAsync(
+            new SnapshotRestoreVerificatioTask(), new SnapshotRestoreVerificationArg(snpName, cacheGrpNames)).listen(
+            f -> {
+                try {
+                    SnapshotRestoreVerificationResult res = f.get();
+
+                    Set<String> foundGrps = res == null ? Collections.emptySet() : res.configs().stream()
+                        .map(v -> v.config().getGroupName() != null ? v.config().getGroupName() : v.config().getName())
+                        .collect(Collectors.toSet());
+
+                    if (!foundGrps.containsAll(cacheGrpNames)) {
+                        Set<String> missedGroups = new HashSet<>(cacheGrpNames);
+
+                        missedGroups.removeAll(foundGrps);
+
+                        fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                            "snapshot [groups=" + missedGroups + ", snapshot=" + snpName + ']'));
+
+                        return;
+                    }
+
+                    SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(
+                        UUID.randomUUID(), snpName, bltNodeIds, res.configs(), res.localNodeId());
+
+                    prepareRestoreProc.start(req.requestId(), req);
+                } catch (Throwable t) {
+                    fut.onDone(new IgniteException(OP_REJECT_MSG + t.getMessage(), t));
+                }
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if the cache group restore process is currently running.
+     *
+     * @return {@code True} if cache group restore process is currently running.
+     */
+    public boolean inProgress(@Nullable String cacheName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        return !staleProcess(fut, opCtx0) && (cacheName == null || opCtx0.containsCache(cacheName));
+    }
+
+    /**
+     * @param fut The future of cache snapshot restore operation.
+     * @param opCtx Snapshot restore operation context.
+     * @return {@code True} if the future completed or not initiated.
+     */
+    public boolean staleProcess(IgniteInternalFuture<Void> fut, SnapshotRestoreContext opCtx) {
+        return fut.isDone() || opCtx == null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes().contains(leftNodeId)) {
+            opCtx0.interrupt(new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.interrupt(reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IllegalStateException If cache with the specified name already exists.
+     */
+    private void ensureCacheAbsent(String name) throws IllegalStateException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreEmptyResponse> prepare(SnapshotRestorePrepareRequest req) {
+        if (!req.nodes().contains(ctx.localNodeId()))
+            return new GridFinishedFuture<>();
+
+        if (inProgress(null)) {
+            return new GridFinishedFuture<>(
+                new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+        }
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        // Skip creating future on initiator.
+        if (fut.isDone())
+            fut = new GridFutureAdapter<>();
+
+        opCtx = new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), req.configs());
+
+        fut.listen(f -> opCtx = null);
+
+        if (!allNodesInBaselineAndAlive(req.nodes()))
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        GridFutureAdapter<SnapshotRestoreEmptyResponse> retFut = new GridFutureAdapter<>();
+
+        try {
+            for (String grpName : opCtx0.groups())
+                ensureCacheAbsent(grpName);
+
+            for (StoredCacheData cfg : opCtx0.configs()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (!ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx0.snapshotName()).exists())
+                return new GridFinishedFuture<>();
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+
+            ctx.getSystemExecutorService().submit(() -> {
+                try {
+                    opCtx0.restore(updateMeta);
+
+                    if (!opCtx0.interrupted()) {
+                        retFut.onDone();
+
+                        return;
+                    }
+
+                    log.error("Snapshot restore process has been interrupted " +
+                        "[groups=" + opCtx0.groups() + ", snapshot=" + opCtx0.snapshotName() + ']', opCtx0.error());
+
+                    opCtx0.rollback();
+
+                    retFut.onDone(opCtx0.error());
+
+                }
+                catch (Throwable t) {
+                    retFut.onDone(t);
+                }
+            });
+
+            return retFut;
+        } catch (Exception e) {
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, SnapshotRestoreEmptyResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0.isDone() || !reqId.equals(opCtx.requestId()))
+            return;
+
+        Exception failure = F.first(errs.values());
+
+        if (failure != null) {
+            opCtx.rollback();
+
+            fut0.onDone(failure);
+
+            return;
+        }
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            cacheStartProc.start(reqId, new SnapshotRestoreCacheStartRequest(reqId));
+    }
+
+    /**
+     * @param req Request to start restored cache groups.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreEmptyResponse> cacheStart(SnapshotRestoreCacheStartRequest req) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut, opCtx0))
+            return new GridFinishedFuture<>();
+
+        if (!req.requestId().equals(opCtx0.requestId()))
+            return new GridFinishedFuture<>(new IgniteException("Unknown snapshot restore operation was rejected."));
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (opCtx0.interrupted())
+            return new GridFinishedFuture<>(opCtx0.error());
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes()))
+            return new GridFinishedFuture<>(new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<SnapshotRestoreEmptyResponse> retFut = new GridFutureAdapter<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting restored caches " +
+                "[snapshot=" + opCtx0.snapshotName() +
+                ", caches=" + F.viewReadOnly(opCtx0.configs(), c -> c.config().getName()) + ']');
+        }
+
+        ctx.cache().dynamicStartCachesByStoredConf(opCtx.configs(), true, true, false, null, true, opCtx0.nodes()).listen(
+            f -> {
+                if (f.error() != null) {
+                    log.error("Unable to start restored caches.", f.error());
+
+                    retFut.onDone(f.error());
+                }
+                else
+                    retFut.onDone();
+            }
+        );
+
+        return retFut;
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishCacheStart(UUID reqId, Map<UUID, SnapshotRestoreEmptyResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut0, opCtx0) || !reqId.equals(opCtx0.requestId()))
+            return;
+
+        Exception failure = F.first(errs.values());
+
+        if (failure == null && !res.keySet().containsAll(opCtx0.nodes())) {
+            Set<UUID> leftNodes = new HashSet<>(opCtx0.nodes());
+
+            leftNodes.removeAll(res.keySet());
+
+            failure = new IgniteException(OP_REJECT_MSG + "Server node(s) has left the cluster [nodeId=" + leftNodes + ']');
+        }
+
+        if (failure != null) {
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                rollbackRestoreProc.start(reqId, new SnapshotRestoreRollbackRequest(reqId, failure));
+
+            return;
+        }
+
+        fut0.onDone();
+    }
+
+    /**
+     * @param req Request to rollback cache group restore process.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<SnapshotRestoreRollbackResponse> rollback(SnapshotRestoreRollbackRequest req) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut, opCtx0) || !req.requestId().equals(opCtx0.requestId()))
+            return new GridFinishedFuture<>();
+
+        if (!opCtx0.nodes().contains(ctx.localNodeId()))
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled())
+            log.info("Performing rollback routine for restored cache groups [groups=" + opCtx0.groups() + ']');
+
+        opCtx0.rollback();
+
+        return new GridFinishedFuture<>(new SnapshotRestoreRollbackResponse(req.error()));
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishRollback(UUID reqId, Map<UUID, SnapshotRestoreRollbackResponse> res, Map<UUID, Exception> errs) {
+        GridFutureAdapter<Void> fut0 = fut;
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (staleProcess(fut0, opCtx0) || !reqId.equals(opCtx0.requestId()))
+            return;
+
+        SnapshotRestoreRollbackResponse resp = F.first(F.viewReadOnly(res.values(), v -> v, Objects::nonNull));
+
+        fut0.onDone(resp.error());
+    }
+
+    /**
+     * @param nodeIds Set of required baseline node IDs.
+     * @return {@code True} if all of the specified nodes present in baseline and alive.
+     */
+    private boolean allNodesInBaselineAndAlive(Set<UUID> nodeIds) {
+        for (UUID nodeId : nodeIds) {
+            ClusterNode node = ctx.discovery().node(nodeId);
+
+            if (node == null || !CU.baselineNode(node, ctx.state().clusterState()) || !ctx.discovery().alive(node))
+                return false;
+        }
+
+        return true;
+    }
+
+    /**
+     * Cache group restore from snapshot operation context.
+     */
+    private class SnapshotRestoreContext {
+        /** Request ID. */
+        private final UUID reqId;
+
+        /** Snapshot name. */
+        private final String snpName;
+
+        /** Baseline node IDs that must be alive to complete the operation. */
+        private final Set<UUID> reqNodes;
+
+        /** List of processed cache IDs. */
+        private final Set<Integer> cacheIds = new HashSet<>();
+
+        /** Cache configurations. */
+        private final List<StoredCacheData> ccfgs;
+
+        /** Restored cache groups. */
+        private final Map<String, List<File>> grps = new ConcurrentHashMap<>();
+
+        /** The exception that led to the interruption of the process. */
+        private final AtomicReference<Throwable> errRef = new AtomicReference<>();
+
+        /**
+         * @param reqId Request ID.
+         * @param snpName Snapshot name.
+         * @param reqNodes Baseline node IDs that must be alive to complete the operation.
+         * @param cfgs Stored cache configurations.
+         */
+        protected SnapshotRestoreContext(UUID reqId, String snpName, Set<UUID> reqNodes, List<StoredCacheData> cfgs) {
+            ccfgs = new ArrayList<>(cfgs);
+
+            for (StoredCacheData cacheData : cfgs) {
+                String cacheName = cacheData.config().getName();
+
+                cacheIds.add(CU.cacheId(cacheName));
+
+                boolean shared = cacheData.config().getGroupName() != null;
+
+                grps.computeIfAbsent(shared ? cacheData.config().getGroupName() : cacheName, v -> new ArrayList<>());
+
+                if (shared)
+                    cacheIds.add(CU.cacheId(cacheData.config().getGroupName()));
+            }
+
+            this.reqId = reqId;
+            this.reqNodes = new HashSet<>(reqNodes);
+            this.snpName = snpName;
+        }
+
+        /** @return Request ID. */
+        protected UUID requestId() {
+            return reqId;
+        }
+
+        /** @return Baseline node IDs that must be alive to complete the operation. */
+        protected Set<UUID> nodes() {
+            return Collections.unmodifiableSet(reqNodes);
+        }
+
+        /** @return Snapshot name. */
+        protected String snapshotName() {
+            return snpName;
+        }
+
+        /**
+         * @return List of cache group names to restore from the snapshot.
+         */
+        protected Set<String> groups() {
+            return grps.keySet();
+        }
+
+        /**
+         * @param name Cache name.
+         * @return {@code True} if the cache with the specified name is currently being restored.
+         */
+        protected boolean containsCache(String name) {
+            return cacheIds.contains(CU.cacheId(name));
+        }
+
+        /** @return Cache configurations. */
+        protected Collection<StoredCacheData> configs() {
+            return ccfgs;
+        }
+
+        /**
+         * @param err Error.
+         * @return {@code True} if operation has been interrupted by this call.
+         */
+        protected boolean interrupt(Exception err) {
+            return errRef.compareAndSet(null, err);
+        }
+
+        /**
+         * @return Interrupted flag.
+         */
+        protected boolean interrupted() {
+            return error() != null;
+        }
+
+        /**
+         * @return Error if operation was interrupted, otherwise {@code null}.
+         */
+        protected @Nullable Throwable error() {
+            return errRef.get();
+        }
+
+        /**
+         * Restore specified cache groups from the local snapshot directory.
+         *
+         * @param updateMetadata Update binary metadata flag.
+         * @throws IgniteCheckedException If failed.
+         */
+        protected void restore(boolean updateMetadata) throws IgniteCheckedException {
+            if (interrupted())
+                return;
+
+            IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+            String folderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+            if (updateMetadata) {
+                File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), folderName);
+
+                if (!binDir.exists()) {
+                    throw new IgniteCheckedException("Unable to update cluster metadata from snapshot, " +
+                        "directory doesn't exists [snapshot=" + snpName + ", dir=" + binDir + ']');
+                }
+
+                ctx.cacheObjects().updateMetadata(binDir, this::interrupted);
+            }
+
+            for (File grpDir : snapshotMgr.snapshotCacheDirectories(snpName, folderName)) {
+                String grpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+                if (!groups().contains(grpName))
+                    continue;
+
+                snapshotMgr.restoreCacheGroupFiles(snpName, grpName, grpDir, this::interrupted, grps.get(grpName));
+            }
+        }
+
+        /**
+         * Rollback changes made by process in specified cache group.
+         */
+        protected void rollback() {

Review comment:
       We also should not place such methods in the context. Let's pass the `ctx` into this method if it needs to perform any actions.
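
One possible reading of this suggestion, as a hedged sketch: the context becomes a passive state holder, and the rollback logic lives in the owning process and receives the context as an argument. `ContextSketch`, `RestoreState`, and `track` are hypothetical names used only for illustration; they are not the PR's classes.

```java
import java.io.File;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch (not the PR's code): the context only holds state. */
public class ContextSketch {
    /** Passive holder of the files restored so far, keyed by cache group name. */
    public static final class RestoreState {
        private final Map<String, List<File>> grps = new ConcurrentHashMap<>();

        /** Records a file created for the given cache group. */
        public void track(String grpName, File file) {
            grps.computeIfAbsent(grpName, k -> new CopyOnWriteArrayList<>()).add(file);
        }

        /** @return Restored files by cache group name. */
        public Map<String, List<File>> files() {
            return grps;
        }
    }

    /**
     * Rollback implemented by the owning process; the state is passed in explicitly.
     *
     * @param state Restore state to roll back.
     * @return Number of files actually deleted.
     */
    public static int rollback(RestoreState state) {
        int deleted = 0;

        for (List<File> files : state.files().values()) {
            for (File f : files) {
                if (f.delete())
                    deleted++;
            }
        }

        state.files().clear();

        return deleted;
    }
}
```

Keeping I/O out of the holder makes the context easy to reason about and lets the process decide where (and on which thread) the cleanup runs.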




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622373430



##########
File path: modules/core/src/main/java/org/apache/ignite/IgniteSnapshot.java
##########
@@ -48,4 +50,13 @@
      * @return Future which will be completed when cancel operation finished.
      */
     public IgniteFuture<Void> cancelSnapshot(String name);
+
+    /**
+     * Restore cache group(s) from the snapshot.

Review comment:
       Let's add to the description that the caches to be restored must not exist in the cluster: the user must destroy them manually beforehand.
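
A possible wording, as a sketch only. The interface name `SnapshotApiSketch`, the method signature, and the use of `CompletableFuture` as a stand-in for the real future type are assumptions made to keep the example self-contained; only the first Javadoc sentence comes from the diff above.

```java
import java.util.Collection;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the public snapshot API, used only to show the Javadoc wording. */
public interface SnapshotApiSketch {
    /**
     * Restore cache group(s) from the snapshot.
     * <p>
     * <b>NOTE:</b> The cache groups to be restored must not exist in the cluster. If they are present,
     * the user must destroy them manually before starting this operation.
     *
     * @param name Snapshot name.
     * @param cacheGroupNames Names of the cache groups to restore.
     * @return Future that completes when the restore operation has finished.
     */
    CompletableFuture<Void> restoreSnapshot(String name, Collection<String> cacheGroupNames);
}
```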







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r615722367



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Why shouldn't we check SQL and indexing?
   Or why is it worth running the same tests twice? 







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622388763



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.OpenOption;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.IntSupplier;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.CacheMode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.FILE_SUFFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess.TMP_CACHE_DIR_PREFIX;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends IgniteClusterSnapshotRestoreBaseTest {
+    /** Type name used for binary and SQL. */
+    private static final String TYPE_NAME = "CustomType";
+
+    /** Cache 1 name. */
+    private static final String CACHE1 = "cache1";
+
+    /** Cache 2 name. */
+    private static final String CACHE2 = "cache2";
+
+    /** Default shared cache group name. */
+    private static final String SHARED_GRP = "shared";
+
+    /** Cache value builder. */
+    private Function<Integer, Object> valBuilder = String::valueOf;
+
+    /** {@inheritDoc} */
+    @Override protected Function<Integer, Object> valueBuilder() {
+        return valBuilder;
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreAllGroups() throws Exception {
+        CacheConfiguration<Integer, Object> cacheCfg1 =
+            txCacheConfig(new CacheConfiguration<Integer, Object>(CACHE1)).setGroupName(SHARED_GRP);
+
+        CacheConfiguration<Integer, Object> cacheCfg2 =
+            txCacheConfig(new CacheConfiguration<Integer, Object>(CACHE2)).setGroupName(SHARED_GRP);
+
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder,
+            dfltCacheCfg.setBackups(0), cacheCfg1, cacheCfg2);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cache(CACHE1).destroy();
+        ignite.cache(CACHE2).destroy();
+        ignite.cache(DEFAULT_CACHE_NAME).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Restore all cache groups.
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, null).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(DEFAULT_CACHE_NAME), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(CACHE1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(CACHE2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testStartClusterSnapshotRestoreMultipleThreadsSameNode() throws Exception {
+        checkStartClusterSnapshotRestoreMultithreaded(() -> 0);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testStartClusterSnapshotRestoreMultipleThreadsDiffNode() throws Exception {
+        AtomicInteger nodeIdx = new AtomicInteger();
+
+        checkStartClusterSnapshotRestoreMultithreaded(nodeIdx::getAndIncrement);
+    }
+
+    /**
+     * @param nodeIdxSupplier Ignite node index supplier.
+     */
+    public void checkStartClusterSnapshotRestoreMultithreaded(IntSupplier nodeIdxSupplier) throws Exception {

Review comment:
       This method can be private.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.OpenOption;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.IntSupplier;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.CacheMode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.FILE_SUFFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess.TMP_CACHE_DIR_PREFIX;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends IgniteClusterSnapshotRestoreBaseTest {
+    /** Type name used for binary and SQL. */
+    private static final String TYPE_NAME = "CustomType";
+
+    /** Cache 1 name. */
+    private static final String CACHE1 = "cache1";
+
+    /** Cache 2 name. */
+    private static final String CACHE2 = "cache2";
+
+    /** Default shared cache group name. */
+    private static final String SHARED_GRP = "shared";
+
+    /** Cache value builder. */
+    private Function<Integer, Object> valBuilder = String::valueOf;
+
+    /** {@inheritDoc} */
+    @Override protected Function<Integer, Object> valueBuilder() {
+        return valBuilder;
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreAllGroups() throws Exception {
+        CacheConfiguration<Integer, Object> cacheCfg1 =
+            txCacheConfig(new CacheConfiguration<Integer, Object>(CACHE1)).setGroupName(SHARED_GRP);
+
+        CacheConfiguration<Integer, Object> cacheCfg2 =
+            txCacheConfig(new CacheConfiguration<Integer, Object>(CACHE2)).setGroupName(SHARED_GRP);
+
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder,
+            dfltCacheCfg.setBackups(0), cacheCfg1, cacheCfg2);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cache(CACHE1).destroy();
+        ignite.cache(CACHE2).destroy();
+        ignite.cache(DEFAULT_CACHE_NAME).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Restore all cache groups.
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, null).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(DEFAULT_CACHE_NAME), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(CACHE1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(CACHE2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testStartClusterSnapshotRestoreMultipleThreadsSameNode() throws Exception {
+        checkStartClusterSnapshotRestoreMultithreaded(() -> 0);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testStartClusterSnapshotRestoreMultipleThreadsDiffNode() throws Exception {
+        AtomicInteger nodeIdx = new AtomicInteger();
+
+        checkStartClusterSnapshotRestoreMultithreaded(nodeIdx::getAndIncrement);
+    }
+
+    /**
+     * @param nodeIdxSupplier Ignite node index supplier.
+     */
+    public void checkStartClusterSnapshotRestoreMultithreaded(IntSupplier nodeIdxSupplier) throws Exception {
+        Ignite ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        CountDownLatch startLatch = new CountDownLatch(1);
+        AtomicInteger successCnt = new AtomicInteger();
+
+        IgniteInternalFuture<Long> fut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                startLatch.await(TIMEOUT, TimeUnit.MILLISECONDS);
+
+                grid(nodeIdxSupplier.getAsInt()).snapshot().restoreSnapshot(
+                    SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+                successCnt.incrementAndGet();
+            }
+            catch (Exception ignore) {
+                // Expected exception.

Review comment:
       Let's also check that the other attempt fails with an exception.
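
A minimal sketch of such a check, using only the JDK. `ExclusiveStartSketch` and its `start()` method are hypothetical stand-ins for the restore call, and `IllegalStateException` is an assumed exception type, not necessarily what Ignite throws. The idea is to count expected failures separately instead of swallowing every exception, then assert on both counters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an operation that only one caller may start (not an Ignite API). */
final class ExclusiveStartSketch {
    /** Plays the role of the "operation already in progress" check. */
    private final AtomicBoolean started = new AtomicBoolean();

    /** Succeeds for the first caller, throws for every later one. */
    void start() {
        if (!started.compareAndSet(false, true))
            throw new IllegalStateException("Operation is already in progress.");
    }

    /** Runs concurrent attempts and returns {successes, expected failures}. */
    static int[] runConcurrently(int threads) {
        ExclusiveStartSketch op = new ExclusiveStartSketch();
        CountDownLatch startLatch = new CountDownLatch(1);
        AtomicInteger successCnt = new AtomicInteger();
        AtomicInteger failCnt = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        try {
            List<Future<Void>> futs = new ArrayList<>();

            for (int i = 0; i < threads; i++) {
                futs.add(pool.submit(() -> {
                    // Hold every worker until all of them have been submitted.
                    startLatch.await(10, TimeUnit.SECONDS);

                    try {
                        op.start();
                        successCnt.incrementAndGet();
                    }
                    catch (IllegalStateException e) {
                        // Count only the expected exception type; anything else fails the future.
                        failCnt.incrementAndGet();
                    }

                    return null;
                }));
            }

            startLatch.countDown();

            for (Future<Void> fut : futs)
                fut.get(10, TimeUnit.SECONDS);
        }
        catch (Exception e) {
            throw new RuntimeException(e);
        }
        finally {
            pool.shutdownNow();
        }

        return new int[] {successCnt.get(), failCnt.get()};
    }

    public static void main(String[] args) {
        int[] res = runConcurrently(4);

        System.out.println("successes=" + res[0] + ", failures=" + res[1]);
    }
}
```

In the real test the same shape would apply: keep `successCnt`, add a failure counter for the expected exception, and assert that the two add up to the number of threads with exactly one success.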

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.function.Function;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.internal.IgniteEx;
+
+/**
+ * Snapshot restore test base.
+ */
+public abstract class IgniteClusterSnapshotRestoreBaseTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    protected static final long TIMEOUT = 15_000;
+
+    /** Cache value builder. */
+    protected abstract Function<Integer, Object> valueBuilder();
+
+    /**
+     * @param nodesCnt Nodes count.
+     * @param keysCnt Number of keys to create.
+     * @return Ignite coordinator instance.
+     * @throws Exception if failed.
+     */
+    protected IgniteEx startGridsWithSnapshot(int nodesCnt, int keysCnt) throws Exception {
+        return startGridsWithSnapshot(nodesCnt, keysCnt, false);
+    }
+
+    /**
+     * @param nodesCnt Nodes count.
+     * @param keysCnt Number of keys to create.
+     * @param startClient {@code True} to start an additional client node.
+     * @return Ignite coordinator instance.
+     * @throws Exception if failed.
+     */
+    protected IgniteEx startGridsWithSnapshot(int nodesCnt, int keysCnt, boolean startClient) throws Exception {
+        IgniteEx ignite = startGridsWithCache(nodesCnt, keysCnt, valueBuilder(), dfltCacheCfg.setBackups(0));
+
+        if (startClient)
+            ignite = startClientGrid("client");
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        return ignite;
+    }
+
+    /**
+     * @param cache Cache.
+     * @param keysCnt Expected number of keys.
+     */
+    protected void checkCacheKeys(IgniteCache<Object, Object> cache, int keysCnt) {

Review comment:
       I think it would be better to rename this method to `assertCacheKeys`.

##########
File path: modules/indexing/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreWithIndexingTest.java
##########
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryBasicNameMapper;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.SqlFieldsQuery;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.query.GridQueryProcessor;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.G;
+import org.junit.Test;
+
+/**
+ * Cluster snapshot restore tests verifying SQL and indexing.
+ */
+public class IgniteClusterSnapshotRestoreWithIndexingTest extends IgniteClusterSnapshotRestoreBaseTest {
+    /** Type name used for binary and SQL. */
+    private static final String TYPE_NAME = IndexedObject.class.getName();
+
+    /** Number of cache keys to pre-create at node start. */
+    private static final int CACHE_KEYS_RANGE = 10_000;
+
+    /** Cache value builder. */
+    private Function<Integer, Object> valBuilder = new BinaryValueBuilder(TYPE_NAME);
+
+    /** {@inheritDoc} */
+    @Override protected <K, V> CacheConfiguration<K, V> txCacheConfig(CacheConfiguration<K, V> ccfg) {
+        return super.txCacheConfig(ccfg).setSqlIndexMaxInlineSize(255).setSqlSchema("PUBLIC")
+            .setQueryEntities(Collections.singletonList(new QueryEntity()
+                .setKeyType(Integer.class.getName())
+                .setValueType(TYPE_NAME)
+                .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+                .setIndexes(Collections.singletonList(new QueryIndex("id")))));
+    }
+
+    /** {@inheritDoc} */
+    @Override protected Function<Integer, Object> valueBuilder() {
+        return valBuilder;
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        valBuilder = new IndexedValueBuilder();
+
+        IgniteEx client = startGridsWithSnapshot(2, CACHE_KEYS_RANGE, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = client.cache(DEFAULT_CACHE_NAME);
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, CACHE_KEYS_RANGE);

Review comment:
       Let's also check that the index rebuild procedure didn't happen.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.function.Function;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.internal.IgniteEx;
+
+/**
+ * Snapshot restore test base.
+ */
+public abstract class IgniteClusterSnapshotRestoreBaseTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    protected static final long TIMEOUT = 15_000;
+
+    /** Cache value builder. */
+    protected abstract Function<Integer, Object> valueBuilder();
+
+    /**
+     * @param nodesCnt Nodes count.
+     * @param keysCnt Number of keys to create.
+     * @return Ignite coordinator instance.
+     * @throws Exception if failed.
+     */
+    protected IgniteEx startGridsWithSnapshot(int nodesCnt, int keysCnt) throws Exception {
+        return startGridsWithSnapshot(nodesCnt, keysCnt, false);
+    }
+
+    /**
+     * @param nodesCnt Nodes count.
+     * @param keysCnt Number of keys to create.
+     * @param startClient {@code True} to start an additional client node.
+     * @return Ignite coordinator instance.
+     * @throws Exception if failed.
+     */
+    protected IgniteEx startGridsWithSnapshot(int nodesCnt, int keysCnt, boolean startClient) throws Exception {
+        IgniteEx ignite = startGridsWithCache(nodesCnt, keysCnt, valueBuilder(), dfltCacheCfg.setBackups(0));

Review comment:
       Changing the cache configuration inside this method (`.setBackups(0)`) is not good practice. The cache configuration should be passed as a method parameter or used as-is.
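
To illustrate the concern, here is a small self-contained sketch. `CacheCfg`, `dfltCfg` and both helper methods are hypothetical stand-ins, not Ignite's `CacheConfiguration` or the actual test base class. It shows how a fluent setter called inside a helper changes the shared default for every later user, while a helper that takes the configuration as a parameter leaves it untouched.

```java
/** Hypothetical sketch; CacheCfg is a stand-in, not Ignite's CacheConfiguration. */
final class SharedConfigSketch {
    /** Minimal mutable configuration with a fluent setter. */
    static final class CacheCfg {
        private int backups;

        CacheCfg(int backups) {
            this.backups = backups;
        }

        /** Fluent setter: mutates this instance and returns it. */
        CacheCfg setBackups(int backups) {
            this.backups = backups;

            return this;
        }

        int getBackups() {
            return backups;
        }
    }

    /** Shared default configuration, as a test base class might hold it. */
    final CacheCfg dfltCfg = new CacheCfg(2);

    /** Helper that mutates the shared default as a side effect. */
    int startMutatingDefault() {
        return start(dfltCfg.setBackups(0));
    }

    /** Helper that uses exactly the configuration the caller passes. */
    int startWith(CacheCfg cfg) {
        return start(cfg);
    }

    /** Pretends to start a cache and reports the backups it was started with. */
    private int start(CacheCfg cfg) {
        return cfg.getBackups();
    }

    public static void main(String[] args) {
        SharedConfigSketch a = new SharedConfigSketch();
        a.startMutatingDefault();
        System.out.println("default backups after mutating helper: " + a.dfltCfg.getBackups());

        SharedConfigSketch b = new SharedConfigSketch();
        b.startWith(new CacheCfg(0));
        System.out.println("default backups after parameterized helper: " + b.dfltCfg.getBackups());
    }
}
```

With the parameterized variant, each test decides explicitly which configuration it starts with, so the helper has no hidden effect on tests that reuse the default.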

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.function.Function;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.internal.IgniteEx;
+
+/**
+ * Snapshot restore test base.
+ */
+public abstract class IgniteClusterSnapshotRestoreBaseTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    protected static final long TIMEOUT = 15_000;

Review comment:
       Can we use the `getTestTimeout` value instead of this hardcoded one?
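
A rough sketch of what this could look like, written against plain JDK types. `BaseTest`, `SlowTest`, the `await` helper and the timeout values are all hypothetical; only the method name `getTestTimeout` comes from the comment above. The point is that waits derive their limit from one overridable method, so a subclass that needs more time changes it in a single place.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; BaseTest stands in for a test base class exposing getTestTimeout(). */
final class TestTimeoutSketch {
    static class BaseTest {
        /** Per-test timeout in milliseconds; the value here is illustrative. */
        protected long getTestTimeout() {
            return 30_000;
        }

        /** Waits for a future using the test timeout rather than a separate constant. */
        <T> T await(CompletableFuture<T> fut) {
            try {
                return fut.get(getTestTimeout(), TimeUnit.MILLISECONDS);
            }
            catch (Exception e) {
                throw new IllegalStateException("Failed to wait for the future.", e);
            }
        }
    }

    /** A subclass that needs longer waits overrides the timeout once. */
    static class SlowTest extends BaseTest {
        @Override protected long getTestTimeout() {
            return 120_000;
        }
    }

    public static void main(String[] args) {
        BaseTest test = new SlowTest();

        System.out.println("timeout=" + test.getTestTimeout());
        System.out.println("result=" + test.await(CompletableFuture.completedFuture("done")));
    }
}
```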

##########
File path: modules/indexing/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreWithIndexingTest.java
##########
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryBasicNameMapper;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.SqlFieldsQuery;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.processors.query.GridQueryProcessor;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.G;
+import org.junit.Test;
+
+/**
+ * Cluster snapshot restore tests verifying SQL and indexing.
+ */
+public class IgniteClusterSnapshotRestoreWithIndexingTest extends IgniteClusterSnapshotRestoreBaseTest {
+    /** Type name used for binary and SQL. */
+    private static final String TYPE_NAME = IndexedObject.class.getName();
+
+    /** Number of cache keys to pre-create at node start. */
+    private static final int CACHE_KEYS_RANGE = 10_000;
+
+    /** Cache value builder. */
+    private Function<Integer, Object> valBuilder = new BinaryValueBuilder(TYPE_NAME);
+
+    /** {@inheritDoc} */
+    @Override protected <K, V> CacheConfiguration<K, V> txCacheConfig(CacheConfiguration<K, V> ccfg) {
+        return super.txCacheConfig(ccfg).setSqlIndexMaxInlineSize(255).setSqlSchema("PUBLIC")
+            .setQueryEntities(Collections.singletonList(new QueryEntity()
+                .setKeyType(Integer.class.getName())
+                .setValueType(TYPE_NAME)
+                .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+                .setIndexes(Collections.singletonList(new QueryIndex("id")))));
+    }
+
+    /** {@inheritDoc} */
+    @Override protected Function<Integer, Object> valueBuilder() {
+        return valBuilder;
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        valBuilder = new IndexedValueBuilder();
+
+        IgniteEx client = startGridsWithSnapshot(2, CACHE_KEYS_RANGE, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = client.cache(DEFAULT_CACHE_NAME);
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(DEFAULT_CACHE_NAME).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());

Review comment:
       Should we also check that `typeId` exists after the restore?







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r615718065



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
   This is exactly what `checkClusterStateChange` is doing:
   - `testClusterStateChangeActiveReadonlyOnPrepare`
   - `testClusterStateChangeActiveReadonlyOnCacheStart`
   - `testClusterDeactivateOnPrepare`
   - `testClusterDeactivateOnCacheStart`
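
   The relationship described above, one parameterized helper behind four thin test methods, could be sketched roughly as follows. This is a self-contained illustration only: the `Phase` and `TargetState` enums, the helper's signature, and the recorded strings are assumptions made for the sketch and are not taken from the PR; the real `checkClusterStateChange` runs against a live cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; none of these types are Ignite classes. */
public class StateChangeChecks {
    /** Stand-in for the restore phase during which the state change happens (assumed). */
    enum Phase { PREPARE, CACHE_START }

    /** Stand-in for the cluster state the test switches to (assumed). */
    enum TargetState { ACTIVE_READ_ONLY, INACTIVE }

    /** Records which scenarios ran, so the sketch has observable output. */
    static final List<String> executed = new ArrayList<>();

    /** Shared helper: a single body, parameterized by target state and phase. */
    static void checkClusterStateChange(TargetState state, Phase phase) {
        // A real test would do the actual cluster work here;
        // this sketch only records the scenario that was requested.
        executed.add(state + " on " + phase);
    }

    // Four thin wrappers that differ only in the arguments they pass.
    static void testClusterStateChangeActiveReadonlyOnPrepare() {
        checkClusterStateChange(TargetState.ACTIVE_READ_ONLY, Phase.PREPARE);
    }

    static void testClusterStateChangeActiveReadonlyOnCacheStart() {
        checkClusterStateChange(TargetState.ACTIVE_READ_ONLY, Phase.CACHE_START);
    }

    static void testClusterDeactivateOnPrepare() {
        checkClusterStateChange(TargetState.INACTIVE, Phase.PREPARE);
    }

    static void testClusterDeactivateOnCacheStart() {
        checkClusterStateChange(TargetState.INACTIVE, Phase.CACHE_START);
    }

    public static void main(String[] args) {
        testClusterStateChangeActiveReadonlyOnPrepare();
        testClusterStateChangeActiveReadonlyOnCacheStart();
        testClusterDeactivateOnPrepare();
        testClusterDeactivateOnCacheStart();

        executed.forEach(System.out::println);
    }
}
```

   Keeping the scenario logic in one helper means each named test documents a single combination of state and phase without repeating the body.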




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614264404



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {

Review comment:
       This set of tests is already named IgniteCluster**SnapshotRestore**SelfTest. Do we really need to repeat `OnRestoreInProgress` in the name of each test?







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r599024865



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :

Review comment:
       Can you please fix the IntelliJ IDEA suggestions here?







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614181052



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Done, see `testStartClusterSnapshotRestoreMultipleThreadsSameNode`.
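
For readers following the thread, here is a minimal sketch of the property such a multi-threaded test would typically check: when several threads on the same node try to start a restore at once, only one is accepted and the others are rejected until it completes. `SingleRestoreGuard` and its `start()` method are hypothetical names used for illustration only. The sketch uses plain `java.util.concurrent` types rather than Ignite's `GridFutureAdapter`, and it is not the implementation from this pull request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical guard that admits one in-flight operation at a time (illustration only). */
final class SingleRestoreGuard {
    /** Future of the current (or the last finished) operation, null if none was ever started. */
    private final AtomicReference<CompletableFuture<Void>> cur = new AtomicReference<>();

    /**
     * Tries to register a new operation.
     *
     * @return A pending future if the operation was accepted, or a future that is already
     *      completed exceptionally if another operation is still running.
     */
    CompletableFuture<Void> start() {
        CompletableFuture<Void> fut = new CompletableFuture<>();

        while (true) {
            CompletableFuture<Void> prev = cur.get();

            if (prev != null && !prev.isDone()) {
                CompletableFuture<Void> rejected = new CompletableFuture<>();

                rejected.completeExceptionally(
                    new IllegalStateException("The previous restore operation was not completed."));

                return rejected;
            }

            // Only one thread wins the CAS; the losers re-read the state and get rejected.
            if (cur.compareAndSet(prev, fut))
                return fut;
        }
    }
}
```

The caller that receives the pending future is responsible for completing it on every exit path, including early validation failures. Otherwise the guard keeps rejecting later requests.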







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614261567



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Added `testCreateSnapshotDuringRestore`.







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595811104



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +

Review comment:
       Currently, cache destroy doesn't remove the local cache directory.
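
Because a destroyed cache can leave its directory behind, the restore path has to treat "directory is missing" and "directory exists but is empty" as equally acceptable. A minimal sketch of such a check follows. `RestoreDirCheck` and `isMissingOrEmpty` are hypothetical names, not part of Ignite. The sketch uses `java.nio.file` instead of `File.list()`, since `File.list()` returns `null` when the path is not a directory or an I/O error occurs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (illustration only, not Ignite code). */
final class RestoreDirCheck {
    /**
     * @param dir Target cache group directory.
     * @return {@code true} if the path does not exist or is a directory without entries.
     * @throws IOException If the directory cannot be read.
     */
    static boolean isMissingOrEmpty(Path dir) throws IOException {
        if (Files.notExists(dir))
            return true;

        // A regular file at this path is neither missing nor an empty directory.
        if (!Files.isDirectory(dir))
            return false;

        // Stop at the first entry instead of materializing the whole listing.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }
}
```

Under these assumptions a restore would proceed only when `isMissingOrEmpty` returns `true`, and would report the non-empty directory otherwise.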







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r615720219



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {

Review comment:
       `testNodeJoin` -> `testNodeJoinDuringRestore`
       `testNodeFail` -> `testNodeFailDuringRestore`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614108634



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {

Review comment:
       renamed to `testClusterSnapshotRestoreOnBiggerTopology`







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622749674



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.function.Function;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.internal.IgniteEx;
+
+/**
+ * Snapshot restore test base.
+ */
+public abstract class IgniteClusterSnapshotRestoreBaseTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    protected static final long TIMEOUT = 15_000;

Review comment:
       I don't think this is a good idea. This is a short timeout to fail tests faster. 







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622373430



##########
File path: modules/core/src/main/java/org/apache/ignite/IgniteSnapshot.java
##########
@@ -48,4 +50,13 @@
      * @return Future which will be completed when cancel operation finished.
      */
     public IgniteFuture<Void> cancelSnapshot(String name);
+
+    /**
+     * Restore cache group(s) from the snapshot.

Review comment:
       Let's add to the description that the caches to be restored must first be destroyed manually by the user.
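
To illustrate the contract that comment asks the Javadoc to spell out, here is a minimal sketch in plain Java. It is only an in-memory model, not the Ignite implementation: the class `RestoreContractSketch`, its methods, and the exception used for a group that is still started are all hypothetical names chosen for the illustration. The "not found in the snapshot" check mirrors the message asserted in `testRestoreSharedCacheGroup`; how a still-started group is actually rejected by Ignite is an assumption here.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory model of the "destroy before restore" contract; not Ignite code. */
public class RestoreContractSketch {
    /** Names of cache groups that are currently started. */
    private final Set<String> started = new HashSet<>();

    /** Names of cache groups captured by the single snapshot this model keeps. */
    private final Set<String> snapshot = new HashSet<>();

    /** Starts a cache group with the given name. */
    public void createCache(String name) {
        started.add(name);
    }

    /** Records every started cache group in the snapshot. */
    public void createSnapshot() {
        snapshot.clear();
        snapshot.addAll(started);
    }

    /** Destroys a cache group; the user has to do this before restoring it. */
    public void destroyCache(String name) {
        started.remove(name);
    }

    /**
     * Restores the given groups from the snapshot.
     * Assumption: a group that is still started is rejected rather than overwritten.
     */
    public void restore(Collection<String> grpNames) {
        for (String name : grpNames) {
            if (!snapshot.contains(name))
                throw new IllegalArgumentException("Cache group was not found in the snapshot: " + name);

            if (started.contains(name))
                throw new IllegalStateException("Cache group must be destroyed before restore: " + name);
        }

        started.addAll(grpNames);
    }

    /** @return {@code true} if the group is currently started. */
    public boolean isStarted(String name) {
        return started.contains(name);
    }

    public static void main(String[] args) {
        RestoreContractSketch cluster = new RestoreContractSketch();

        cluster.createCache("shared");
        cluster.createSnapshot();

        try {
            // The group still exists, so the model rejects the restore.
            cluster.restore(Collections.singletonList("shared"));
        }
        catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Destroy the group manually, then restore it from the snapshot.
        cluster.destroyCache("shared");
        cluster.restore(Collections.singletonList("shared"));

        System.out.println("Started after restore: " + cluster.isStarted("shared"));
    }
}
```

With the real API the same order of calls would apply: destroy the cache (or every cache of a shared group) first, then call `restoreSnapshot` with the group name, as the tests in this PR do.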







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614181335



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Done, `testStartClusterSnapshotRestoreMultipleThreadsDiffNode`







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r598980225



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {

Review comment:
       This check does not guarantee that no snapshot is being created concurrently. That guarantee holds only in the discovery thread, so the same check must also be added to the `prepare` phase.
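
       A minimal sketch of the idea, not Ignite code: the class `PrepareRecheckSketch`, the flag `snapshotCreating`, and the methods `start`, `prepare` and `beginSnapshot` are hypothetical stand-ins. It only illustrates why the early check in `start` is a fast-fail hint, while the repeated check in the prepare handler is the one that decides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the restore process; not an Ignite class. */
public class PrepareRecheckSketch {
    /** Models "a cluster snapshot is currently being created" (assumed flag). */
    private final AtomicBoolean snapshotCreating = new AtomicBoolean();

    /** Early check on the initiator: cheap, but may be stale by the time prepare runs. */
    public String start() {
        if (snapshotCreating.get())
            return "rejected in start";

        return "accepted";
    }

    /** Prepare handler: repeats the same check at the point where it is reliable. */
    public String prepare() {
        if (snapshotCreating.get())
            return "rejected in prepare";

        return "prepared";
    }

    /** Simulates a snapshot creation that begins after start() has already passed. */
    public void beginSnapshot() {
        snapshotCreating.set(true);
    }

    public static void main(String[] args) {
        PrepareRecheckSketch sketch = new PrepareRecheckSketch();

        System.out.println(sketch.start());   // accepted

        sketch.beginSnapshot();               // a snapshot starts between the two checks

        System.out.println(sketch.prepare()); // rejected in prepare
    }
}
```

       In the sketch the second check catches the snapshot that started after the first one passed, which is the window the comment above is about.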

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestorePrepareRequest.java
##########
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Set;
+import java.util.UUID;
+import org.apache.ignite.internal.util.typedef.internal.S;
+
+/**
+ * Request to prepare cache group restore from the snapshot.
+ */
+public class SnapshotRestorePrepareRequest implements Serializable {

Review comment:
       I think we can simplify the name to `SnapshotRestoreRequest`.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {

Review comment:
       You should also handle the case where snapshot creation and the snapshot restore procedure are started concurrently from a single node. Only one of them must succeed.
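
       One possible way to get this on a single node, as a sketch only: both operations claim the same slot with a compare-and-set, so exactly one of two concurrent callers wins. The class `SnapshotOpGuardSketch`, the enum `Op`, and the methods `tryBegin` and `end` are hypothetical and are not part of the Ignite API; only `AtomicReference` is a real JDK class.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical guard shared by snapshot creation and restore; not an Ignite class. */
public class SnapshotOpGuardSketch {
    /** The two operations that must not overlap on one node. */
    public enum Op { CREATE, RESTORE }

    /** Currently running operation, or null when the node is idle. */
    private final AtomicReference<Op> current = new AtomicReference<>();

    /** Atomically claims the slot; returns false if any operation already holds it. */
    public boolean tryBegin(Op op) {
        return current.compareAndSet(null, op);
    }

    /** Releases the slot, but only if it is held by the given operation. */
    public void end(Op op) {
        current.compareAndSet(op, null);
    }

    public static void main(String[] args) {
        SnapshotOpGuardSketch guard = new SnapshotOpGuardSketch();

        System.out.println(guard.tryBegin(Op.CREATE));  // true
        System.out.println(guard.tryBegin(Op.RESTORE)); // false

        guard.end(Op.CREATE);

        System.out.println(guard.tryBegin(Op.RESTORE)); // true
    }
}
```

       Whether the real fix uses a shared field like this or reuses existing synchronization in the snapshot manager is a design choice for the PR; the sketch only shows the "only one must succeed" property.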

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before performing the restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups have no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Exception> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (IgniteCheckedException e) {
+                    errHnd.accept(e);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    try {
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();

Review comment:
       This assertion may fail the node. Would it be better to throw an exception to the user who is restoring the snapshot instead of failing the node?
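
A minimal sketch of the alternative raised in this comment, not the actual Ignite code. It assumes the unexpected state (no local context and no reported failure) can be surfaced through the user-visible future. `FinishPrepareSketch`, `userFut` and the plain `Object` context are hypothetical stand-ins for the process class, its `fut` field and `SnapshotRestoreContext`.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the restore process; all names are illustrative. */
class FinishPrepareSketch {
    /** Stand-in for the user-visible restore future. */
    final CompletableFuture<Void> userFut = new CompletableFuture<>();

    /** Stand-in for the operation context; null means there is no context on this node. */
    volatile Object opCtx;

    /**
     * Replaces {@code assert opCtx != null || failure != null} with an explicit check
     * that fails the user future and leaves the node running.
     *
     * @return {@code true} if processing may continue, {@code false} if the operation was failed.
     */
    boolean finishPrepare(UUID reqId, Map<UUID, Exception> errs) {
        Exception failure = errs.isEmpty() ? null : errs.values().iterator().next();

        if (opCtx == null && failure == null) {
            // Inconsistent state: report it to the caller instead of tripping an assertion.
            userFut.completeExceptionally(
                new IllegalStateException("Restore context is missing [reqId=" + reqId + ']'));

            return false;
        }

        if (failure != null) {
            userFut.completeExceptionally(failure);

            return false;
        }

        return true;
    }
}
```

With this shape the caller of the restore operation receives the error through the returned future, and the node itself is not affected by a failed assertion.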

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;

Review comment:
       You should nullify the context before completing the user future.
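
A simplified model of why the order matters, under the assumption that a listener attached to the user future may immediately start another restore. The class below is hypothetical and uses `CompletableFuture` in place of `GridFutureAdapter`. It only mirrors the `opCtx != null` check from `start()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical, simplified model of the restore process state; not the Ignite classes. */
class FinishOrderSketch {
    /** Stand-in for the operation context; non-null means a restore is in progress. */
    private volatile Object opCtx;

    /** Stand-in for the user-visible future. */
    private volatile CompletableFuture<Void> fut;

    /** Starts an operation, or returns null if the previous one still looks active. */
    synchronized CompletableFuture<Void> start() {
        if (opCtx != null)
            return null;

        opCtx = new Object();
        fut = new CompletableFuture<>();

        return fut;
    }

    /** Order from the diff: complete the future first, clear the context afterwards. */
    void finishFutureFirst() {
        CompletableFuture<Void> fut0 = fut;

        fut0.complete(null);

        opCtx = null;
    }

    /** Suggested order: clear the context first, then complete the future. */
    void finishContextFirst() {
        CompletableFuture<Void> fut0 = fut;

        opCtx = null;

        fut0.complete(null);
    }

    /** Returns true if a restart issued from a completion listener is accepted. */
    static boolean restartAccepted(boolean ctxFirst) {
        FinishOrderSketch p = new FinishOrderSketch();
        AtomicBoolean accepted = new AtomicBoolean();

        // The listener runs synchronously inside complete(), on the completing thread.
        p.start().thenRun(() -> accepted.set(p.start() != null));

        if (ctxFirst)
            p.finishContextFirst();
        else
            p.finishFutureFirst();

        return accepted.get();
    }

    public static void main(String[] args) {
        System.out.println("future first:  " + restartAccepted(false));
        System.out.println("context first: " + restartAccepted(true));
    }
}
```

In this model the restart is rejected when the future is completed first, because the listener still sees the old context. It is accepted once the context is cleared before the future completes.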

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestorePrepareRequest.java
##########
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Set;
+import java.util.UUID;
+import org.apache.ignite.internal.util.typedef.internal.S;
+
+/**
+ * Request to prepare cache group restore from the snapshot.
+ */
+public class SnapshotRestorePrepareRequest implements Serializable {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** Request ID. */
+    private final UUID reqId;
+
+    /** Snapshot name. */
+    private final String snpName;
+
+    /** Baseline node IDs that must be alive to complete the operation. */
+    private final Set<UUID> nodes;
+
+    /** List of cache group names to restore from the snapshot. */
+    private final Collection<String> grps;
+
+    /** Node ID from which to update the binary metadata. */
+    private final UUID updateMetaNodeId;

Review comment:
       Do we need this property, or would it be better to always use the coordinator node?

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups have no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Exception> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (IgniteCheckedException e) {
+                    errHnd.accept(e);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    try {
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();
+
+        if (opCtx0 == null) {
+            finishProcess(failure);
+
+            return;
+        }
+
+        if (failure == null)
+            failure = checNodeLeft(opCtx0.nodes, res.keySet());

Review comment:
       Typo: `checNodeLeft` should be `checkNodeLeft`.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes cannot perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;

Review comment:
       Setting `err` alone does not seem to be enough here: the node may stop while local tasks are still running, which can corrupt those local procedures. It seems you should set `err` and then wait for the local tasks to be aborted.
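
The suggestion above could be sketched roughly as follows. This is a minimal, hypothetical illustration and not the PR's code: `RestoreStopSketch`, `localTasks`, `startCopy` and `copied` are invented names, and the counter increment merely stands in for the file copy. Only the `err.compareAndSet(null, reason)` pattern is taken from the diff.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the restore context; not the class from the PR. */
class RestoreStopSketch {
    /** The first reported error wins, as with opCtx.err in the diff. */
    final AtomicReference<Exception> err = new AtomicReference<>();

    /** Completes when every local task has finished (assumed field, not in the PR). */
    volatile CompletableFuture<Void> localTasks = CompletableFuture.completedFuture(null);

    /** Counts tasks that actually did their work; stands in for copied files. */
    final AtomicInteger copied = new AtomicInteger();

    /** Submits one task per file; each task checks the error flag before working. */
    void startCopy(int files, ExecutorService exec) {
        CompletableFuture<?>[] futs = new CompletableFuture<?>[files];

        for (int i = 0; i < files; i++) {
            futs[i] = CompletableFuture.runAsync(() -> {
                if (err.get() != null)
                    return; // Stop requested: skip the work.

                copied.incrementAndGet(); // Placeholder for the real file copy.
            }, exec);
        }

        localTasks = CompletableFuture.allOf(futs);
    }

    /** Sets the error and then blocks until all local tasks have finished or bailed out. */
    void stop(Exception reason) {
        err.compareAndSet(null, reason);

        // Swallow task failures here; the goal is only to wait for completion.
        localTasks.exceptionally(t -> null).join();
    }
}
```

Waiting inside `stop` blocks the caller until in-flight tasks return, so a real node-stop path might prefer a bounded wait; that trade-off is left open in this sketch.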

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes cannot perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId-" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);

Review comment:
       I can suggest the following:
   
   1. Remove `errHnd` and use the `opCtx0.err.compareAndSet(null, ex)` directly.
   2. Throw a runtime exception in the `stopChecker`.
   3. Throw the `IgniteException` from the `updateMetadata` instead of `IgniteCheckedException`.
   4. Use `handle` instead of `.thenAccept` to combine the results, and take the error directly from the callback instead of reading it from the context.
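
A minimal sketch of points 2 and 4, using only `java.util.concurrent.CompletableFuture`. The class `HandleSketch`, the simplified `restoreAsync(boolean)` signature, and the string results are hypothetical stand-ins for illustration and are not the PR's actual code. The idea is that the task throws an unchecked exception, and `handle` receives either the result or the error in one callback:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class HandleSketch {
    /** Hypothetical stand-in for the restore task: it throws an unchecked exception on failure. */
    static CompletableFuture<Void> restoreAsync(boolean fail) {
        return CompletableFuture.runAsync(() -> {
            if (fail)
                throw new IllegalStateException("Restore was interrupted.");
        });
    }

    /** Combines the result and the error in a single callback, without a shared error holder. */
    static CompletableFuture<String> prepare(boolean fail) {
        return restoreAsync(fail).handle((res, err) -> {
            if (err != null) {
                // Errors thrown inside an async task arrive wrapped in CompletionException.
                Throwable cause = err instanceof CompletionException && err.getCause() != null
                    ? err.getCause() : err;

                return "failed: " + cause.getMessage();
            }

            return "restored";
        });
    }

    public static void main(String[] args) {
        System.out.println(prepare(false).join());
        System.out.println(prepare(true).join());
    }
}
```

Unlike `thenAccept`, which is skipped when the previous stage fails, `handle` always runs, so the outer future can be completed in both cases from one place.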
   

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId-" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Prcoess interrupt checker.

Review comment:
       Typo: `Prcoess` should be `Process`.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Exception> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (IgniteCheckedException e) {
+                    errHnd.accept(e);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    try {
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();
+
+        if (opCtx0 == null) {
+            finishProcess(failure);
+
+            return;
+        }
+
+        if (failure == null)
+            failure = checNodeLeft(opCtx0.nodes, res.keySet());
+
+        // Context has been created - should rollback changes cluster-wide.
+        if (failure != null) {
+            opCtx0.err.compareAndSet(null, failure);
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                rollbackRestoreProc.start(reqId, reqId);
+
+            return;
+        }
+
+        Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+        for (List<StoredCacheData> storedCfgs : res.values()) {
+            if (storedCfgs == null)
+                continue;
+
+            for (StoredCacheData cacheData : storedCfgs)
+                globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+        }
+
+        opCtx0.cfgs = globalCfgs;
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            cacheStartProc.start(reqId, reqId);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null) {
+            return new GridFinishedFuture<>(new IgniteIllegalStateException("Context has not been created on server " +
+                "node during prepare operation [reqId=" + reqId + ", nodeId=" + ctx.localNodeId() + ']'));
+        }
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting restored caches " +
+                "[reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName +
+                ", caches=" + F.viewReadOnly(ccfgs, c -> c.config().getName()) + ']');
+        }
+
+        return ctx.cache().dynamicStartCachesByStoredConf(ccfgs, true, true, false, null, true, opCtx0.nodes);

Review comment:
       Can you please add a comment describing that the start procedure will fail if any of the requested nodes leave the cluster?
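
       To illustrate the behaviour such a comment would describe, here is a minimal, self-contained sketch. It is not the Ignite implementation: `RequiredNodesSketch`, `startCaches()` and `error()` are hypothetical names, and the sketch only mirrors the `compareAndSet(null, ...)` error-latching pattern visible in `onNodeLeft` above, assuming that a recorded error is what causes the start phase to fail.

```java
import java.util.Collections;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the restore context; not an Ignite class. */
public class RequiredNodesSketch {
    /** Nodes that must stay in the cluster for the operation to succeed. */
    private final Set<UUID> requiredNodes;

    /** First recorded failure; later failures do not overwrite it. */
    private final AtomicReference<Throwable> err = new AtomicReference<>();

    public RequiredNodesSketch(Set<UUID> requiredNodes) {
        this.requiredNodes = requiredNodes;
    }

    /** Records a failure only if the departed node is one of the required nodes. */
    public void onNodeLeft(UUID leftNodeId) {
        if (requiredNodes.contains(leftNodeId)) {
            err.compareAndSet(null, new IllegalStateException(
                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
        }
    }

    /** Fails if any required node has left since the context was created. */
    public void startCaches() {
        Throwable e = err.get();

        if (e != null)
            throw new IllegalStateException("Cache start rejected.", e);

        // In this sketch nothing is started; a real implementation would start caches here.
    }

    /** @return Recorded error, or {@code null} if none. */
    public Throwable error() {
        return err.get();
    }

    public static void main(String[] args) {
        UUID required = UUID.randomUUID();
        RequiredNodesSketch op = new RequiredNodesSketch(Collections.singleton(required));

        op.onNodeLeft(UUID.randomUUID()); // Unrelated node: ignored.
        op.startCaches();                 // No error recorded, so this passes.

        op.onNodeLeft(required);          // Required node: error is latched.

        try {
            op.startCaches();
        }
        catch (IllegalStateException ex) {
            System.out.println("Start failed: " + ex.getCause().getMessage());
        }
    }
}
```

       Latching only the first error keeps the original cause visible even if several required nodes leave in quick succession.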

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cacheobject/IgniteCacheObjectProcessor.java
##########
@@ -306,6 +307,15 @@ public void updateMetadata(int typeId, String typeName, @Nullable String affKeyF
      */
     public void saveMetadata(Collection<BinaryType> types, File dir);
 
+    /**
+     * Merge the binary metadata files stored in the specified directory.
+     *
+     * @param metadataDir Directory containing binary metadata files.
+     * @param stopChecker Prcoess interrupt checker.

Review comment:
       Typo: "Prcoess" should be "Process".

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {

Review comment:
       I think it would be better to use the `handle` method instead of `thenAccept`: you would get the error right here, without going through the context.
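
       A minimal sketch of the difference, using a plain `CompletableFuture` rather than the code from this PR (the class and method names below are hypothetical and only illustrate the suggestion): a `thenAccept` callback is skipped when the upstream future fails, while a `handle` callback always runs and receives the error as its second argument.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo class, not part of Apache Ignite. */
public class HandleVsThenAccept {
    /** Returns true if a thenAccept callback was invoked for an already completed future. */
    static boolean thenAcceptRuns(CompletableFuture<Void> fut) {
        AtomicBoolean called = new AtomicBoolean();

        // Skipped entirely when 'fut' completed exceptionally.
        fut.thenAccept(res -> called.set(true));

        return called.get();
    }

    /** Returns the error passed to a handle callback (null on success). */
    static Throwable errorSeenByHandle(CompletableFuture<Void> fut) {
        AtomicReference<Throwable> seen = new AtomicReference<>();

        // Invoked on both success and failure; 'err' is null on success.
        fut.handle((res, err) -> {
            seen.set(err);

            return null;
        });

        return seen.get();
    }

    public static void main(String[] args) {
        CompletableFuture<Void> failed = new CompletableFuture<>();

        failed.completeExceptionally(new IllegalStateException("copy failed"));

        System.out.println("thenAccept invoked: " + thenAcceptRuns(failed));
        System.out.println("handle saw error: " + errorSeenByHandle(failed));
    }
}
```

       With `handle`, the callback can complete `retFut` from the error it receives, so an exception that escapes a copy task would not leave `retFut` uncompleted.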

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Exception> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (IgniteCheckedException e) {
+                    errHnd.accept(e);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    try {
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();
+
+        if (opCtx0 == null) {
+            finishProcess(failure);
+
+            return;
+        }
+
+        if (failure == null)

Review comment:
       You already have the `onNodeLeft` method, which sets an exception on the `opCtx`. Since the results are processed in the discovery thread here, this check seems unnecessary and should probably be removed.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {

Review comment:
       I think this method would be more readable if the functionality related to `updateMeta` were extracted from it.
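
One way to read this suggestion, as a rough sketch only: keep the partition-file copy and the metadata update in separate methods, so the `updateMeta` flag is checked in exactly one place. All names below (`RestoreSketch`, `copyPartitions`, `updateMetadata`) are hypothetical stand-ins and do not reflect the actual `restoreAsync` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the suggested extraction; not the PR's code. */
public class RestoreSketch {
    /** Records the performed steps so the flow is observable. */
    final List<String> steps = new ArrayList<>();

    /** Restores the given group directories; metadata is handled by a separate method. */
    void restore(List<String> grpDirs, boolean updateMeta) {
        copyPartitions(grpDirs);

        if (updateMeta)
            updateMetadata();
    }

    /** Stand-in for copying the partition files of each cache group. */
    private void copyPartitions(List<String> grpDirs) {
        for (String dir : grpDirs)
            steps.add("copy:" + dir);
    }

    /** Stand-in for the metadata update that only one node performs. */
    private void updateMetadata() {
        steps.add("meta");
    }

    public static void main(String[] args) {
        RestoreSketch sketch = new RestoreSketch();

        sketch.restore(Arrays.asList("cacheGroup-a", "cacheGroup-b"), true);

        System.out.println(sketch.steps);
    }
}
```

With this split, the copy logic could be read and tested without knowing whether the local node is the one responsible for metadata.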




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595811374



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {

Review comment:
       Added handling for `RejectedExecutionException`.
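
A minimal sketch of what such handling might look like, assuming the task is handed to a plain `java.util.concurrent.ExecutorService`. The class and method names (`RejectedExecutionSketch`, `submitOrFail`) are hypothetical, and this is not the PR's actual code. If the executor has already been shut down, `execute` throws `RejectedExecutionException`; the sketch converts that into a failed future so the caller is still notified.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Supplier;

/** Hypothetical illustration; not part of Apache Ignite. */
public class RejectedExecutionSketch {
    /**
     * Runs the task on the executor and reports the outcome through a future.
     * A rejected submission completes the future exceptionally instead of throwing.
     */
    static <T> CompletableFuture<T> submitOrFail(ExecutorService exec, Supplier<T> task) {
        CompletableFuture<T> fut = new CompletableFuture<>();

        try {
            exec.execute(() -> {
                try {
                    fut.complete(task.get());
                }
                catch (Throwable t) {
                    fut.completeExceptionally(t);
                }
            });
        }
        catch (RejectedExecutionException e) {
            // The executor refused the task (for example, after shutdown): fail the future.
            fut.completeExceptionally(e);
        }

        return fut;
    }

    public static void main(String[] args) {
        ExecutorService exec = Executors.newSingleThreadExecutor();

        System.out.println(submitOrFail(exec, () -> "restored").join());

        exec.shutdown();

        // After shutdown the submission is rejected and the returned future is failed.
        System.out.println(submitOrFail(exec, () -> "never runs").isCompletedExceptionally());
    }
}
```

Without the outer `catch`, a rejection would propagate out of the submitting method and the result future would never complete, leaving anything waiting on it stuck.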







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r615721495



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {

Review comment:
       Done. We now also check, after each test, that the index can be used.
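
As an illustration only (this is not the code from the pull request), a check of that kind could inspect the text of an `EXPLAIN` plan. The sketch below assumes the plan names the index it uses and marks a full scan with `__SCAN_`. The `IndexPlanCheck` class, the `usesIndex` method and the sample plan strings are all hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the pull request. */
class IndexPlanCheck {
    /**
     * @param plan Text of an EXPLAIN plan (the format is an assumption).
     * @param idxName Name of the index expected to appear in the plan.
     * @return True if the plan mentions the index and no full-scan marker.
     */
    static boolean usesIndex(String plan, String idxName) {
        String p = plan.toUpperCase(Locale.ROOT);

        // Assumption: a full scan is reported with the "__SCAN_" marker.
        return p.contains(idxName.toUpperCase(Locale.ROOT)) && !p.contains("__SCAN_");
    }

    public static void main(String[] args) {
        // Made-up plan fragments, used only to exercise the helper.
        String idxPlan = "SELECT ID FROM CUSTOMTYPE /* CUSTOMTYPE_ID_IDX: ID = 1 */";
        String scanPlan = "SELECT ID FROM CUSTOMTYPE /* CUSTOMTYPE.__SCAN_ */";

        System.out.println(usesIndex(idxPlan, "customType_id_idx"));  // true
        System.out.println(usesIndex(scanPlan, "customType_id_idx")); // false
    }
}
```

A string check like this only shows that the planner chose the index; it says nothing about whether the restored index contents are correct.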




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595814234



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);

Review comment:
       Done
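
For readers following the quoted diff, the guard at the top of `start()` (reject a new restore while the previous future is not done) can be modelled in isolation. This is a minimal sketch under stated assumptions, not the Ignite implementation: it uses `CompletableFuture` in place of `GridFutureAdapter`, and the `SingleRestoreGuard` class with its `start` and `finish` methods is hypothetical.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-alone model of the guard; not the Ignite class. */
class SingleRestoreGuard {
    /** Future of the operation currently in flight, if any. */
    private CompletableFuture<Void> fut;

    /**
     * @return A new pending future, or an already failed future if the
     *      previous operation has not completed yet.
     */
    synchronized CompletableFuture<Void> start() {
        if (fut != null && !fut.isDone()) {
            CompletableFuture<Void> rejected = new CompletableFuture<>();

            rejected.completeExceptionally(new IllegalStateException(
                "The previous snapshot restore operation was not completed."));

            return rejected;
        }

        fut = new CompletableFuture<>();

        return fut;
    }

    /** Completes the operation in flight so that the next start() is accepted. */
    synchronized void finish() {
        if (fut != null)
            fut.complete(null);
    }
}
```

In this model a second `start()` fails fast until `finish()` is called, after which a new operation is accepted.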







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600307475



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);

Review comment:
       Done

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] Mmuzaf merged pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf merged pull request #8648:
URL: https://github.com/apache/ignite/pull/8648


   





[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r599024482



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);

Review comment:
       Should we use the system pool to complete the user future, rather than completing it in the discovery thread?
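
       To illustrate the concern, here is a minimal sketch of handing the completion off to a pool so that user listeners do not run on the calling (discovery) thread. It uses plain java.util.concurrent types rather than Ignite's GridFutureAdapter or its system pool, and the names OffloadedCompletion, completeAsync and "sys-pool" are hypothetical, chosen only for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not part of the PR: completes a user-facing future on a separate pool. */
public class OffloadedCompletion {
    /**
     * Completes {@code userFut} on {@code pool} instead of on the calling thread, so listeners
     * attached by the user run on a pool thread.
     */
    public static void completeAsync(CompletableFuture<Void> userFut, Throwable err, ExecutorService pool) {
        Runnable task = () -> {
            if (err != null)
                userFut.completeExceptionally(err);
            else
                userFut.complete(null);
        };

        try {
            pool.execute(task);
        }
        catch (RejectedExecutionException e) {
            // The pool is shutting down: complete inline so the caller is never left waiting.
            task.run();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a system pool (assumption: a single named thread is enough for the demo).
        ExecutorService sysPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "sys-pool"));

        CompletableFuture<Void> userFut = new CompletableFuture<>();
        CompletableFuture<String> listenerThread = new CompletableFuture<>();

        // A user listener registered before completion records the thread it runs on.
        userFut.whenComplete((res, e) -> listenerThread.complete(Thread.currentThread().getName()));

        // Simulates the call made from the discovery thread (here: the main thread).
        completeAsync(userFut, null, sysPool);

        System.out.println("Listener ran on: " + listenerThread.get(5, TimeUnit.SECONDS));

        sysPool.shutdown();
    }
}
```

       The trade-off in this sketch is an extra thread hop per completion in exchange for keeping arbitrary user callbacks off the thread that triggered the completion; the fallback to inline completion on rejection is one possible choice, not the only one.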







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r616798157



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {
+        checkTopologyChange(false);
+    }
+
+    /**
+     * @param stopNode {@code True} to check node fail, {@code False} to check node join.
+     * @throws Exception if failed.
+     */
+    private void checkTopologyChange(boolean stopNode) throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(4, keysCnt);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(3));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, dfltCacheCfg.getName());
+
+        if (stopNode) {
+            IgniteInternalFuture<?> fut0 = runAsync(() -> stopGrid(3, true));
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> fut.get(TIMEOUT),
+                ClusterTopologyCheckedException.class,
+                "Required node has left the cluster"
+            );
+
+            ensureCacheDirEmpty(3, dfltCacheCfg);
+
+            fut0.get(TIMEOUT);
+
+            awaitPartitionMapExchange();
+
+            dfltCacheCfg = null;
+
+            GridTestUtils.assertThrowsAnyCause(

Review comment:
       Added `testNodeFailDuringFilesCopy`.







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622749674



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.function.Function;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.internal.IgniteEx;
+
+/**
+ * Snapshot restore test base.
+ */
+public abstract class IgniteClusterSnapshotRestoreBaseTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    protected static final long TIMEOUT = 15_000;

Review comment:
       This is a **short** timeout so that tests fail faster. It is not the whole-test timeout, and I don't think it's a good idea to remove it.
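
       To illustrate the point, here is a minimal JDK-only sketch of a bounded wait. `CompletableFuture` stands in for `IgniteFuture`, and the class name, the `timesOut` helper and the 200 ms value are hypothetical, chosen only to keep the example quick. A future that never completes is reported after the short per-operation timeout, without waiting for any outer whole-test timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; the names and the timeout value are illustrative only. */
class ShortTimeoutSketch {
    /** Per-operation timeout, much shorter than a whole-test timeout would be. */
    static final long TIMEOUT_MS = 200;

    /** Returns {@code true} if waiting on the future gave up after the short timeout. */
    static boolean timesOut(CompletableFuture<Void> fut) throws Exception {
        try {
            // Bounded wait: a hung operation surfaces here after TIMEOUT_MS.
            fut.get(TIMEOUT_MS, TimeUnit.MILLISECONDS);

            return false;
        }
        catch (TimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Hung future timed out: " + timesOut(new CompletableFuture<>()));
        System.out.println("Completed future timed out: " + timesOut(CompletableFuture.<Void>completedFuture(null)));
    }
}
```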







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595818409



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);

Review comment:
       We are reading `cfgs` from different threads, so strictly speaking we should use a ConcurrentMap instead of a HashMap here. In this case, however, that is not justified compared to a couple of volatile reads/writes of an object reference.
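
To illustrate the trade-off between a concurrent map and a volatile reference, here is a minimal sketch. It is not code from this PR: `VolatileMapPublication`, `Context`, `publish` and `contains` are hypothetical names, and the sketch assumes the map is fully built before the reference is written and is never modified afterwards.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder (not part of the PR): publishes a plain map through a volatile reference. */
public class VolatileMapPublication {
    /** Immutable context: the map is copied and frozen before the context becomes visible. */
    static final class Context {
        /** Read-only view over a private copy, so readers can never observe a mutation. */
        final Map<Integer, String> cfgs;

        Context(Map<Integer, String> cfgs) {
            this.cfgs = Collections.unmodifiableMap(new HashMap<>(cfgs));
        }
    }

    /** The volatile write in publish() happens-before any later volatile read in contains(). */
    private volatile Context ctx;

    /** Replaces the whole context with one volatile write instead of mutating a shared map. */
    public void publish(Map<Integer, String> cfgs) {
        ctx = new Context(cfgs);
    }

    /** Performs a single volatile read and then works only with the local copy of the reference. */
    public boolean contains(int id) {
        Context ctx0 = ctx;

        return ctx0 != null && ctx0.cfgs.containsKey(id);
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileMapPublication holder = new VolatileMapPublication();

        Map<Integer, String> cfgs = new HashMap<>();
        cfgs.put(42, "cache-42");

        // The writer thread publishes the map; the main thread reads it afterwards.
        Thread writer = new Thread(() -> holder.publish(cfgs));
        writer.start();
        writer.join();

        System.out.println(holder.contains(42));
    }
}
```

The pattern only holds while the map is replaced as a whole. If several threads had to modify the same map instance in place, a ConcurrentMap (or external locking) would still be needed.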




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r596021465



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {

Review comment:
       Added parallelism across partitions.
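
For readers without the updated diff at hand, one possible shape of such a change is sketched below. This is an assumption-laden illustration, not the implementation from this PR: `ParallelPartitionCopy` and `copyAll` are hypothetical names, the executor is supplied by the caller, and the stop check mirrors the `BooleanSupplier stopChecker` used in the quoted `restore` method.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of the PR): copies partition files in parallel, one task per file. */
public class ParallelPartitionCopy {
    /**
     * Submits one copy task per source file and waits for all of them.
     * A failed copy surfaces as a CompletionException thrown from join().
     */
    public static void copyAll(List<Path> srcFiles, Path targetDir, ExecutorService exec,
        BooleanSupplier stopChecker) {
        List<CompletableFuture<Void>> futs = new ArrayList<>();

        for (Path src : srcFiles) {
            futs.add(CompletableFuture.runAsync(() -> {
                // Skip the copy if the operation has been interrupted in the meantime.
                if (stopChecker.getAsBoolean())
                    return;

                try {
                    Files.copy(src, targetDir.resolve(src.getFileName()));
                }
                catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, exec));
        }

        // Block until every task has completed, normally or exceptionally.
        CompletableFuture.allOf(futs.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) throws IOException {
        Path srcDir = Files.createTempDirectory("snapshot");
        Path dstDir = Files.createTempDirectory("restore");

        List<Path> files = new ArrayList<>();

        for (int i = 0; i < 4; i++)
            files.add(Files.write(srcDir.resolve("part-" + i + ".bin"), new byte[] {(byte)i}));

        ExecutorService exec = Executors.newFixedThreadPool(2);

        try {
            copyAll(files, dstDir, exec, () -> false);
        }
        finally {
            exec.shutdown();
        }
    }
}
```

Files that were already copied before a stop or failure are left in place by this sketch, so a caller would still need a cleanup step comparable to the quoted `rollback` method.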







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595825756



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())

Review comment:
       How should a transition to or from `read-only` mode be handled here?
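
To make the question concrete: the quoted check `state.state() != ClusterState.ACTIVE || state.transition()` rejects a read-only cluster and any in-progress transition with the same "should be active" message. The sketch below shows one possible way to tell these cases apart. It is an assumption, not the behaviour chosen in the pull request: `RestoreStateCheck`, its local `State` enum (a stand-in for Ignite's `ClusterState`), and the message strings are all hypothetical.

```java
/** Hypothetical helper (not part of the pull request): explains why a restore would be rejected. */
final class RestoreStateCheck {
    /** Local stand-in for org.apache.ignite.cluster.ClusterState, used to keep the sketch self-contained. */
    enum State { INACTIVE, ACTIVE, ACTIVE_READ_ONLY }

    /**
     * @param state Current cluster state.
     * @param inTransition Whether a state change is in progress.
     * @return Rejection reason, or {@code null} if the restore may proceed.
     */
    static String rejectReason(State state, boolean inTransition) {
        // Any state change in progress is rejected first, including to or from read-only.
        if (inTransition)
            return "The cluster state is changing.";

        // A read-only cluster is "active" in a sense, but cannot start restored caches for writes.
        if (state == State.ACTIVE_READ_ONLY)
            return "The cluster is in read-only mode.";

        if (state != State.ACTIVE)
            return "The cluster should be active.";

        return null;
    }
}
```

The accept/reject outcome is the same as with the original condition. Only the reported reason differs, which may or may not be worth the extra branches.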







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595811671



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;

Review comment:
       Done
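
For readers following the thread, the `stopChecker` line under review relies on a "first error wins" flag: `onNodeLeft` and `stop` both call `compareAndSet(null, ...)` on the context's error reference, and the copy loop polls that reference. The sketch below is a minimal, self-contained illustration of that pattern only. The class `StopCheckerSketch` and its methods `interrupt` and `copyAll` are hypothetical names that do not appear in the patch, and the simulated failure inside the loop stands in for a concurrent node-left event.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class StopCheckerSketch {
    /** Holds the first reported error; stays null while the operation is healthy. */
    private final AtomicReference<Throwable> err = new AtomicReference<>();

    /** Records the reason only if no error was recorded before (first error wins). */
    public boolean interrupt(Throwable reason) {
        return err.compareAndSet(null, reason);
    }

    /** Returns the recorded error, or null if there is none. */
    public Throwable error() {
        return err.get();
    }

    /**
     * Processes up to 'total' items and polls the stop checker before each one.
     * Processing item 'failAt' simulates an interruption (hypothetical stand-in
     * for a node leaving the cluster). Returns the number of processed items.
     */
    public int copyAll(int total, int failAt) {
        BooleanSupplier stopChecker = () -> err.get() != null;

        int copied = 0;

        for (int i = 0; i < total; i++) {
            if (stopChecker.getAsBoolean())
                break;

            if (i == failAt)
                interrupt(new IllegalStateException("node left"));

            copied++;
        }

        return copied;
    }

    public static void main(String[] args) {
        StopCheckerSketch op = new StopCheckerSketch();

        int copied = op.copyAll(10, 3);

        // A later interruption does not overwrite the first recorded reason.
        boolean replaced = op.interrupt(new RuntimeException("late"));

        System.out.println("copied=" + copied + ", replaced=" + replaced +
            ", error=" + op.error().getMessage());
    }
}
```

Polling a supplier keeps the copy routine independent of where the error is stored, and `compareAndSet` preserves the original cause when several callbacks report a failure.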




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614265426



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(

Review comment:
       It is not very clear why parameterization is better than the current approach. 
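
To make the comparison concrete, here is a rough sketch of the two shapes being discussed, written without JUnit so that it runs on its own. Everything in it is a hypothetical stand-in: the `Phase` enum replaces the two `DistributedProcessType` constants, and `check` replaces `checkCacheStartWithTheSameName`, whose real body blocks a phase and asserts on the failure.

```java
import java.util.ArrayList;
import java.util.List;

public class ParameterizationSketch {
    /** Hypothetical stand-in for the two distributed process phases under test. */
    enum Phase { PREPARE, START }

    /** Hypothetical shared check; here it only reports which case was exercised. */
    static String check(Phase phase, String expectedMsg) {
        return phase + " -> " + expectedMsg;
    }

    // Shape 1 (as in the patch): one named test per case, each delegating to the helper.
    static String testOnPrepare() {
        return check(Phase.PREPARE, "being restored");
    }

    static String testOnStart() {
        return check(Phase.START, "already started");
    }

    // Shape 2 (the alternative): a table of cases and a single loop over it.
    static List<String> runParameterized() {
        Object[][] cases = {
            {Phase.PREPARE, "being restored"},
            {Phase.START, "already started"}
        };

        List<String> res = new ArrayList<>();

        for (Object[] c : cases)
            res.add(check((Phase)c[0], (String)c[1]));

        return res;
    }

    public static void main(String[] args) {
        System.out.println(testOnPrepare());
        System.out.println(testOnStart());
        System.out.println(runParameterized());
    }
}
```

Both shapes exercise the same helper with the same arguments. The difference is only in how a failing case is reported: by method name in the first shape, by table row in the second.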







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595528256



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {

Review comment:
       A `RejectedExecutionException` might be thrown here if the node is stopping. I think it would be better to handle that case too.
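
One possible way to handle the rejection is sketched below. It is only an illustration under stated assumptions: `CompletableFuture` stands in for `GridFutureAdapter`, a plain single-thread executor stands in for the snapshot executor service, and the class and method names (`RejectedSubmitSketch`, `submitRestore`) are hypothetical and not part of the PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical sketch: not the PR code, only the general pattern.
class RejectedSubmitSketch {
    /**
     * Submits a task and returns a future for its result. If the executor has
     * already been shut down, the future is completed exceptionally instead of
     * letting the RejectedExecutionException escape to the caller.
     */
    static CompletableFuture<String> submitRestore(ExecutorService exec) {
        CompletableFuture<String> retFut = new CompletableFuture<>();

        try {
            exec.execute(() -> {
                try {
                    // Placeholder for the actual restore work.
                    retFut.complete("restored");
                }
                catch (Throwable t) {
                    retFut.completeExceptionally(t);
                }
            });
        }
        catch (RejectedExecutionException e) {
            // The executor no longer accepts tasks (for example, the node is stopping).
            retFut.completeExceptionally(e);
        }

        return retFut;
    }

    public static void main(String[] args) {
        ExecutorService exec = Executors.newSingleThreadExecutor();

        // Normal case: the task runs and completes the future.
        System.out.println(submitRestore(exec).join());

        exec.shutdown();

        // After shutdown the submission is rejected, and the future carries the error.
        System.out.println(submitRestore(exec).isCompletedExceptionally());
    }
}
```

With this shape the caller always receives a future, so a rejected submission is reported the same way as a failure inside the task itself.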

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();

Review comment:
       The `meta.folderName` should probably be used here.
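
A rough sketch of the idea, assuming the snapshot stores its data under a folder named after the node that created it. The `Meta` class, the `db` sub-directory, and the method name `snapshotCacheDir` are hypothetical stand-ins used only for illustration; they are not taken from the PR.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: resolve snapshot paths from the folder name recorded in
// the snapshot metadata rather than from the locally resolved folder name.
class SnapshotFolderSketch {
    /** Minimal stand-in for the snapshot metadata; only the folder name matters here. */
    static final class Meta {
        final String folderName;

        Meta(String folderName) {
            this.folderName = folderName;
        }
    }

    /**
     * Builds the path to a cache directory inside the snapshot.
     * The "db" level is an assumption about the on-disk layout.
     */
    static Path snapshotCacheDir(Path snpRoot, Meta meta, String cacheDirName) {
        return snpRoot.resolve("db").resolve(meta.folderName).resolve(cacheDirName);
    }

    public static void main(String[] args) {
        Meta meta = new Meta("node00");

        System.out.println(snapshotCacheDir(Paths.get("snp"), meta, "cache-default"));
    }
}
```

The point of the sketch is only that the path follows whatever name the metadata recorded, so it stays valid even if the local folder name differs from the one used when the snapshot was taken.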

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Update binary metadata cluster-wide.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Roll back changes made by the process in the specified cache groups.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active."));
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes))
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<Boolean> retFut = new GridFutureAdapter<>();
+
+        try {
+            Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+            // Ensure that shared cache groups have no conflicts before starting caches.
+            for (StoredCacheData cfg : ccfgs) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting restored caches " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName +
+                    ", caches=" + F.viewReadOnly(ccfgs, c -> c.config().getName()) + ']');
+            }
+
+            ctx.cache().dynamicStartCachesByStoredConf(ccfgs, true, true, false, null, true, opCtx0.nodes).listen(

Review comment:
       Can you return the `IgniteInternalFuture` directly here or `chain` the results if you need any additional result transformation?
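
       To illustrate the suggestion, here is a minimal sketch. It uses `java.util.concurrent.CompletableFuture` as a stand-in for `IgniteInternalFuture`, so the class `ChainSketch` and its methods are hypothetical and are not Ignite API. The listener style creates a separate result future and completes it by hand, while the chained style attaches the transformation to the original future and returns it, letting errors propagate on their own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ChainSketch {
    /** Hypothetical stand-in for the asynchronous cache start; completes with the given cache names. */
    static CompletableFuture<List<String>> startCaches(List<String> names) {
        return CompletableFuture.supplyAsync(() -> names);
    }

    /** Listener style: a separate adapter future is completed manually from a callback. */
    static CompletableFuture<Boolean> viaListener(List<String> names) {
        CompletableFuture<Boolean> retFut = new CompletableFuture<>();

        startCaches(names).whenComplete((res, err) -> {
            if (err != null)
                retFut.completeExceptionally(err);
            else
                retFut.complete(true);
        });

        return retFut;
    }

    /** Chained style: the result is transformed in place and failures propagate automatically. */
    static CompletableFuture<Boolean> viaChain(List<String> names) {
        return startCaches(names).thenApply(res -> true);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("cache1", "cache2");

        System.out.println(viaListener(names).join());
        System.out.println(viaChain(names).join());
    }
}
```

       Both variants complete with the same value; the chained one simply avoids the extra future and the manual error forwarding.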

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active."));
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes))
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<Boolean> retFut = new GridFutureAdapter<>();
+
+        try {
+            Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+            // Ensure that shared cache groups has no conflicts before start caches.
+            for (StoredCacheData cfg : ccfgs) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());

Review comment:
       I think the message thrown here is incorrect, since at this point you already have a guarantee that the cache doesn't exist. If it does exist, the process guarantees have been violated, so it's probably better to throw an `IgniteIllegalStateException` here.
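
       A minimal sketch of the distinction being suggested, not the code from this PR. The class and method names (`RestoreChecks`, `ensureAbsentBeforePrepare`, `ensureStillAbsent`) are hypothetical, and plain `java.lang` exceptions stand in for `IgniteCheckedException` and `IgniteIllegalStateException` so that the example compiles on its own. It assumes the prepare phase really does guarantee that the cache is absent.

```java
import java.util.Collections;
import java.util.Set;

public class RestoreChecks {
    /** Prepare-phase check: an existing cache is something the user can fix, so it is reported as a checked error. */
    static void ensureAbsentBeforePrepare(Set<String> existing, String name) throws Exception {
        if (existing.contains(name))
            throw new Exception("Cache \"" + name + "\" should be destroyed manually before the restore operation.");
    }

    /** Start-phase check: absence was already verified, so a hit here is a broken invariant, not a user error. */
    static void ensureStillAbsent(Set<String> existing, String name) {
        if (existing.contains(name))
            throw new IllegalStateException("Cache \"" + name + "\" exists although the prepare phase verified its absence.");
    }

    public static void main(String[] args) {
        // Hypothetical set of cache names already known to the node.
        Set<String> existing = Collections.singleton("cacheA");

        try {
            ensureAbsentBeforePrepare(existing, "cacheA");
        }
        catch (Exception e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        try {
            ensureStillAbsent(existing, "cacheA");
        }
        catch (IllegalStateException e) {
            System.out.println("Invariant violated: " + e.getMessage());
        }
    }
}
```

       The two checks test the same condition. Only the exception type and message differ, which tells the caller whether to fix the cluster state or to report a bug.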

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active."));
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes))
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<Boolean> retFut = new GridFutureAdapter<>();
+
+        try {
+            Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+            // Ensure that shared cache groups has no conflicts before start caches.
+            for (StoredCacheData cfg : ccfgs) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting restored caches " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName +
+                    ", caches=" + F.viewReadOnly(ccfgs, c -> c.config().getName()) + ']');
+            }
+
+            ctx.cache().dynamicStartCachesByStoredConf(ccfgs, true, true, false, null, true, opCtx0.nodes).listen(
+                f -> {
+                    if (f.error() != null) {
+                        log.error("Unable to start restored caches [requestID=" + opCtx0.reqId +
+                            ", snapshot=" + opCtx0.snpName + ']', f.error());
+
+                        retFut.onDone(f.error());
+                    }
+                    else
+                        retFut.onDone(true);
+                }
+            );
+        } catch (IgniteCheckedException e) {
+            log.error("Unable to restore cache group(s) from snapshot " +
+                "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+
+        return retFut;
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishCacheStart(UUID reqId, Map<UUID, Boolean> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null || !reqId.equals(opCtx0.reqId))
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            finishProcess();
+
+            return;
+        }
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            rollbackRestoreProc.start(reqId, new SnapshotRestoreRollbackRequest(reqId, failure));
+    }
+
+    /**
+     * Check the response for probable failures.
+     *
+     * @param errs Errors.
+     * @param opCtx Snapshot restore operation context.
+     * @param respNodes Set of responding topology nodes.
+     * @return Error, if any.
+     */
+    private Exception checkFailure(Map<UUID, Exception> errs, SnapshotRestoreContext opCtx, Set<UUID> respNodes) {
+        Exception err = F.first(errs.values());

Review comment:
       If only the first exception is returned to the user (the `restore` future will be completed with this exception, right?), you should probably also log the other exceptions that are being skipped.
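
A minimal sketch of what this could look like, assuming the per-node errors arrive as a `Map<UUID, Exception>` as in `checkFailure`. The class name `FirstFailureSketch`, the method `firstFailure`, and the `errLog` callback (a stand-in for something like `log.error(msg, err)`) are hypothetical and not part of the PR:

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.BiConsumer;

public class FirstFailureSketch {
    /**
     * Returns the first non-null error and reports every other one through
     * the supplied callback, so that no failure is silently dropped.
     *
     * @param errs Errors by node ID.
     * @param errLog Hypothetical logging callback (message, error).
     * @return First error, or {@code null} if there are none.
     */
    static Exception firstFailure(Map<UUID, Exception> errs, BiConsumer<String, Throwable> errLog) {
        Exception first = null;

        for (Map.Entry<UUID, Exception> e : errs.entrySet()) {
            if (e.getValue() == null)
                continue;

            if (first == null)
                first = e.getValue();
            else
                errLog.accept("Additional restore failure [nodeId=" + e.getKey() + ']', e.getValue());
        }

        return first;
    }
}
```

Which error counts as "first" depends on the iteration order of the map that is passed in, so the user-facing exception may differ between runs unless the map is ordered.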

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);

Review comment:
       I think you can put the entries directly into `opCtx0.cfgs` with `putIfAbsent`, enriching the configurations the local node already has instead of building a separate `globalCfgs` map. Would that work?
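
       A minimal sketch of the `putIfAbsent` idea, to make the suggestion concrete. It is not the PR's code: `CfgMergeSketch`, `Cfg`, `enrich` and the placeholder `cacheId` are hypothetical stand-ins for `StoredCacheData`, the merge loop in `finishPrepare` and `CU.cacheId`.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the merge step in finishPrepare(); not the PR's code. */
class CfgMergeSketch {
    /** Simplified placeholder for StoredCacheData: a cache name plus where it came from. */
    static class Cfg {
        final String name;
        final String src;

        Cfg(String name, String src) {
            this.name = name;
            this.src = src;
        }
    }

    /**
     * Adds configurations reported by other nodes to the local map,
     * keeping every entry the local node already holds.
     *
     * @param locCfgs Mutable map of locally known configurations, keyed by cache ID.
     * @param rmtCfgs Per-node results; a node may report null.
     */
    static void enrich(Map<Integer, Cfg> locCfgs, Iterable<List<Cfg>> rmtCfgs) {
        for (List<Cfg> nodeCfgs : rmtCfgs) {
            if (nodeCfgs == null)
                continue;

            // putIfAbsent leaves an existing local entry untouched.
            for (Cfg cfg : nodeCfgs)
                locCfgs.putIfAbsent(cacheId(cfg.name), cfg);
        }
    }

    /** Placeholder ID function for this sketch only. */
    static int cacheId(String cacheName) {
        return cacheName.hashCode();
    }
}
```

       One caveat, assuming the context stores the map it is given as-is: in the quoted `prepareContext`, `cfgsById` can be `Collections.emptyMap()`, which rejects `putIfAbsent` with an `UnsupportedOperationException`, so the target map would need to be mutable.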

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())

Review comment:
       I think this check only makes sense around heavy background operations. Do we really need it here?
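
       To illustrate where such a check pays off, here is a minimal sketch that polls the stop flag right before each file copy, similar in spirit to the copy loop quoted in `restore`. It is an assumption-laden illustration, not the PR's code: `StopCheckSketch` and `copyFiles` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a cooperative stop check; not the PR's code. */
class StopCheckSketch {
    /**
     * Copies regular files from src to dst, polling the stop flag before each file.
     *
     * @return Number of files copied before completion or interruption.
     */
    static int copyFiles(Path src, Path dst, BooleanSupplier stopChecker) throws IOException {
        int copied = 0;

        try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
            for (Path file : files) {
                // The check sits next to the expensive step, so an abort
                // is noticed after at most one more file copy.
                if (stopChecker.getAsBoolean())
                    return copied;

                if (!Files.isRegularFile(file))
                    continue;

                Files.copy(file, dst.resolve(file.getFileName()));

                copied++;
            }
        }

        return copied;
    }
}
```

       With the flag polled once per file, an extra check before a cheap call adds little beyond what the loop already provides.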

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)

Review comment:
       If `opCtx0 == null && failure != null`, you can return from this method earlier, can't you? I suggest moving this check higher in the method body.
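
       A minimal sketch of the reordering suggested here, not the actual patch. `Ctx` and `Outcome` are hypothetical stand-ins introduced only for illustration (the real method works with `SnapshotRestoreContext` and returns `void`), and what the original code does in the `opCtx0 == null` branch is not visible in the quoted hunk, so the sketch simply returns:

```java
import java.util.UUID;

public class FinishPrepareSketch {
    /** Hypothetical stand-in for SnapshotRestoreContext. */
    static final class Ctx {
        final UUID reqId;

        Ctx(UUID reqId) {
            this.reqId = reqId;
        }
    }

    /** Outcomes named for illustration only; the real method has side effects instead. */
    enum Outcome { NOTHING_TO_DO, START_CACHES, ROLLBACK }

    static Outcome finishPrepare(Ctx opCtx0, Exception failure) {
        // Guard clause first: a failure with no local context leaves nothing to handle locally.
        if (opCtx0 == null && failure != null)
            return Outcome.NOTHING_TO_DO;

        if (failure == null) {
            // Mirrors the assertion in the quoted hunk.
            assert opCtx0 != null;

            return Outcome.START_CACHES;
        }

        // A failure was reported and a local context exists.
        return Outcome.ROLLBACK;
    }
}
```

       With the guard at the top, the remaining branches only have to distinguish success from a failure that needs local cleanup.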

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())

Review comment:
       I think an exception must be thrown when `meta == null`.
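
       One possible reading of this suggestion, sketched under assumptions: `Meta` is a hypothetical stand-in for `SnapshotMetadata`, and `IllegalStateException` replaces `IgniteCheckedException` only to keep the example self-contained. Whether a missing metadata file should really be an error on every node is a decision for the patch author:

```java
public class MetaCheckSketch {
    /** Hypothetical stand-in for SnapshotMetadata. */
    static final class Meta {
        final String consistentId;
        final int pageSize;

        Meta(String consistentId, int pageSize) {
            this.consistentId = consistentId;
            this.pageSize = pageSize;
        }
    }

    static void validate(Meta meta, String locConsistentId, int locPageSize, String snpName) {
        // Fail fast instead of silently skipping the page size check.
        if (meta == null)
            throw new IllegalStateException("Snapshot metadata was not found [snapshot=" + snpName + ']');

        // Same condition as in the quoted hunk, minus the null check.
        if (meta.consistentId.equals(locConsistentId) && meta.pageSize != locPageSize) {
            throw new IllegalStateException("Incompatible memory page size " +
                "[snapshotPageSize=" + meta.pageSize +
                ", local=" + locPageSize +
                ", snapshot=" + snpName + ']');
        }
    }
}
```

       Splitting the null check from the page size comparison also lets each failure carry its own message.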

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),

Review comment:
       Let's use `FilePageStoreManager` here to get the cache directory instead of resolving the path by hand.
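
       A rough sketch of the idea, assuming the page store manager keeps the store root and
       exposes a helper that resolves a cache directory from it. `CacheDirSketch`, its fields,
       its methods and the directory prefixes are hypothetical stand-ins for illustration only,
       not the actual `FilePageStoreManager` API:

```java
import java.io.File;

/** Hypothetical stand-in for a page store manager; NOT the actual Ignite class. */
class CacheDirSketch {
    /** Assumed directory prefix for a standalone cache. */
    static final String CACHE_DIR_PREFIX = "cache-";

    /** Assumed directory prefix for a shared cache group. */
    static final String CACHE_GRP_DIR_PREFIX = "cacheGroup-";

    /** Root of the node's persistent store (assumed layout: work dir / db / folder name). */
    private final File storeWorkDir;

    /**
     * @param storeWorkDir Root of the node's persistent store.
     */
    CacheDirSketch(File storeWorkDir) {
        this.storeWorkDir = storeWorkDir;
    }

    /** Builds the directory name from the kind (shared group or single cache) and the name. */
    static String cacheDirName(boolean sharedGrp, String cacheOrGrpName) {
        return (sharedGrp ? CACHE_GRP_DIR_PREFIX : CACHE_DIR_PREFIX) + cacheOrGrpName;
    }

    /** Resolves the local cache directory; does not touch the file system. */
    File cacheWorkDir(boolean sharedGrp, String cacheOrGrpName) {
        return new File(storeWorkDir, cacheDirName(sharedGrp, cacheOrGrpName));
    }

    /** Resolves the local directory that mirrors the given snapshot cache directory. */
    File cacheWorkDir(File snpCacheDir) {
        return new File(storeWorkDir, snpCacheDir.getName());
    }

    public static void main(String[] args) {
        CacheDirSketch mgr = new CacheDirSketch(new File("work/db/node00"));

        // The caller no longer assembles the relative database path itself.
        System.out.println(mgr.cacheWorkDir(true, "grp1"));
    }
}
```

       With a helper like this, `prepareContext` would not need to combine
       `U.resolveWorkDirectory` and `databaseRelativePath` itself, and the directory
       layout would be defined in one place.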

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {

Review comment:
       Can you parallelize the copy operation on the same executor service? It seems that `restore` performs the copy in a single thread only.
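
       A minimal sketch of what a per-file parallel copy could look like, using only the JDK. The class `ParallelCopySketch` and the method `copyAll` are hypothetical names for illustration and are not part of the patch; the `stopChecker` parameter mirrors the `BooleanSupplier` already used in `restore`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.BooleanSupplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of the patch. */
public class ParallelCopySketch {
    /**
     * Copies every regular file from {@code srcDir} to {@code dstDir}, submitting one task per file.
     *
     * @param srcDir Source directory (e.g. a snapshot cache directory).
     * @param dstDir Existing destination directory.
     * @param exec Executor that runs the copy tasks.
     * @param stopChecker Returns {@code true} if remaining files should be skipped.
     * @throws IOException If listing the directory or copying any file fails.
     */
    public static void copyAll(Path srcDir, Path dstDir, Executor exec, BooleanSupplier stopChecker)
        throws IOException {
        List<Path> files;

        // Close the directory stream before the copy tasks are submitted.
        try (Stream<Path> stream = Files.list(srcDir)) {
            files = stream.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        List<CompletableFuture<Void>> futs = new ArrayList<>();

        for (Path src : files) {
            futs.add(CompletableFuture.runAsync(() -> {
                // Skip the file if the operation has been interrupted.
                if (stopChecker.getAsBoolean())
                    return;

                try {
                    Files.copy(src, dstDir.resolve(src.getFileName()));
                }
                catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, exec));
        }

        try {
            // Wait for all copy tasks; the first failure surfaces here.
            CompletableFuture.allOf(futs.toArray(new CompletableFuture[0])).join();
        }
        catch (CompletionException e) {
            if (e.getCause() instanceof UncheckedIOException)
                throw ((UncheckedIOException)e.getCause()).getCause();

            throw e;
        }
    }
}
```

       One caveat: `restore` already runs on the snapshot executor, so blocking in `join()` on that same pool occupies a worker thread and could stall if the pool is small and fully busy. Chaining the result onto the returned future instead of blocking would avoid that.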

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);

Review comment:
       Is it possible to combine the `checkMetadata` and `updateMetadata` methods? They are always called together.
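
       A rough sketch of the shape I have in mind: one entry point that validates everything first and only then applies the changes. `MetadataMergeSketch`, `checkAndUpdate` and the `Map<Integer, String>` registry are hypothetical stand-ins for illustration, not the actual binary metadata API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a metadata registry; not the Ignite API. */
public class MetadataMergeSketch {
    /** Registered metadata: type ID -> schema description. */
    private final Map<Integer, String> registered = new HashMap<>();

    /**
     * Validates all incoming entries, then registers them.
     *
     * @param incoming Metadata read from the snapshot (type ID -> schema description).
     * @param stopChecker Returns {@code true} if the update should be aborted.
     * @return {@code false} if the update was aborted by the stop checker, {@code true} otherwise.
     * @throws IllegalStateException If an incoming entry conflicts with a registered one.
     */
    public boolean checkAndUpdate(Map<Integer, String> incoming, BooleanSupplier stopChecker) {
        // Phase 1: check compatibility of every entry before changing anything.
        for (Map.Entry<Integer, String> e : incoming.entrySet()) {
            String cur = registered.get(e.getKey());

            if (cur != null && !cur.equals(e.getValue()))
                throw new IllegalStateException("Incompatible metadata [typeId=" + e.getKey() + ']');
        }

        // Phase 2: apply the update, honoring the stop checker between entries.
        for (Map.Entry<Integer, String> e : incoming.entrySet()) {
            if (stopChecker.getAsBoolean())
                return false;

            registered.put(e.getKey(), e.getValue());
        }

        return true;
    }

    /** @return Registered schema description, or {@code null} if the type is unknown. */
    public String get(int typeId) {
        return registered.get(typeId);
    }
}
```

       With a single method the caller cannot forget the check or run the two steps in the wrong order.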

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;

Review comment:
       You also need to check the `Thread.isInterrupted` flag, since the `shutdown` method may be called on the executor service.
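
A minimal sketch of what such a combined check could look like, assuming the context's `err` field is an `AtomicReference<Throwable>` (the surrounding code calls `compareAndSet` and `get` on it). The class and method names here are hypothetical and not part of the patch. Note that in the JDK it is typically `shutdownNow`, rather than plain `shutdown`, that interrupts workers which are still running a task.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical stand-alone illustration; not the SnapshotRestoreProcess code itself.
class StopCheckerSketch {
    /**
     * Builds a checker that reports "stop" when an error has been recorded
     * or when the current worker thread has been interrupted.
     */
    static BooleanSupplier stopChecker(AtomicReference<Throwable> err) {
        // isInterrupted() only reads the flag, it does not clear it.
        return () -> err.get() != null || Thread.currentThread().isInterrupted();
    }

    public static void main(String[] args) {
        AtomicReference<Throwable> err = new AtomicReference<>();
        BooleanSupplier checker = stopChecker(err);

        System.out.println("before interrupt: " + checker.getAsBoolean());

        // Simulate what an executor does to a running worker on forced shutdown.
        Thread.currentThread().interrupt();
        System.out.println("after interrupt: " + checker.getAsBoolean());

        // Clear the flag so the rest of the program is unaffected.
        Thread.interrupted();
    }
}
```

Because the supplier is evaluated on the thread that copies the partition files, reading `Thread.currentThread()` inside the lambda observes the executor worker's flag rather than the flag of the thread that created the supplier.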

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +

Review comment:
       I think it's enough to check that the folder exists, isn't it?
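
To make the two options concrete, here is a small sketch that contrasts an existence-only check with the "exists and is non-empty" check from the patch. It is written against `java.nio.file` purely for illustration; the class and method names are hypothetical, and the patch itself uses `java.io.File`. One thing that may be worth keeping in mind for the current code: `File.list()` returns `null` when the path is not a directory or an I/O error occurs, so `cacheDir.list().length` can throw a `NullPointerException`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical stand-alone illustration of the two directory checks under discussion.
class CacheDirCheckSketch {
    /** Stricter variant: any existing path blocks the restore. */
    static boolean blocksIfExists(Path dir) {
        return Files.exists(dir);
    }

    /** Variant closer to the patch: only a non-empty directory (or a non-directory file) blocks. */
    static boolean blocksIfNonEmpty(Path dir) throws IOException {
        if (!Files.isDirectory(dir))
            return Files.exists(dir);

        // Files.list must be closed, hence try-with-resources.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.findAny().isPresent();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cache-grp");

        System.out.println("exists check, empty dir: " + blocksIfExists(dir));
        System.out.println("non-empty check, empty dir: " + blocksIfNonEmpty(dir));

        Files.createFile(dir.resolve("part-0.bin"));
        System.out.println("non-empty check, one file: " + blocksIfNonEmpty(dir));
    }
}
```

The existence-only variant is simpler, but it would also reject an empty directory left behind by an earlier attempt, so the choice depends on whether such leftovers are expected.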

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {

Review comment:
       This cleanup should probably be performed by the `rollback` distributed process. When it runs as a local asynchronous task, as it does here, a failure leaves the other nodes with no way of learning whether the rollback succeeded or failed.
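
To illustrate the point, here is a minimal, self-contained sketch in plain Java. It is not Ignite code: `RollbackPhaseSketch`, `PhaseResult` and `runRollbackPhase` are hypothetical names. They only model what a distributed phase hands to its finish callback, namely a per-node result map and a per-node error map, similar in shape to the `res` and `errs` arguments of `finishPrepare` in this file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

/** Toy model only: none of these types are Ignite classes. */
public class RollbackPhaseSketch {
    /** Outcome of one cluster-wide phase: per-node results and per-node errors. */
    public static final class PhaseResult {
        /** Nodes whose local rollback completed, with the value it returned. */
        public final Map<UUID, Boolean> res = new LinkedHashMap<>();

        /** Nodes whose local rollback threw, with the exception. */
        public final Map<UUID, Exception> errs = new LinkedHashMap<>();
    }

    /**
     * Runs the local rollback handler once per node and records every outcome.
     * A fire-and-forget local task would lose the failures collected here.
     */
    public static PhaseResult runRollbackPhase(List<UUID> nodes, Function<UUID, Boolean> locRollback) {
        PhaseResult out = new PhaseResult();

        for (UUID nodeId : nodes) {
            try {
                out.res.put(nodeId, locRollback.apply(nodeId));
            }
            catch (Exception e) {
                out.errs.put(nodeId, e);
            }
        }

        return out;
    }

    public static void main(String[] args) {
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();

        // Assumption for the demo: node "b" fails to clean up its directory.
        PhaseResult r = runRollbackPhase(Arrays.asList(a, b), nodeId -> {
            if (nodeId.equals(b))
                throw new IllegalStateException("Failed to clean up directory");

            return true;
        });

        System.out.println("succeeded=" + r.res.keySet() + ", failed=" + r.errs.keySet());
    }
}
```

With the outcomes gathered this way, the node that completes the user-facing future can report a rollback failure on any node, not only its own.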

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())

Review comment:
       I don't think you should check the state of the cluster each time you run the next stage. If the state changes, it might be better to set the appropriate exception in the `opCtx.err` field and check that field each time you want to proceed.
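
       A minimal sketch of that idea, not the PR's actual code: the class `RestoreErrorSketch` and the methods `onClusterDeactivated`, `error` and `ensureNotInterrupted` are hypothetical names used only for illustration. Listeners record the first failure with `compareAndSet`, the same way `onNodeLeft` and `stop` already do in this patch, and every stage performs one uniform check before proceeding.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the restore operation context (not SnapshotRestoreContext itself). */
class RestoreErrorSketch {
    /** First failure observed for the operation; stays null while the operation is healthy. */
    private final AtomicReference<Throwable> err = new AtomicReference<>();

    /** Assumed to be called from a cluster state change listener; keeps only the first error. */
    void onClusterDeactivated() {
        err.compareAndSet(null, new IllegalStateException("The cluster should be active."));
    }

    /** Assumed to be called from a node-left listener; keeps only the first error. */
    void onNodeLeft(String nodeId) {
        err.compareAndSet(null, new IllegalStateException("Server node has left the cluster: " + nodeId));
    }

    /** @return The recorded error, or null if nothing has gone wrong so far. */
    Throwable error() {
        return err.get();
    }

    /** The single check that each stage runs before it proceeds. */
    void ensureNotInterrupted() {
        Throwable t = err.get();

        if (t != null)
            throw new IllegalStateException("Restore operation was interrupted.", t);
    }

    public static void main(String[] args) {
        RestoreErrorSketch opCtx = new RestoreErrorSketch();

        opCtx.ensureNotInterrupted(); // Passes: no error has been recorded yet.

        opCtx.onClusterDeactivated();
        opCtx.onNodeLeft("node-1"); // Ignored: an error is already recorded.

        try {
            opCtx.ensureNotInterrupted();
        }
        catch (IllegalStateException e) {
            System.out.println("Stage skipped: " + e.getCause().getMessage());
        }
    }
}
```

       With something like this, the stage handlers would not need to query the cluster state themselves; they would only consult the error field.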

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active."));
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes))

Review comment:
       The same thing here: would it be better to set `opCtx.err` if any of the affected baseline nodes leaves the cluster?
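
       A minimal sketch of what this suggestion could look like, assuming the node-left callback is the single place where the departure is recorded and later phases only check the stored error. The class `RestoreErrorSketch`, its nested `Context` and the method `canProceed` are hypothetical stand-ins for illustration, not code from the PR; only the `AtomicReference.compareAndSet(null, ...)` pattern mirrors what `onNodeLeft` in the diff already does.

```java
import java.util.Collections;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; names are illustrative and not taken from the PR. */
final class RestoreErrorSketch {
    /** Simplified stand-in for the restore operation context. */
    static final class Context {
        /** IDs of the nodes that take part in the restore. */
        final Set<UUID> nodes;

        /** First error that interrupted the operation, or null if there was none. */
        final AtomicReference<Throwable> err = new AtomicReference<>();

        Context(Set<UUID> nodes) {
            this.nodes = nodes;
        }
    }

    /** Records an error if the departed node participates in the restore; the first error wins. */
    static void onNodeLeft(Context ctx, UUID leftNodeId) {
        if (ctx.nodes.contains(leftNodeId)) {
            ctx.err.compareAndSet(null,
                new IllegalStateException("Server node has left the cluster [nodeId=" + leftNodeId + ']'));
        }
    }

    /** A later phase only needs to look at the recorded error. */
    static boolean canProceed(Context ctx) {
        return ctx.err.get() == null;
    }

    public static void main(String[] args) {
        UUID participant = UUID.randomUUID();
        Context ctx = new Context(Collections.singleton(participant));

        onNodeLeft(ctx, UUID.randomUUID()); // Unrelated node: nothing is recorded.
        System.out.println(canProceed(ctx));

        onNodeLeft(ctx, participant); // Participant left: the error is recorded.
        System.out.println(canProceed(ctx));
    }
}
```

       With this shape, a separate topology check before the cache start phase may become redundant, since the existing `err != null` check would cover the case. Whether that holds for every code path in the PR is an assumption that needs to be verified.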

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));

Review comment:
       The `restore` procedure state is incorrect if this condition occurs. I don't think another concurrent restore operation can be executed here, so it would be better to fix the error message.
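
       One possible wording, purely as an illustration: the message could describe the inconsistent state and include both request IDs, so the mismatch is visible in the logs. The class `RequestIdCheckSketch`, the method `mismatchMessage` and the message text itself are hypothetical and are not part of the PR.

```java
import java.util.UUID;

/** Hypothetical sketch; the wording and names are assumptions, not code from the PR. */
final class RequestIdCheckSketch {
    /** Builds a message that reports an unexpected request ID instead of an "unknown operation". */
    static String mismatchMessage(UUID expected, UUID actual) {
        return "Snapshot restore operation is in an unexpected state: request ID mismatch " +
            "[expectedReqId=" + expected + ", actualReqId=" + actual + ']';
    }

    public static void main(String[] args) {
        // Random IDs stand in for the context request ID and the incoming request ID.
        System.out.println(mismatchMessage(UUID.randomUUID(), UUID.randomUUID()));
    }
}
```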

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;

Review comment:
       You should skip `client` nodes here and throw an exception if `opCtx0` is `null`, shouldn't you? A null context at cache start means the restore process is in an incorrect state.
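
       A minimal sketch of the guard order suggested above, written against plain JDK types so that it compiles on its own. Everything here is an assumption for illustration: `CacheStartGuard`, `Ctx` and the `clientNode` flag are hypothetical stand-ins for the PR's `SnapshotRestoreContext` and `ctx.clientNode()`, and `CompletableFuture` stands in for `IgniteInternalFuture`. The failure is reported through a failed future instead of a thrown exception, which mirrors how the quoted code returns errors from the other phases.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Hypothetical illustration of the checks at the start of the cache start phase. */
public class CacheStartGuard {
    /** Hypothetical stand-in for the restore operation context. */
    public static final class Ctx {
        /** Request ID of the running restore operation. */
        final UUID reqId;

        public Ctx(UUID reqId) {
            this.reqId = reqId;
        }
    }

    /**
     * @param clientNode Whether the local node is a client node.
     * @param opCtx Local restore context, {@code null} if no operation is registered.
     * @param reqId Request ID received by the phase.
     * @return Future with the phase result.
     */
    public static CompletableFuture<Boolean> cacheStart(boolean clientNode, Ctx opCtx, UUID reqId) {
        // Client nodes take no part in the restore, so they finish immediately with an empty result.
        if (clientNode)
            return CompletableFuture.completedFuture(null);

        // A server node without a context at this phase indicates a broken process state.
        if (opCtx == null)
            return failed("Snapshot restore context is missing [reqId=" + reqId + ']');

        // A different request ID also indicates a broken state, not a concurrent operation.
        if (!reqId.equals(opCtx.reqId))
            return failed("Unexpected request ID [expected=" + opCtx.reqId + ", actual=" + reqId + ']');

        return CompletableFuture.completedFuture(true);
    }

    /** Creates a future that is already completed with an error (Java 8 compatible). */
    private static CompletableFuture<Boolean> failed(String msg) {
        CompletableFuture<Boolean> fut = new CompletableFuture<>();

        fut.completeExceptionally(new IllegalStateException(msg));

        return fut;
    }
}
```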




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622751052



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.OpenOption;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.IntSupplier;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.CacheMode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIO;
+import org.apache.ignite.internal.processors.cache.persistence.file.FileIOFactory;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.G;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.FILE_SUFFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.PART_FILE_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotRestoreProcess.TMP_CACHE_DIR_PREFIX;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends IgniteClusterSnapshotRestoreBaseTest {
+    /** Type name used for binary and SQL. */
+    private static final String TYPE_NAME = "CustomType";
+
+    /** Cache 1 name. */
+    private static final String CACHE1 = "cache1";
+
+    /** Cache 2 name. */
+    private static final String CACHE2 = "cache2";
+
+    /** Default shared cache group name. */
+    private static final String SHARED_GRP = "shared";
+
+    /** Cache value builder. */
+    private Function<Integer, Object> valBuilder = String::valueOf;
+
+    /** {@inheritDoc} */
+    @Override protected Function<Integer, Object> valueBuilder() {
+        return valBuilder;
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreAllGroups() throws Exception {
+        CacheConfiguration<Integer, Object> cacheCfg1 =
+            txCacheConfig(new CacheConfiguration<Integer, Object>(CACHE1)).setGroupName(SHARED_GRP);
+
+        CacheConfiguration<Integer, Object> cacheCfg2 =
+            txCacheConfig(new CacheConfiguration<Integer, Object>(CACHE2)).setGroupName(SHARED_GRP);
+
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder,
+            dfltCacheCfg.setBackups(0), cacheCfg1, cacheCfg2);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cache(CACHE1).destroy();
+        ignite.cache(CACHE2).destroy();
+        ignite.cache(DEFAULT_CACHE_NAME).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Restore all cache groups.
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, null).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(DEFAULT_CACHE_NAME), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(CACHE1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(CACHE2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testStartClusterSnapshotRestoreMultipleThreadsSameNode() throws Exception {
+        checkStartClusterSnapshotRestoreMultithreaded(() -> 0);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testStartClusterSnapshotRestoreMultipleThreadsDiffNode() throws Exception {
+        AtomicInteger nodeIdx = new AtomicInteger();
+
+        checkStartClusterSnapshotRestoreMultithreaded(nodeIdx::getAndIncrement);
+    }
+
+    /**
+     * @param nodeIdxSupplier Ignite node index supplier.
+     */
+    public void checkStartClusterSnapshotRestoreMultithreaded(IntSupplier nodeIdxSupplier) throws Exception {
+        Ignite ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        CountDownLatch startLatch = new CountDownLatch(1);
+        AtomicInteger successCnt = new AtomicInteger();
+
+        IgniteInternalFuture<Long> fut = GridTestUtils.runMultiThreadedAsync(() -> {
+            try {
+                startLatch.await(TIMEOUT, TimeUnit.MILLISECONDS);
+
+                grid(nodeIdxSupplier.getAsInt()).snapshot().restoreSnapshot(
+                    SNAPSHOT_NAME, Collections.singleton(DEFAULT_CACHE_NAME)).get(TIMEOUT);
+
+                successCnt.incrementAndGet();
+            }
+            catch (Exception ignore) {
+                // Expected exception.

Review comment:
       Two exceptions are possible here: the first says that another restore process has already started, and the second (rare) says that the cache already exists (if the second process is delayed).
   Do you suggest checking for exceptions manually?
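
   If the exceptions were checked manually, one option is a small predicate that walks the cause chain and accepts only the two expected failures. The sketch below is hypothetical: the class `ExpectedRestoreFailures` does not exist in the PR, and the message fragments are copied from the `SnapshotRestoreProcess` revision quoted in this thread, so they may not match the messages (or exception types) produced by the final code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical test helper that recognizes the two failures expected from a concurrent restore. */
public class ExpectedRestoreFailures {
    /** Message fragments taken from the quoted revision; assumed, not verified against the final code. */
    private static final List<String> EXPECTED = Arrays.asList(
        "The previous snapshot restore operation was not completed.",
        "should be destroyed manually"
    );

    /**
     * @param err Exception caught by a restore thread.
     * @return {@code true} if the exception or any of its causes carries an expected message.
     */
    public static boolean isExpected(Throwable err) {
        for (Throwable t = err; t != null; t = t.getCause()) {
            String msg = t.getMessage();

            if (msg == null)
                continue;

            for (String fragment : EXPECTED) {
                if (msg.contains(fragment))
                    return true;
            }
        }

        return false;
    }
}
```

   In the test, the `catch` block could then fail on anything for which `isExpected` returns `false` instead of ignoring every exception.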







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r614264404



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {

Review comment:
       This set of tests is already named IgniteCluster**SnapshotRestore**SelfTest. Do we really need to repeat `OnRestoreInProgress` in the name of each test?







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r609654380



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       These tests are placed in the `core` module but executed in the `indexing` module. I suggest splitting this class into tests that run without indexes and tests that run with indexes. Alternatively, we can override the cache configuration in the indexing module so that all tests run with indexes too.
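
       The second option could be sketched as follows. This is only an illustration under stated assumptions: `CacheCfgStandIn`, `RestoreTestBase`, `RestoreWithIndexingTest` and the `customize` hook are hypothetical names that do not exist in Ignite or in this PR. The base class builds the cache configuration through an overridable hook, and the indexing-module subclass overrides that hook to add indexes, so the inherited tests run unchanged in both modules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a cache configuration; it only tracks indexed field names. */
class CacheCfgStandIn {
    final String name;
    final List<String> indexedFields = new ArrayList<>();

    CacheCfgStandIn(String name) {
        this.name = name;
    }
}

/** Base test (would live in the core module): builds the configuration through an overridable hook. */
class RestoreTestBase {
    /** Hook for subclasses; the base version leaves the cache without indexes. */
    protected CacheCfgStandIn customize(CacheCfgStandIn cfg) {
        return cfg;
    }

    /** Every test obtains its cache configuration here, so the hook applies to all of them. */
    final CacheCfgStandIn cacheConfig() {
        return customize(new CacheCfgStandIn("default"));
    }
}

/** Variant (would live in the indexing module): same tests, but the hook adds indexes. */
class RestoreWithIndexingTest extends RestoreTestBase {
    @Override protected CacheCfgStandIn customize(CacheCfgStandIn cfg) {
        cfg.indexedFields.add("id");
        cfg.indexedFields.add("name");

        return cfg;
    }
}

class IndexingOverrideDemo {
    public static void main(String[] args) {
        System.out.println("core indexes: " + new RestoreTestBase().cacheConfig().indexedFields);
        System.out.println("indexing indexes: " + new RestoreWithIndexingTest().cacheConfig().indexedFields);
    }
}
```

       With this shape the core module would not need any query entity setup, and only the subclass would depend on indexing.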

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;

Review comment:
       This can be `private`.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {

Review comment:
       Let's rename this test to `testClusterSnapshotRestoreOnGreaterTopology`.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotOperationRequest.java
##########
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Set;
+import java.util.UUID;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.tostring.GridToStringInclude;
+import org.apache.ignite.internal.util.typedef.internal.S;
+
+/**
+ * Snapshot operation start request for {@link DistributedProcess} initiate message.
+ */
+public class SnapshotOperationRequest implements Serializable {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** Request ID. */
+    private final UUID reqId;
+
+    /** Snapshot name. */
+    private final String snpName;
+
+    /** Baseline node IDs that must be alive to complete the operation. */
+    @GridToStringInclude
+    private final Set<UUID> nodes;
+
+    /** List of cache group names. */
+    @GridToStringInclude
+    private final Collection<String> grps;
+
+    /** Operational node ID. */
+    private final UUID opNodeId;
+
+    /** Exception occurred during snapshot operation processing. */
+    private volatile Throwable err;
+
+    /**
+     * @param reqId Request ID.
+     * @param opNodeId Operational node ID.
+     * @param snpName Snapshot name.
+     * @param grps List of cache group names.
+     * @param nodes Baseline node IDs that must be alive to complete the operation.
+     */
+    public SnapshotOperationRequest(
+        UUID reqId,
+        UUID opNodeId,
+        String snpName,
+        Collection<String> grps,
+        Set<UUID> nodes
+    ) {
+        this.reqId = reqId;
+        this.opNodeId = opNodeId;
+        this.snpName = snpName;
+        this.grps = grps;
+        this.nodes = nodes;
+    }
+
+    /**
+     * @return Request ID.
+     */
+    public UUID requestId() {
+        return reqId;
+    }
+
+    /**
+     * @return Snapshot name.
+     */
+    public String snapshotName() {
+        return snpName;
+    }
+
+    /**
+     * @return List of cache group names.
+     */
+    public Collection<String> groups() {
+        return grps;
+    }
+
+    /**
+     * @return Baseline node IDs that must be alive to complete the operation.
+     */
+    public Set<UUID> nodes() {
+        return nodes;
+    }
+
+    /**
+     * @return Operational node ID.
+     */
+    public UUID operNodeId() {

Review comment:
       Let's use the full name here: `operationalNodeId`.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {

Review comment:
       Let's also check that the index is restored and can be used successfully (for the indexing module).

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();

Review comment:
       This can be private.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);

Review comment:
       This method may simplify the data filling here and start multiple caches at once:
   ```
   AbstractSnapshotSelfTest#startGridsWithCache(
       int,
       int,
       java.util.function.Function<java.lang.Integer, V>,
       org.apache.ignite.configuration.CacheConfiguration<java.lang.Integer, V>...)
   ```

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();

Review comment:
       You're changing `valBuilder` locally in some tests. It should be nullified or reset to the default value before each test runs (probably in a method annotated with JUnit's `@Before`).
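
       A minimal, self-contained sketch of the reset, not the actual test class. It assumes the test instance is reused between test methods, which may not hold for every runner. The class `ValBuilderResetSketch`, the `beforeTest()` hook and the string-producing builders are hypothetical stand-ins for the real fixture:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for a test class with a mutable value builder. */
public class ValBuilderResetSketch {
    /** Default builder; plays the role of the default value builder in the real test. */
    static final Function<Integer, Object> DEFAULT_BUILDER = key -> "indexed-" + key;

    /** Mutable fixture that individual tests may replace. */
    private Function<Integer, Object> valBuilder = DEFAULT_BUILDER;

    /** Hook that would run before every test (e.g. a method annotated with @Before). */
    void beforeTest() {
        valBuilder = DEFAULT_BUILDER;
    }

    /** A test that swaps the builder locally. */
    Object testWithBinaryValues() {
        valBuilder = key -> "binary-" + key;

        return valBuilder.apply(1);
    }

    /** A test that relies on the default builder. */
    Object testWithDefaultValues() {
        return valBuilder.apply(1);
    }

    /** Simulates a runner that reuses one instance for both tests. */
    static List<Object> runBoth(boolean reset) {
        ValBuilderResetSketch test = new ValBuilderResetSketch();
        List<Object> res = new ArrayList<>();

        if (reset)
            test.beforeTest();

        res.add(test.testWithBinaryValues());

        if (reset)
            test.beforeTest();

        res.add(test.testWithDefaultValues());

        return res;
    }

    public static void main(String[] args) {
        System.out.println(runBoth(false)); // [binary-1, binary-1]: the local change leaks.
        System.out.println(runBoth(true));  // [binary-1, indexed-1]: the default is restored.
    }
}
```

       Without the reset, the second test silently picks up the builder installed by the first one, so its result depends on execution order.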

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";

Review comment:
       Let's move these names to constants.
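
       A minimal sketch of what that could look like. The holder class and the constant names below are hypothetical; in the PR these would more likely be private static final fields of the test class itself. Only the string values come from the test.

```java
/** Hypothetical holder for the names used in the shared cache group tests. */
final class SnapshotRestoreTestNames {
    /** Shared cache group name (value taken from the test). */
    static final String SHARED_GRP = "shared";

    /** First cache name (value taken from the test). */
    static final String CACHE1 = "cache1";

    /** Second cache name (value taken from the test). */
    static final String CACHE2 = "cache2";

    /** Not meant to be instantiated. */
    private SnapshotRestoreTestNames() {
        // No-op.
    }
}
```

       The local variables `grpName`, `cacheName1` and `cacheName2` could then be dropped in favor of the constants.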

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {

Review comment:
       Let's also check here that the restore operation fails when the new baseline topology doesn't contain some of the nodes from which the snapshot was taken.
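
       To spell out the condition, here is a plain-Java model of it. This is not Ignite API: the class and method names are hypothetical and node IDs are modelled as strings. It only states that a restore is expected to succeed when every snapshot node is still in the baseline, and to fail otherwise.

```java
import java.util.Set;

/** Hypothetical model of the expected topology check; not part of Ignite. */
final class BaselineCheckSketch {
    /**
     * @param snapshotNodeIds IDs of the nodes the snapshot was taken from.
     * @param baselineNodeIds IDs of the nodes in the current baseline topology.
     * @return {@code true} if every snapshot node is present in the baseline.
     */
    static boolean canRestore(Set<String> snapshotNodeIds, Set<String> baselineNodeIds) {
        return baselineNodeIds.containsAll(snapshotNodeIds);
    }

    /** Not meant to be instantiated. */
    private BaselineCheckSketch() {
        // No-op.
    }
}
```

       The existing test covers the case where the baseline only grows; the additional case is the one where `canRestore` would return false.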

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/AbstractSnapshotSelfTest.java
##########
@@ -124,6 +124,9 @@
 
         discoSpi.setIpFinder(((TcpDiscoverySpi)cfg.getDiscoverySpi()).getIpFinder());
 
+        if (dfltCacheCfg != null)

Review comment:
       Why do we need this change? This configuration is set each time a new test starts. See the `@Before` method.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {
+        checkTopologyChange(false);
+    }
+
+    /**
+     * @param stopNode {@code True} to check node fail, {@code False} to check node join.
+     * @throws Exception if failed.
+     */
+    private void checkTopologyChange(boolean stopNode) throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(4, keysCnt);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(3));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, dfltCacheCfg.getName());
+
+        if (stopNode) {
+            IgniteInternalFuture<?> fut0 = runAsync(() -> stopGrid(3, true));
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> fut.get(TIMEOUT),
+                ClusterTopologyCheckedException.class,
+                "Required node has left the cluster"
+            );
+
+            ensureCacheDirEmpty(3, dfltCacheCfg);
+
+            fut0.get(TIMEOUT);
+
+            awaitPartitionMapExchange();
+
+            dfltCacheCfg = null;
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> startGrid(3),
+                IgniteSpiException.class,
+                "to add the node to cluster - remove directories with the caches"
+            );
+
+            return;
+        }
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> startGrid(4),
+            IgniteSpiException.class,
+            "Joining node during caches restore is not allowed"
+        );
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterStateChangeActiveReadonlyOnPrepare() throws Exception {
+        checkClusterStateChange(ClusterState.ACTIVE_READ_ONLY, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
+            IgniteException.class, "Failed to perform start cache operation (cluster is in read-only mode)");
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterStateChangeActiveReadonlyOnCacheStart() throws Exception {
+        checkClusterStateChange(ClusterState.ACTIVE_READ_ONLY, RESTORE_CACHE_GROUP_SNAPSHOT_START, null, null);
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterDeactivateOnPrepare() throws Exception {
+        checkClusterStateChange(ClusterState.INACTIVE, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
+            IgniteException.class, "The cluster has been deactivated.");
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterDeactivateOnCacheStart() throws Exception {
+        checkClusterStateChange(ClusterState.INACTIVE, RESTORE_CACHE_GROUP_SNAPSHOT_START, null, null);
+    }
+
+    /**
+     * @param state Cluster state.
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param exCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @throws Exception if failed.
+     */
+    private void checkClusterStateChange(
+        ClusterState state,
+        DistributedProcessType procType,
+        @Nullable Class<? extends Throwable> exCls,
+        @Nullable String expMsg
+    ) throws Exception {
+        checkClusterStateChange(state, procType, exCls, expMsg, false);
+    }
+
+    /**
+     * @param state Cluster state.
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param exCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @param stopNode Stop node flag.
+     * @throws Exception if failed.
+     */
+    private void checkClusterStateChange(

Review comment:
       Let's inline this method.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {
+        checkTopologyChange(false);
+    }
+
+    /**
+     * @param stopNode {@code True} to check node fail, {@code False} to check node join.
+     * @throws Exception if failed.
+     */
+    private void checkTopologyChange(boolean stopNode) throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(4, keysCnt);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(3));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, dfltCacheCfg.getName());
+
+        if (stopNode) {
+            IgniteInternalFuture<?> fut0 = runAsync(() -> stopGrid(3, true));
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> fut.get(TIMEOUT),
+                ClusterTopologyCheckedException.class,
+                "Required node has left the cluster"
+            );
+
+            ensureCacheDirEmpty(3, dfltCacheCfg);
+
+            fut0.get(TIMEOUT);
+
+            awaitPartitionMapExchange();
+
+            dfltCacheCfg = null;
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> startGrid(3),
+                IgniteSpiException.class,
+                "to add the node to cluster - remove directories with the caches"
+            );
+
+            return;
+        }
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> startGrid(4),
+            IgniteSpiException.class,
+            "Joining node during caches restore is not allowed"
+        );
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterStateChangeActiveReadonlyOnPrepare() throws Exception {
+        checkClusterStateChange(ClusterState.ACTIVE_READ_ONLY, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
+            IgniteException.class, "Failed to perform start cache operation (cluster is in read-only mode)");
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterStateChangeActiveReadonlyOnCacheStart() throws Exception {
+        checkClusterStateChange(ClusterState.ACTIVE_READ_ONLY, RESTORE_CACHE_GROUP_SNAPSHOT_START, null, null);
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterDeactivateOnPrepare() throws Exception {
+        checkClusterStateChange(ClusterState.INACTIVE, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
+            IgniteException.class, "The cluster has been deactivated.");
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterDeactivateOnCacheStart() throws Exception {
+        checkClusterStateChange(ClusterState.INACTIVE, RESTORE_CACHE_GROUP_SNAPSHOT_START, null, null);
+    }
+
+    /**
+     * @param state Cluster state.
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param exCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @throws Exception if failed.
+     */
+    private void checkClusterStateChange(

Review comment:
       Let's rewrite these tests using `@Parameterized.Parameters`.
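
The four `testClusterStateChange...` / `testClusterDeactivate...` methods above differ only in the arguments they pass, so they could share one parameter table. A rough sketch of that table follows, written in plain Java so it runs without JUnit or Ignite on the classpath. The `State` and `Phase` enums are stand-ins for `ClusterState` and `DistributedProcessType`, and `StateChangeCases` / `Case` are hypothetical names; only the enum constants and the messages are taken from the tests above.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java stand-in for the rows a @Parameterized.Parameters method could return. */
class StateChangeCases {
    /** Stand-in for ClusterState (only the values the four tests use). */
    enum State { ACTIVE_READ_ONLY, INACTIVE }

    /** Stand-in for DistributedProcessType (only the values the four tests use). */
    enum Phase { RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, RESTORE_CACHE_GROUP_SNAPSHOT_START }

    /** One row of the table (hypothetical helper type). */
    static final class Case {
        final State state;
        final Phase phase;

        /**
         * Expected error message, or null when the restore is expected to succeed.
         * The exception class is omitted: the tests above expect IgniteException
         * whenever they expect a message.
         */
        final String expMsg;

        Case(State state, Phase phase, String expMsg) {
            this.state = state;
            this.phase = phase;
            this.expMsg = expMsg;
        }

        boolean expectsFailure() {
            return expMsg != null;
        }
    }

    /** The same four argument sets that the existing test methods pass by hand. */
    static List<Case> all() {
        return Arrays.asList(
            new Case(State.ACTIVE_READ_ONLY, Phase.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
                "Failed to perform start cache operation (cluster is in read-only mode)"),
            new Case(State.ACTIVE_READ_ONLY, Phase.RESTORE_CACHE_GROUP_SNAPSHOT_START, null),
            new Case(State.INACTIVE, Phase.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
                "The cluster has been deactivated."),
            new Case(State.INACTIVE, Phase.RESTORE_CACHE_GROUP_SNAPSHOT_START, null)
        );
    }

    public static void main(String[] args) {
        for (Case c : all()) {
            String outcome = c.expectsFailure() ? "fails: " + c.expMsg : "succeeds";

            System.out.println(c.state + " on " + c.phase + " -> " + outcome);
        }
    }
}
```

In the real test class each row would become one `Object[]` returned from the method annotated with `@Parameterized.Parameters`, leaving a single test method that calls `checkClusterStateChange` with the injected values.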

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Let's add a test for multiple activation/deactivation commands, e.g.:
   - start cache restoring
   - deactivate cluster (restoring fails)
   - activate cluster
   - start cache restoring (restoring succeeds)
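
To make the expected outcomes of those four steps concrete, here is a toy model in plain Java. It is only a sketch: `ToyCluster` and `ActivationScenario` are hypothetical and do not use any Ignite API, and the error messages are borrowed from the assertions in the existing tests. The real test would drive `ignite.cluster().state(...)` and `restoreSnapshot(...)` instead.

```java
import java.util.concurrent.CompletableFuture;

/** Toy stand-in for a cluster that can run one restore at a time (hypothetical). */
class ToyCluster {
    private boolean active = true;

    /** Restore currently in progress, or null when there is none. */
    private CompletableFuture<Void> inFlight;

    /** Starts a restore that stays pending until finishRestore() or deactivate() is called. */
    synchronized CompletableFuture<Void> startRestore() {
        CompletableFuture<Void> fut = new CompletableFuture<>();

        if (!active)
            fut.completeExceptionally(new IllegalStateException("The cluster should be active"));
        else
            inFlight = fut;

        return fut;
    }

    /** Deactivation fails whatever restore is still pending. */
    synchronized void deactivate() {
        active = false;

        if (inFlight != null) {
            inFlight.completeExceptionally(new IllegalStateException("The cluster has been deactivated."));
            inFlight = null;
        }
    }

    synchronized void activate() {
        active = true;
    }

    /** Completes the pending restore successfully, if there is one. */
    synchronized void finishRestore() {
        if (inFlight != null) {
            inFlight.complete(null);
            inFlight = null;
        }
    }
}

class ActivationScenario {
    /** Runs the four steps and returns {first restore failed, second restore succeeded}. */
    static boolean[] run() {
        ToyCluster cluster = new ToyCluster();

        CompletableFuture<Void> first = cluster.startRestore();   // start cache restoring
        cluster.deactivate();                                     // deactivate cluster
        cluster.activate();                                       // activate cluster
        CompletableFuture<Void> second = cluster.startRestore();  // start cache restoring again
        cluster.finishRestore();

        return new boolean[] {
            first.isCompletedExceptionally(),
            second.isDone() && !second.isCompletedExceptionally()
        };
    }

    public static void main(String[] args) {
        boolean[] res = run();

        System.out.println("first restore failed: " + res[0]);
        System.out.println("second restore succeeded: " + res[1]);
    }
}
```

The point the real test needs to pin down is the second pair of steps: a failed restore must not leave state behind that blocks the next one.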

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Let's add a test where multiple restore requests are started from different nodes (only one should succeed).
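
The shape of such a test can be sketched without Ignite: release all requests at the same moment and assert that exactly one is accepted. The sketch below is an illustration under stated assumptions, not the snapshot manager's actual mechanism. `RestoreGuard` and `ConcurrentRestoreSketch` are hypothetical names, and the compare-and-set guard merely stands in for whatever check rejects a second restore.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical guard that admits a single restore at a time. */
class RestoreGuard {
    /** Name of the node whose restore is in progress, or null when idle. */
    private final AtomicReference<String> owner = new AtomicReference<>();

    /** Returns true only for the one caller that claims the idle guard. */
    boolean tryStart(String nodeName) {
        return owner.compareAndSet(null, nodeName);
    }
}

class ConcurrentRestoreSketch {
    /** Fires one restore request per node at the same moment and counts the accepted ones. */
    static int countAccepted(int nodes) {
        RestoreGuard guard = new RestoreGuard();
        AtomicInteger accepted = new AtomicInteger();
        CountDownLatch startGate = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(nodes);

        for (int i = 0; i < nodes; i++) {
            String node = "node" + i;

            pool.submit(() -> {
                try {
                    // Hold every request until the gate opens so they race for the guard.
                    startGate.await();
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();

                    return;
                }

                if (guard.tryStart(node))
                    accepted.incrementAndGet();
            });
        }

        startGate.countDown();
        pool.shutdown();

        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS))
                throw new IllegalStateException("Requests did not finish in time");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException("Interrupted while waiting for requests", e);
        }

        return accepted.get();
    }

    public static void main(String[] args) {
        System.out.println("accepted restores: " + countAccepted(4));
    }
}
```

In the real test the requests would be `restoreSnapshot(...)` calls issued from different grid nodes, and the rejected futures should also be checked for the expected exception.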

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param expCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @throws Exception If failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {

Review comment:
       Suggest renaming `testNodeJoin` to `testNodeJoinOnRestoreInProgress`.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param expCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @throws Exception If failed.
+     */
+    private void checkCacheStartWithTheSameName(

Review comment:
       Can we parameterize these checks (e.g. with JUnit's `Parameterized` runner)?
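
For illustration only, here is a minimal, dependency-free sketch of what the parameter table for the two "same name" checks might look like. The class name `SameNameCheckCases`, the `Phase` enum and the `Case` holder are hypothetical stand-ins and are not part of the PR; the phase names, exception class names and messages are copied from the two test methods in the diff. With JUnit 4, `cases()` would roughly correspond to a `@Parameterized.Parameters` method on a class run with `@RunWith(Parameterized.class)`.

```java
import java.util.Arrays;
import java.util.List;

/** Dependency-free sketch of a parameter table for the two "same name" checks. */
public class SameNameCheckCases {
    /** Hypothetical stand-in for the two DistributedProcessType constants used in the diff. */
    enum Phase { RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, RESTORE_CACHE_GROUP_SNAPSHOT_START }

    /** One row: the phase to block on, the expected exception class name, the expected message. */
    static final class Case {
        final Phase phase;
        final String expClsName;
        final String expMsg;

        Case(Phase phase, String expClsName, String expMsg) {
            this.phase = phase;
            this.expClsName = expClsName;
            this.expMsg = expMsg;
        }

        @Override public String toString() {
            return phase + " -> " + expClsName;
        }
    }

    /** Assumption: with JUnit 4 this list would be supplied by the parameters method. */
    static List<Case> cases() {
        return Arrays.asList(
            new Case(Phase.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, "IgniteCheckedException",
                "Cache start failed. A cache or group with the same name is currently being restored from a snapshot"),
            new Case(Phase.RESTORE_CACHE_GROUP_SNAPSHOT_START, "CacheExistsException",
                "Failed to start cache (a cache with the same name is already started):"));
    }

    public static void main(String[] args) {
        // A real test would call checkCacheStartWithTheSameName(...) once per row.
        for (Case c : cases())
            System.out.println(c);
    }
}
```

One trade-off to weigh: parameterizing the whole test class would rerun every test in it once per row, so the two-method form in the diff may still be the simpler choice unless these checks are moved into a dedicated class.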

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {
+        checkTopologyChange(false);
+    }
+
+    /**
+     * @param stopNode {@code True} to check node fail, {@code False} to check node join.
+     * @throws Exception if failed.
+     */
+    private void checkTopologyChange(boolean stopNode) throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(4, keysCnt);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(3));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, dfltCacheCfg.getName());
+
+        if (stopNode) {
+            IgniteInternalFuture<?> fut0 = runAsync(() -> stopGrid(3, true));
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> fut.get(TIMEOUT),
+                ClusterTopologyCheckedException.class,
+                "Required node has left the cluster"
+            );
+
+            ensureCacheDirEmpty(3, dfltCacheCfg);
+
+            fut0.get(TIMEOUT);
+
+            awaitPartitionMapExchange();
+
+            dfltCacheCfg = null;
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> startGrid(3),
+                IgniteSpiException.class,
+                "to add the node to cluster - remove directories with the caches"
+            );
+
+            return;
+        }
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> startGrid(4),
+            IgniteSpiException.class,
+            "Joining node during caches restore is not allowed"
+        );
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterStateChangeActiveReadonlyOnPrepare() throws Exception {
+        checkClusterStateChange(ClusterState.ACTIVE_READ_ONLY, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
+            IgniteException.class, "Failed to perform start cache operation (cluster is in read-only mode)");
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterStateChangeActiveReadonlyOnCacheStart() throws Exception {
+        checkClusterStateChange(ClusterState.ACTIVE_READ_ONLY, RESTORE_CACHE_GROUP_SNAPSHOT_START, null, null);
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterDeactivateOnPrepare() throws Exception {
+        checkClusterStateChange(ClusterState.INACTIVE, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE,
+            IgniteException.class, "The cluster has been deactivated.");
+    }
+
+    /**
+     * @throws Exception if failed.
+     */
+    @Test
+    public void testClusterDeactivateOnCacheStart() throws Exception {
+        checkClusterStateChange(ClusterState.INACTIVE, RESTORE_CACHE_GROUP_SNAPSHOT_START, null, null);
+    }
+
+    /**
+     * @param state Cluster state.
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param exCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @throws Exception if failed.
+     */
+    private void checkClusterStateChange(
+        ClusterState state,
+        DistributedProcessType procType,
+        @Nullable Class<? extends Throwable> exCls,
+        @Nullable String expMsg
+    ) throws Exception {
+        checkClusterStateChange(state, procType, exCls, expMsg, false);
+    }
+
+    /**
+     * @param state Cluster state.
+     * @param procType The type of distributed process on which communication is blocked.
+     * @param exCls Expected exception class.
+     * @param expMsg Expected exception message.
+     * @param stopNode Stop node flag.
+     * @throws Exception if failed.
+     */
+    private void checkClusterStateChange(
+        ClusterState state,
+        DistributedProcessType procType,
+        @Nullable Class<? extends Throwable> exCls,
+        @Nullable String expMsg,
+        boolean stopNode
+    ) throws Exception {
+        int nodesCnt = stopNode ? 3 : 2;
+
+        Ignite ignite = startGridsWithSnapshot(nodesCnt, CACHE_KEYS_RANGE, true);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(nodesCnt - 1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, dfltCacheCfg.getName());
+
+        ignite.cluster().state(state);
+
+        if (stopNode)
+            stopGrid(nodesCnt - 1);
+        else
+            spi.stopBlock();
+
+        if (exCls == null) {
+            fut.get(TIMEOUT);
+
+            ignite.cluster().state(ClusterState.ACTIVE);
+
+            checkCacheKeys(ignite.cache(dfltCacheCfg.getName()), CACHE_KEYS_RANGE);
+
+            return;
+        }
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), exCls, expMsg);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        ensureCacheDirEmpty(stopNode ? nodesCnt - 1 : nodesCnt, dfltCacheCfg);
+
+        String cacheName = dfltCacheCfg.getName();
+
+        if (stopNode) {
+            dfltCacheCfg = null;
+
+            startGrid(nodesCnt - 1);
+
+            resetBaselineTopology();
+        }
+
+        grid(nodesCnt - 1).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(cacheName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /**
+     * @param nodesCnt Count of nodes.
+     * @param ccfg Cache configuration.
+     * @throws IgniteCheckedException if failed.
+     */
+    private void ensureCacheDirEmpty(int nodesCnt, CacheConfiguration<?, ?> ccfg) throws IgniteCheckedException {

Review comment:
       Can we use `G.allGrids()` here instead of the `nodesCnt` param?

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Let's add a test where multiple restore requests are started from the same node (only one should succeed).
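
       A rough sketch of the shape such a test could take, using only the JDK. This is not Ignite code: `SingleRestoreSketch`, `tryRestore` and `countSuccessful` are hypothetical names, and the `AtomicBoolean` guard is an assumed stand-in for whatever check the restore manager actually performs. The point is only the assertion pattern: release several requests at once and count how many were accepted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a node that accepts restore requests; not Ignite code. */
final class SingleRestoreSketch {
    /** Assumed guard: becomes true once a restore has been accepted and is never reset here. */
    private final AtomicBoolean restoring = new AtomicBoolean();

    /** Returns true if this request was accepted, false if another request got there first. */
    boolean tryRestore() {
        return restoring.compareAndSet(false, true);
    }

    /** Fires the given number of concurrent requests at one node and counts the accepted ones. */
    static int countSuccessful(int requests) {
        SingleRestoreSketch node = new SingleRestoreSketch();
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Boolean>> futs = new ArrayList<>();

        try {
            for (int i = 0; i < requests; i++) {
                futs.add(pool.submit(() -> {
                    // Hold every request until all of them have been submitted.
                    start.await();

                    return node.tryRestore();
                }));
            }

            // Release all requests at the same time.
            start.countDown();

            int ok = 0;

            for (Future<Boolean> f : futs) {
                if (f.get())
                    ok++;
            }

            return ok;
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
        finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("successful requests: " + countSuccessful(8));
    }
}
```

       In the real test the requests would presumably be `restoreSnapshot(...)` calls issued from one node, with the rejected futures checked for the expected exception rather than a boolean.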

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {
+        checkTopologyChange(false);
+    }
+
+    /**
+     * @param stopNode {@code True} to check node fail, {@code False} to check node join.
+     * @throws Exception if failed.
+     */
+    private void checkTopologyChange(boolean stopNode) throws Exception {

Review comment:
       I think this method is unnecessary; all checks should be performed in separate tests. See my comment on `to add the node to cluster - remove directories with the caches`.

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {
+        checkTopologyChange(false);
+    }
+
+    /**
+     * @param stopNode {@code True} to check node fail, {@code False} to check node join.
+     * @throws Exception if failed.
+     */
+    private void checkTopologyChange(boolean stopNode) throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(4, keysCnt);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(3));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, dfltCacheCfg.getName());
+
+        if (stopNode) {
+            IgniteInternalFuture<?> fut0 = runAsync(() -> stopGrid(3, true));
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> fut.get(TIMEOUT),
+                ClusterTopologyCheckedException.class,
+                "Required node has left the cluster"
+            );
+
+            ensureCacheDirEmpty(3, dfltCacheCfg);
+
+            fut0.get(TIMEOUT);
+
+            awaitPartitionMapExchange();
+
+            dfltCacheCfg = null;

Review comment:
       Why do we need to set `dfltCacheCfg` to `null` here?

##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    private static final long TIMEOUT = 15_000;
+
+    /** Binary type name. */
+    private static final String BIN_TYPE_NAME = "customType";
+
+    /** Static cache configurations. */
+    protected CacheConfiguration<?, ?>[] cacheCfgs;
+
+    /** Cache value builder. */
+    protected Function<Integer, Object> valBuilder = new IndexedValueBuilder();
+
+    /** {@inheritDoc} */
+    @Override protected IgniteConfiguration getConfiguration(String name) throws Exception {
+        IgniteConfiguration cfg = super.getConfiguration(name);
+
+        if (cacheCfgs != null)
+            cfg.setCacheConfiguration(cacheCfgs);
+        else if (dfltCacheCfg != null) {
+            dfltCacheCfg.setSqlIndexMaxInlineSize(255);
+            dfltCacheCfg.setQueryEntities(
+                Arrays.asList(queryEntity(BIN_TYPE_NAME), queryEntity(IndexedObject.class.getName())));
+        }
+
+        return cfg;
+    }
+
+    /**
+     * @param typeName Type name.
+     */
+    private QueryEntity queryEntity(String typeName) {
+        return new QueryEntity()
+            .setKeyType(Integer.class.getName())
+            .setValueType(typeName)
+            .setFields(new LinkedHashMap<>(F.asMap("id", Integer.class.getName(), "name", String.class.getName())))
+            .setIndexes(Arrays.asList(new QueryIndex("id"), new QueryIndex("name")));
+    }
+
+    /**
+     * Ensures that the cache doesn't start if one of the baseline nodes fails.
+     *
+     * @throws Exception If failed.
+     */
+    @Test
+    public void testCacheStartFailOnNodeLeft() throws Exception {
+        int keysCnt = 10_000;
+
+        startGridsWithSnapshot(3, keysCnt, true);
+
+        BlockingCustomMessageDiscoverySpi discoSpi = discoSpi(grid(0));
+
+        discoSpi.block((msg) -> msg instanceof DynamicCacheChangeBatch);
+
+        IgniteFuture<Void> fut =
+            grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        discoSpi.waitBlocked(TIMEOUT);
+
+        stopGrid(2, true);
+
+        discoSpi.unblock();
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut.get(TIMEOUT), ClusterTopologyCheckedException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestore() throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt, true);
+
+        grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName());
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testBasicClusterSnapshotRestoreWithMetadata() throws Exception {
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, keysCnt);
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreRejectOnInActiveCluster() throws Exception {
+        IgniteEx ignite = startGridsWithCache(2, CACHE_KEYS_RANGE, valBuilder, dfltCacheCfg);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        ignite.cluster().state(ClusterState.INACTIVE);
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(
+            log, () -> fut.get(TIMEOUT), IgniteException.class, "The cluster should be active");
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testClusterSnapshotRestoreDiffTopology() throws Exception {
+        int nodesCnt = 4;
+
+        int keysCnt = 10_000;
+
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        startGridsWithCache(nodesCnt - 2, keysCnt, valBuilder, dfltCacheCfg);
+
+        grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        startGrid(nodesCnt - 2);
+
+        IgniteEx ignite = startGrid(nodesCnt - 1);
+
+        resetBaselineTopology();
+
+        awaitPartitionMapExchange();
+
+        ignite.cache(dfltCacheCfg.getName()).destroy();
+
+        awaitPartitionMapExchange();
+
+        // Remove metadata.
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        forceCheckpoint();
+
+        // Restore from an empty node.
+        ignite.snapshot().restoreSnapshot(
+            SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName())).get(TIMEOUT);
+
+        IgniteCache<Object, Object> cache = ignite.cache(dfltCacheCfg.getName()).withKeepBinary();
+
+        assertTrue(cache.indexReadyFuture().isDone());
+
+        awaitPartitionMapExchange();
+
+        checkCacheKeys(cache, keysCnt);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testRestoreSharedCacheGroup() throws Exception {
+        String grpName = "shared";
+        String cacheName1 = "cache1";
+        String cacheName2 = "cache2";
+
+        CacheConfiguration<?, ?> cacheCfg1 = txCacheConfig(new CacheConfiguration<>(cacheName1)).setGroupName(grpName);
+        CacheConfiguration<?, ?> cacheCfg2 = txCacheConfig(new CacheConfiguration<>(cacheName2)).setGroupName(grpName);
+
+        cacheCfgs = new CacheConfiguration[] {cacheCfg1, cacheCfg2};
+
+        IgniteEx ignite = startGrids(2);
+
+        ignite.cluster().state(ClusterState.ACTIVE);
+
+        IgniteCache<Integer, Object> cache1 = ignite.cache(cacheName1);
+        putKeys(cache1, 0, CACHE_KEYS_RANGE);
+
+        IgniteCache<Integer, Object> cache2 = ignite.cache(cacheName2);
+        putKeys(cache2, 0, CACHE_KEYS_RANGE);
+
+        ignite.snapshot().createSnapshot(SNAPSHOT_NAME).get(TIMEOUT);
+
+        cache1.destroy();
+
+        awaitPartitionMapExchange();
+
+        IgniteSnapshot snp = ignite.snapshot();
+
+        GridTestUtils.assertThrowsAnyCause(
+            log,
+            () -> snp.restoreSnapshot(SNAPSHOT_NAME, Arrays.asList(cacheName1, cacheName2)).get(TIMEOUT),
+            IllegalArgumentException.class,
+            "Cache group(s) was not found in the snapshot"
+        );
+
+        cache2.destroy();
+
+        awaitPartitionMapExchange();
+
+        snp.restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(grpName)).get(TIMEOUT);
+
+        checkCacheKeys(ignite.cache(cacheName1), CACHE_KEYS_RANGE);
+        checkCacheKeys(ignite.cache(cacheName2), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testIncompatibleMetasUpdate() throws Exception {
+        valBuilder = new BinaryValueBuilder(0, BIN_TYPE_NAME);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        int typeId = ignite.context().cacheObjects().typeId(BIN_TYPE_NAME);
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        BinaryObject[] objs = new BinaryObject[CACHE_KEYS_RANGE];
+
+        IgniteCache<Integer, Object> cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", n);
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        fut.get(TIMEOUT);
+
+        // Ensure that existing type has been updated.
+        BinaryType type = ignite.context().cacheObjects().metadata(typeId);
+
+        assertTrue(type.fieldNames().contains("name"));
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+
+        cache1.destroy();
+
+        grid(0).cache(dfltCacheCfg.getName()).destroy();
+
+        ignite.context().cacheObjects().removeType(typeId);
+
+        // Create cache with incompatible binary type.
+        cache1 = createCacheWithBinaryType(ignite, "cache1", n -> {
+            BinaryObjectBuilder builder = ignite.binary().builder(BIN_TYPE_NAME);
+
+            builder.setField("id", UUID.randomUUID());
+
+            objs[n] = builder.build();
+
+            return objs[n];
+        });
+
+        IgniteFuture<Void> fut0 =
+            ignite.snapshot().restoreSnapshot(SNAPSHOT_NAME, Collections.singleton(dfltCacheCfg.getName()));
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> fut0.get(TIMEOUT), BinaryObjectException.class, null);
+
+        ensureCacheDirEmpty(2, dfltCacheCfg);
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            assertEquals(objs[i], cache1.get(i));
+    }
+
+    /**
+     * @param ignite Ignite.
+     * @param cacheName Cache name.
+     * @param valBuilder Binary value builder.
+     * @return Created cache.
+     */
+    private IgniteCache<Integer, Object> createCacheWithBinaryType(
+        Ignite ignite,
+        String cacheName,
+        Function<Integer, BinaryObject> valBuilder
+    ) {
+        IgniteCache<Integer, Object> cache = ignite.createCache(new CacheConfiguration<>(cacheName)).withKeepBinary();
+
+        for (int i = 0; i < CACHE_KEYS_RANGE; i++)
+            cache.put(i, valBuilder.apply(i));
+
+        return cache;
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnPrepare() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, IgniteCheckedException.class,
+            "Cache start failed. A cache or group with the same name is currently being restored from a snapshot");
+    }
+
+    /**
+     * @throws Exception if failed
+     */
+    @Test
+    public void testParallelCacheStartWithTheSameNameOnStart() throws Exception {
+        checkCacheStartWithTheSameName(RESTORE_CACHE_GROUP_SNAPSHOT_START, CacheExistsException.class,
+            "Failed to start cache (a cache with the same name is already started):");
+    }
+
+    /**
+     * @param procType The type of distributed process on which communication is blocked.
+     * @throws Exception if failed.
+     */
+    private void checkCacheStartWithTheSameName(
+        DistributedProcessType procType,
+        Class<? extends Throwable> expCls,
+        String expMsg
+    ) throws Exception {
+        String grpName = "shared";
+        String cacheName = "cache1";
+
+        dfltCacheCfg = txCacheConfig(new CacheConfiguration<Integer, Object>(cacheName)).setGroupName(grpName);
+
+        IgniteEx ignite = startGridsWithSnapshot(2, CACHE_KEYS_RANGE);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(1));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, procType, grpName);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(grpName), IgniteCheckedException.class, null);
+
+        GridTestUtils.assertThrowsAnyCause(log, () -> ignite.createCache(cacheName), expCls, expMsg);
+
+        spi.stopBlock();
+
+        fut.get(TIMEOUT);
+
+        checkCacheKeys(grid(0).cache(cacheName), CACHE_KEYS_RANGE);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeFail() throws Exception {
+        checkTopologyChange(true);
+    }
+
+    /** @throws Exception If failed. */
+    @Test
+    public void testNodeJoin() throws Exception {
+        checkTopologyChange(false);
+    }
+
+    /**
+     * @param stopNode {@code True} to check node fail, {@code False} to check node join.
+     * @throws Exception if failed.
+     */
+    private void checkTopologyChange(boolean stopNode) throws Exception {
+        int keysCnt = 10_000;
+
+        IgniteEx ignite = startGridsWithSnapshot(4, keysCnt);
+
+        TestRecordingCommunicationSpi spi = TestRecordingCommunicationSpi.spi(grid(3));
+
+        IgniteFuture<Void> fut = waitForBlockOnRestore(spi, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, dfltCacheCfg.getName());
+
+        if (stopNode) {
+            IgniteInternalFuture<?> fut0 = runAsync(() -> stopGrid(3, true));
+
+            GridTestUtils.assertThrowsAnyCause(
+                log,
+                () -> fut.get(TIMEOUT),
+                ClusterTopologyCheckedException.class,
+                "Required node has left the cluster"
+            );
+
+            ensureCacheDirEmpty(3, dfltCacheCfg);
+
+            fut0.get(TIMEOUT);
+
+            awaitPartitionMapExchange();
+
+            dfltCacheCfg = null;
+
+            GridTestUtils.assertThrowsAnyCause(

Review comment:
       This is not a correct check:
   The node should fail on its start rather than when joining the cluster on discovery.
   
   The correct test should look like this:
   - a node leaves the cluster during the restore procedure (assume the perform phase has completed)
   - the restore procedure fails
   - the node that left the cluster must be unable to start (before the discovery manager is run) and must fail with an exception
   - in the test, you should clean up the mentioned directories and start the node again
   - the node must then be able to start and join the cluster
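   
   The scenario listed above can be modelled outside of Ignite as a rough sketch. Everything here is hypothetical: `startNode`, `hasLeftovers` and `cleanUp` are illustrative names, not Ignite APIs, and a temporary directory stands in for the cache directory left behind by an interrupted restore. It only shows the expected order of events: start fails, directories are cleaned, start succeeds.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical model of the start-up check; none of these names come from Ignite. */
class RestoreLeftoverCheckSketch {
    /** Returns true if the directory exists and contains at least one entry. */
    static boolean hasLeftovers(Path cacheDir) throws IOException {
        if (!Files.isDirectory(cacheDir))
            return false;

        try (DirectoryStream<Path> files = Files.newDirectoryStream(cacheDir)) {
            return files.iterator().hasNext();
        }
    }

    /** Models node start: fails early if a partially restored directory remains. */
    static void startNode(Path cacheDir) throws IOException {
        if (hasLeftovers(cacheDir))
            throw new IllegalStateException("Cache directory left by an interrupted restore: " + cacheDir);

        // Assumption: in a real node, discovery would be started only after this point.
    }

    /** Removes the files of a flat directory, leaving the directory itself in place. */
    static void cleanUp(Path cacheDir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(cacheDir)) {
            for (Path file : files)
                Files.delete(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a node that left the cluster after partition files were copied.
        Path cacheDir = Files.createTempDirectory("cache-restored");
        Files.createFile(cacheDir.resolve("part-0.bin"));

        boolean failed = false;

        try {
            startNode(cacheDir);
        }
        catch (IllegalStateException e) {
            failed = true;
        }

        System.out.println("first start failed: " + failed);

        cleanUp(cacheDir);
        startNode(cacheDir);

        System.out.println("second start succeeded");
    }
}
```

   In the real test the start-up failure would presumably be asserted with the test framework's own helpers; the sketch only fixes the sequence of steps.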




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600304066



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.
+     * @param errHnd Error handler.
+     * @throws IgniteCheckedException If failed.
+     */
+    private CompletableFuture<Void> restoreAsync(
+        String snpName,
+        Collection<File> dirs,
+        boolean updateMeta,
+        BooleanSupplier stopChecker,
+        Consumer<Exception> errHnd
+    ) throws IgniteCheckedException {
+        IgniteSnapshotManager snapshotMgr = ctx.cache().context().snapshotMgr();
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        List<CompletableFuture<Void>> futs = new ArrayList<>();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(snapshotMgr.snapshotLocalDir(snpName).getAbsolutePath(), pdsFolderName);
+
+            futs.add(CompletableFuture.runAsync(() -> {
+                try {
+                    ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+                }
+                catch (IgniteCheckedException e) {
+                    errHnd.accept(e);
+                }
+            }, snapshotMgr.snapshotExecutorService()));
+        }
+
+        for (File cacheDir : dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            assert snpCacheDir.exists() : "node=" + ctx.localNodeId() + ", dir=" + snpCacheDir;
+
+            for (File snpFile : snpCacheDir.listFiles()) {
+                futs.add(CompletableFuture.runAsync(() -> {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    try {
+                        Files.copy(snpFile.toPath(), target.toPath());
+                    }
+                    catch (IOException e) {
+                        errHnd.accept(e);
+                    }
+                }, ctx.cache().context().snapshotMgr().snapshotExecutorService()));
+            }
+        }
+
+        int futsSize = futs.size();
+
+        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[futsSize]));
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta == null || !meta.consistentId().equals(cctx.localNode().consistentId().toString()))
+            return new SnapshotRestoreContext(req, Collections.emptyList(), Collections.emptyMap());
+
+        if (meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+        FilePageStoreManager pageStore = (FilePageStoreManager)cctx.pageStore();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), meta.folderName())) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            File cacheDir = pageStore.cacheWorkDir(snpCacheDir.getName().startsWith(CACHE_GRP_DIR_PREFIX), grpName);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+
+            pageStore.readCacheConfigurations(snpCacheDir, cfgsByName);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req, cacheDirs, cfgsById);
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        if (ctx.clientNode())
+            return;
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        Exception failure = F.first(errs.values());
+
+        assert opCtx0 != null || failure != null : ctx.localNodeId();
+
+        if (opCtx0 == null) {
+            finishProcess(failure);
+
+            return;
+        }
+
+        if (failure == null)

Review comment:
       We cannot rely on single-node errors during the "finish" phase of the distributed process. For example, one node may observe a node failure while another does not, in which case different nodes would behave differently at the completion stage. The decision is therefore to track only those errors that are visible to all nodes.
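   
   A minimal sketch of the difference, assuming that the `errs` map passed to the finish callback is identical on every node while a locally recorded error (such as the one set by `onNodeLeft`) may exist on some nodes only. The class and method names below are illustrative and not part of the patch.

```java
import java.util.Collections;
import java.util.Map;
import java.util.UUID;

/** Illustrative only: compares two ways of deciding whether to roll back at the finish phase. */
class FinishPhaseDecisionSketch {
    /** Uses only the errors delivered to every node: same input, same outcome everywhere. */
    static boolean rollbackByClusterWideErrors(Map<UUID, Exception> errs) {
        return !errs.isEmpty();
    }

    /** Also consults an error seen locally: the outcome may differ between nodes. */
    static boolean rollbackByLocalView(Map<UUID, Exception> errs, Exception locErr) {
        return !errs.isEmpty() || locErr != null;
    }

    public static void main(String[] args) {
        // No node reported an error through the distributed process.
        Map<UUID, Exception> errs = Collections.emptyMap();

        // Node A noticed a node failure locally, node B did not.
        Exception seenByNodeA = new Exception("Required node has left the cluster");
        Exception seenByNodeB = null;

        System.out.println("local view:   A=" + rollbackByLocalView(errs, seenByNodeA) +
            ", B=" + rollbackByLocalView(errs, seenByNodeB));

        System.out.println("cluster-wide: A=" + rollbackByClusterWideErrors(errs) +
            ", B=" + rollbackByClusterWideErrors(errs));
    }
}
```

   With the local view the two nodes disagree (A would roll back, B would not); with the cluster-wide map both reach the same decision.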







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600307342



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cacheobject/IgniteCacheObjectProcessor.java
##########
@@ -306,6 +307,15 @@ public void updateMetadata(int typeId, String typeName, @Nullable String affKeyF
      */
     public void saveMetadata(Collection<BinaryType> types, File dir);
 
+    /**
+     * Merge the binary metadata files stored in the specified directory.
+     *
+     * @param metadataDir Directory containing binary metadata files.
+     * @param stopChecker Prcoess interrupt checker.

Review comment:
       Done







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r615722367



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       Why shouldn't we check SQL and indexing?







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595810072



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())

Review comment:
       Added explicit `meta == null` processing.
   (`meta == null` means we should create an "empty" context; throwing an exception means we don't create a context at all.)
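
A rough sketch of the two outcomes described above, under stated assumptions: the class `RestoreContextSketch`, its nested `Context` type, and the parameters of `prepareContext` are hypothetical stand-ins, not the actual `SnapshotRestoreProcess` code. Treating foreign metadata the same way as missing metadata is also an assumption of the sketch.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch, not Ignite code. */
class RestoreContextSketch {
    /** Stand-in for the restore operation context. */
    static final class Context {
        /** Cache group directories to restore on this node (may be empty). */
        final List<String> dirs;

        Context(List<String> dirs) {
            this.dirs = dirs;
        }
    }

    /**
     * @param restoreInProgress Whether a previous restore operation is still running.
     * @param metaConsistentId Consistent ID from the local snapshot metadata, {@code null} if there is none.
     * @param locConsistentId Consistent ID of the local node.
     * @param grpDirs Cache group directories found in the local snapshot.
     * @return Operation context (never {@code null}).
     */
    static Context prepareContext(boolean restoreInProgress, String metaConsistentId,
        String locConsistentId, List<String> grpDirs) {
        // Exception: no context is created at all.
        if (restoreInProgress)
            throw new IllegalStateException("The previous snapshot restore operation was not completed.");

        // No (matching) local metadata: the node still takes part, but with an "empty" context.
        if (metaConsistentId == null || !metaConsistentId.equals(locConsistentId))
            return new Context(Collections.emptyList());

        return new Context(grpDirs);
    }
}
```

For example, `prepareContext(false, null, "node-1", dirs)` returns a context with no directories, while `prepareContext(true, ...)` throws and leaves the node without a context.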
   







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r607143843



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,832 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestoreRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** Future to be completed when the cache restore process is complete (this future will be returned to the user). */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /** Stopped flag. */
+    private volatile boolean stopped;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        try {
+            if (ctx.clientNode())
+                throw new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation.");
+
+            DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+            if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+                throw new IgniteException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster.");
+
+            if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+                throw new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            synchronized (this) {
+                if (isRestoring() && fut == null)
+                    throw new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed.");
+
+                fut = new GridFutureAdapter<>();
+            }
+        } catch (IgniteException e) {
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+
+        ctx.cache().context().snapshotMgr().collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    finishProcess(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    finishProcess(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                ctx.cache().context().snapshotMgr().runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            finishProcess(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestoreRequest req = new SnapshotRestoreRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return isRestoring(null, null);
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(@Nullable String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        if (cacheName == null)
+            return true;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        String details = opCtx0 == null ? "" : " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']';
+
+        if (err != null)
+            log.error("Failed to restore snapshot cache group" + details, err);
+        else if (log.isInfoEnabled())
+            log.info("Successfully restored cache group(s) from the snapshot" + details);
+
+        opCtx = null;
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (fut0 != null) {
+                fut = null;
+
+                ctx.getSystemExecutorService().submit(() -> fut0.onDone(null, err));
+            }
+        }
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void stop() {
+        interrupt(new NodeStoppingException("Node is stopping."), true);

Review comment:
       The `cancel` flag should be used as an input parameter from the method above.
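
One way to read this suggestion, sketched with hypothetical stand-ins rather than the PR's actual classes: instead of hard-coding `true`, `stop` accepts the `cancel` flag from its caller and forwards it. The `interrupt(Exception, boolean)` shape is assumed from the call in the quoted line. What the flag does inside `interrupt` is not shown in the diff, so the sketch only records it.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a restore process that can be interrupted. */
final class StopFlagSketch {
    /** First interruption reason, if any. */
    private final AtomicReference<Exception> err = new AtomicReference<>();

    /** Last cancel flag passed to {@link #interrupt}. */
    private volatile boolean cancelled;

    /** Records the interruption reason (first one wins) and the cancel flag. */
    void interrupt(Exception reason, boolean cancel) {
        err.compareAndSet(null, reason);

        cancelled = cancel;
    }

    /** Forwards the caller's flag instead of always passing {@code true}. */
    void stop(boolean cancel) {
        interrupt(new IllegalStateException("Node is stopping."), cancel);
    }

    Exception error() {
        return err.get();
    }

    boolean cancelled() {
        return cancelled;
    }

    public static void main(String[] args) {
        StopFlagSketch proc = new StopFlagSketch();

        proc.stop(false);

        System.out.println(proc.error().getMessage() + " cancel=" + proc.cancelled());
    }
}
```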







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595822665



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active."));
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes))
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<Boolean> retFut = new GridFutureAdapter<>();
+
+        try {
+            Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+            // Ensure that shared cache groups has no conflicts before start caches.
+            for (StoredCacheData cfg : ccfgs) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting restored caches " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName +
+                    ", caches=" + F.viewReadOnly(ccfgs, c -> c.config().getName()) + ']');
+            }
+
+            ctx.cache().dynamicStartCachesByStoredConf(ccfgs, true, true, false, null, true, opCtx0.nodes).listen(
+                f -> {
+                    if (f.error() != null) {
+                        log.error("Unable to start restored caches [requestID=" + opCtx0.reqId +
+                            ", snapshot=" + opCtx0.snpName + ']', f.error());
+
+                        retFut.onDone(f.error());
+                    }
+                    else
+                        retFut.onDone(true);
+                }
+            );
+        } catch (IgniteCheckedException e) {
+            log.error("Unable to restore cache group(s) from snapshot " +
+                "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+
+        return retFut;
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishCacheStart(UUID reqId, Map<UUID, Boolean> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null || !reqId.equals(opCtx0.reqId))
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            finishProcess();
+
+            return;
+        }
+
+        if (U.isLocalNodeCoordinator(ctx.discovery()))
+            rollbackRestoreProc.start(reqId, new SnapshotRestoreRollbackRequest(reqId, failure));
+    }
+
+    /**
+     * Check the response for probable failures.
+     *
+     * @param errs Errors.
+     * @param opCtx Snapshot restore operation context.
+     * @param respNodes Set of responding topology nodes.
+     * @return Error, if any.
+     */
+    private Exception checkFailure(Map<UUID, Exception> errs, SnapshotRestoreContext opCtx, Set<UUID> respNodes) {
+        Exception err = F.first(errs.values());

Review comment:
       It seems we should log local failures locally. I think it would be better to add logging wherever an exception is caught. What do you think?
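
A minimal sketch of the suggestion above, assuming nothing about Ignite's internals: each step that can fail logs the exception on the node where it was caught before handing the error back through the future. `LocalFailureLogging`, `runStep`, and the use of `java.util.logging` and `CompletableFuture` are illustrative stand-ins, not the PR's code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration: log a failure where it is caught, then propagate it. */
final class LocalFailureLogging {
    private static final Logger LOG = Logger.getLogger(LocalFailureLogging.class.getName());

    /**
     * Runs a step and returns its result as a future.
     * On failure the exception is logged locally first, so the node's own log
     * shows the cause even if only one error is reported cluster-wide.
     */
    static <T> CompletableFuture<T> runStep(String stepName, Callable<T> step) {
        CompletableFuture<T> fut = new CompletableFuture<>();

        try {
            fut.complete(step.call());
        }
        catch (Exception e) {
            LOG.log(Level.SEVERE, "Step failed locally [step=" + stepName + ']', e);

            fut.completeExceptionally(e);
        }

        return fut;
    }

    public static void main(String[] args) {
        CompletableFuture<String> ok = runStep("copy", () -> "done");
        CompletableFuture<String> failed = runStep("copy", () -> {
            throw new java.io.IOException("disk full");
        });

        System.out.println(ok.join());
        System.out.println(failed.isCompletedExceptionally());
    }
}
```

The trade-off is some duplication in the logs: the same exception may appear once on the failing node and again wherever the aggregated error is reported.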







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r599295360



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestorePrepareRequest.java
##########
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Set;
+import java.util.UUID;
+import org.apache.ignite.internal.util.typedef.internal.S;
+
+/**
+ * Request to prepare cache group restore from the snapshot.
+ */
+public class SnapshotRestorePrepareRequest implements Serializable {
+    /** Serial version uid. */
+    private static final long serialVersionUID = 0L;
+
+    /** Request ID. */
+    private final UUID reqId;
+
+    /** Snapshot name. */
+    private final String snpName;
+
+    /** Baseline node IDs that must be alive to complete the operation. */
+    private final Set<UUID> nodes;
+
+    /** List of cache group names to restore from the snapshot. */
+    private final Collection<String> grps;
+
+    /** Node ID from which to update the binary metadata. */
+    private final UUID updateMetaNodeId;

Review comment:
       The coordinator may not have a local snapshot.
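
       To illustrate why the node is passed explicitly, here is a rough sketch (plain Java, not the PR code) of picking the first node in topology order that actually holds its own part of the snapshot. `MetaNodeChooser`, `chooseMetaNode` and both map parameters are hypothetical names introduced only for this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical sketch: choose a node that has a local snapshot to read binary metadata from. */
class MetaNodeChooser {
    /**
     * @param nodeConsistentIds Node ID to consistent ID, iterated in topology order.
     * @param snapshotOwners Node ID to the consistent ID recorded in the snapshot metadata found on
     *      that node; a node without a local snapshot has no entry.
     * @return First node whose local snapshot metadata belongs to that node, if any.
     */
    static Optional<UUID> chooseMetaNode(Map<UUID, String> nodeConsistentIds, Map<UUID, String> snapshotOwners) {
        for (Map.Entry<UUID, String> e : nodeConsistentIds.entrySet()) {
            String owner = snapshotOwners.get(e.getKey());

            // Skip nodes (possibly including the coordinator) that have no snapshot of their own.
            if (e.getValue().equals(owner))
                return Optional.of(e.getKey());
        }

        return Optional.empty();
    }
}
```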







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595810437



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();

Review comment:
       done







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622749674



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreBaseTest.java
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.util.function.Function;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.internal.IgniteEx;
+
+/**
+ * Snapshot restore test base.
+ */
+public abstract class IgniteClusterSnapshotRestoreBaseTest extends AbstractSnapshotSelfTest {
+    /** Timeout. */
+    protected static final long TIMEOUT = 15_000;

Review comment:
       This is a **short** timeout to fail tests faster. 
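
       As a generic illustration (not the test code from this PR), a bounded wait turns a hung operation into a quick test failure instead of letting the suite run into its global timeout. `ShortTimeoutSketch` and `await` are hypothetical names; only the 15-second value mirrors the constant above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: wait for an operation with a short, explicit timeout. */
class ShortTimeoutSketch {
    /** Short timeout in milliseconds, same value as the constant in the test base class. */
    static final long TIMEOUT = 15_000;

    /**
     * Waits for the future at most {@code timeoutMs} milliseconds.
     *
     * @throws AssertionError If the future is not completed in time, so the test fails right away.
     */
    static <T> T await(CompletableFuture<T> fut, long timeoutMs) {
        try {
            return fut.get(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (TimeoutException e) {
            throw new AssertionError("Operation did not complete in " + timeoutMs + " ms", e);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```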







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r622978525



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,916 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.nio.file.StandardCopyOption;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.ClusterSnapshotFuture;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Temporary cache directory prefix. */
+    public static final String TMP_CACHE_DIR_PREFIX = ".tmp.snp.restore.";

Review comment:
       Dots replaced with underscores.







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595818409



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);

Review comment:
       We are reading and writing `cfgs` from different threads, so strictly speaking we should use a ConcurrentMap instead of a HashMap here. In this case, though, that is not justified: the map is replaced as a whole, so a couple of volatile reads/writes of the object reference are enough.
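
A minimal sketch of the trade-off mentioned above, assuming the map is built once by a single thread and only read afterwards. `RestoreCfgHolder`, `publish` and `contains` are hypothetical names used for illustration; they are not part of the PR or of the Ignite code base. The writer fills a private HashMap and publishes it with one volatile write, and readers take one volatile read and then work with a map that is never mutated again, so a ConcurrentMap is not required for visibility.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder illustrating safe publication through a volatile reference. */
class RestoreCfgHolder {
    /** Published configurations; the referenced map is never mutated after publication. */
    private volatile Map<Integer, String> cfgs = Collections.emptyMap();

    /** Writer side: copy into a private map, then publish it with a single volatile write. */
    public void publish(Map<Integer, String> newCfgs) {
        cfgs = Collections.unmodifiableMap(new HashMap<>(newCfgs));
    }

    /** Reader side: one volatile read, then plain reads of an effectively immutable map. */
    public boolean contains(int cacheId) {
        Map<Integer, String> cfgs0 = cfgs;

        return cfgs0.containsKey(cacheId);
    }

    public static void main(String[] args) throws InterruptedException {
        RestoreCfgHolder holder = new RestoreCfgHolder();

        Map<Integer, String> built = new HashMap<>();
        built.put(1, "cache-a");
        built.put(2, "cache-b");

        // Publish from another thread to mimic the restore worker.
        Thread writer = new Thread(() -> holder.publish(built));
        writer.start();
        writer.join();

        System.out.println(holder.contains(1));
        System.out.println(holder.contains(3));
    }
}
```

This only holds while the published map is treated as read-only. If several threads had to modify the same map instance in place, a ConcurrentHashMap (or external locking) would be needed instead.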







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595813920



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())

Review comment:
       Isn't registering cluster-wide metadata a long enough operation?
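
       For context, the stop check discussed here follows a cooperative cancellation pattern: `stop()` and `onNodeLeft()` record the first error with `compareAndSet(null, ...)`, and the restore code polls that flag through a `BooleanSupplier` between steps. Below is a minimal, self-contained sketch of that pattern. It is only an illustration under assumptions: the class `StopCheckSketch`, its `run` method, and the step names are hypothetical and are not part of the PR or of Ignite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of a polled stop flag; not code from the PR. */
class StopCheckSketch {
    /** Holds the first interruption reason; later reasons are ignored. */
    final AtomicReference<Throwable> err = new AtomicReference<>();

    /** Records the reason only if no error has been set yet. */
    boolean stop(Throwable reason) {
        return err.compareAndSet(null, reason);
    }

    /**
     * Runs the steps in order, polling the stop flag before each one.
     *
     * @param steps Step names (assumed, for illustration only).
     * @param stopAfter Step during which an interruption is simulated, or null.
     * @return Names of the steps that actually ran.
     */
    List<String> run(List<String> steps, String stopAfter) {
        BooleanSupplier stopChecker = () -> err.get() != null;
        List<String> done = new ArrayList<>();

        for (String step : steps) {
            // A step that has already started is not aborted; only the next one is skipped.
            if (stopChecker.getAsBoolean())
                break;

            done.add(step);

            // Simulate an external interruption arriving while this step runs.
            if (step.equals(stopAfter))
                stop(new IllegalStateException("Node left"));
        }

        return done;
    }
}
```

       In this sketch the flag is only observed at step boundaries, so a long-running step delays the reaction to an interruption until that step returns.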




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r596061837



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),

Review comment:
       Done.







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r596023149



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));

Review comment:
       done, this check has been removed
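
For readers following the quoted hunk: `stop()` and `onNodeLeft()` record an interruption reason with `err.compareAndSet(null, ...)`, and `restore()` polls a `BooleanSupplier` before each file copy. Below is a minimal stand-alone sketch of that pattern, assuming nothing beyond the JDK. The class `StopCheckerSketch` and its methods are hypothetical and are not part of the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of "first error wins" plus a polled stop flag; not code from the PR. */
final class StopCheckerSketch {
    /** Holds the first recorded interruption reason; later reasons are ignored. */
    private final AtomicReference<Throwable> err = new AtomicReference<>();

    /** Records the reason only if no reason was recorded before. Returns true if this call won. */
    boolean stop(Throwable reason) {
        return err.compareAndSet(null, reason);
    }

    /** Returns the recorded reason, or null if the operation was not interrupted. */
    Throwable error() {
        return err.get();
    }

    /**
     * Pretends to copy items one by one and checks the stop flag before each step.
     * The failAt index simulates an interruption arriving in the middle of the loop.
     */
    List<String> copyAll(List<String> items, int failAt) {
        BooleanSupplier stopChecker = () -> err.get() != null;
        List<String> copied = new ArrayList<>();

        for (int i = 0; i < items.size(); i++) {
            if (i == failAt)
                stop(new IllegalStateException("node left"));

            if (stopChecker.getAsBoolean())
                return copied; // Leave early; the caller is expected to roll back.

            copied.add(items.get(i));
        }

        return copied;
    }

    public static void main(String[] args) {
        StopCheckerSketch op = new StopCheckerSketch();

        System.out.println(op.copyAll(Arrays.asList("part-0.bin", "part-1.bin", "part-2.bin"), 2));
        System.out.println(op.stop(new RuntimeException("late reason")));
        System.out.println(op.error().getMessage());
    }
}
```

Only the first reason is kept, so a later `stop()` call cannot overwrite the error that is eventually reported to the caller.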




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r623197026



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/distributed/dht/preloader/GridDhtPartitionsExchangeFuture.java
##########
@@ -5126,6 +5126,12 @@ public void onNodeLeft(final ClusterNode node) {
 
                             if (crd0 == null)
                                 finishState = new FinishState(null, initialVersion(), null);
+
+                            if (dynamicCacheStartExchange() &&
+                                exchActions.cacheStartRequiredAliveNodes().contains(node.id())) {
+                                exchangeGlobalExceptions.put(cctx.localNodeId(), new ClusterTopologyCheckedException(

Review comment:
       added `DynamicCacheStartFailsOnNodeLeftTest`
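
The quoted change appears to register an exchange error when a node listed in `cacheStartRequiredAliveNodes()` leaves during a dynamic cache start. Below is a rough, JDK-only sketch of that idea. The class `RequiredNodesSketch` is hypothetical; the real logic lives in `GridDhtPartitionsExchangeFuture` and uses `ClusterTopologyCheckedException`, which is replaced here by `IllegalStateException`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration; not code from the PR. */
final class RequiredNodesSketch {
    /** ID of the local node that reports the failure. */
    private final UUID locNodeId;

    /** Nodes that must stay alive until the operation completes. */
    private final Set<UUID> requiredAliveNodes;

    /** Errors collected per reporting node. */
    private final Map<UUID, Exception> globalErrs = new ConcurrentHashMap<>();

    RequiredNodesSketch(UUID locNodeId, Set<UUID> requiredAliveNodes) {
        this.locNodeId = locNodeId;
        this.requiredAliveNodes = new HashSet<>(requiredAliveNodes);
    }

    /** Node left callback: records an error only if the departed node was required. */
    void onNodeLeft(UUID leftNodeId) {
        if (requiredAliveNodes.contains(leftNodeId)) {
            globalErrs.put(locNodeId,
                new IllegalStateException("Required node left the cluster [nodeId=" + leftNodeId + ']'));
        }
    }

    /** Returns true if at least one error was recorded. */
    boolean failed() {
        return !globalErrs.isEmpty();
    }

    public static void main(String[] args) {
        UUID loc = UUID.randomUUID();
        UUID required = UUID.randomUUID();

        RequiredNodesSketch exch = new RequiredNodesSketch(loc, Collections.singleton(required));

        exch.onNodeLeft(UUID.randomUUID()); // Unrelated node: no error is recorded.
        System.out.println(exch.failed());

        exch.onNodeLeft(required); // Required node: an error is recorded.
        System.out.println(exch.failed());
    }
}
```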








[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r617535387



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       I reviewed the test suite and found that some of the tests do not check keys at all, so it makes no sense to run them with SQL/indexing.
   So I split the test suite into `IgniteClusterSnapshotRestoreWithIndexingTest` (SQL/indexing) and `IgniteClusterSnapshotRestoreSelfTest` (core).







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r595824251



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,799 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<SnapshotRestoreRollbackRequest, SnapshotRestoreRollbackResponse> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new LinkedHashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                dataNodes.add(ctx.localNodeId());
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isSnapshotRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isCacheRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName == null) {
+                if (CU.cacheId(locGrpName) == cacheId)
+                    return true;
+            }
+            else {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new IgniteException(OP_REJECT_MSG +
+                "Server node(s) has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     * @throws IgniteCheckedException If cache is present.
+     */
+    private void ensureCacheAbsent(String name) throws IgniteCheckedException {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteCheckedException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!allNodesInBaselineAndAlive(req.nodes()))
+                throw new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster.");
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+        } catch (IgniteCheckedException e) {
+            return new GridFinishedFuture<>(e);
+        }
+
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0.dirs.isEmpty())
+            return new GridFinishedFuture<>();
+
+        if (log.isInfoEnabled()) {
+            log.info("Starting local snapshot restore operation [requestID=" + req.requestId() +
+                ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+        }
+
+        GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+        ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+            try {
+                restore(opCtx0, ctx.localNodeId().equals(req.updateMetaNodeId()));
+
+                Throwable err = opCtx0.err.get();
+
+                if (err == null) {
+                    retFut.onDone(new ArrayList<>(opCtx0.cfgs.values()));
+
+                    return;
+                }
+
+                log.error("Snapshot restore process has been interrupted " +
+                    "[requestID=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']', err);
+
+                rollback(opCtx0);
+
+                retFut.onDone(err);
+            }
+            catch (Throwable t) {
+                retFut.onDone(t);
+            }
+        });
+
+        return retFut;
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Snapshot restore operation context.
+     * @throws IgniteCheckedException If failed.
+     */
+    private SnapshotRestoreContext prepareContext(SnapshotRestorePrepareRequest req) throws IgniteCheckedException {
+        if (isSnapshotRestoring()) {
+            throw new IgniteCheckedException(OP_REJECT_MSG +
+                "The previous snapshot restore operation was not completed.");
+        }
+
+        GridCacheSharedContext<?, ?> cctx = ctx.cache().context();
+
+        SnapshotMetadata meta = F.first(cctx.snapshotMgr().readSnapshotMetadatas(req.snapshotName()));
+
+        if (meta != null && meta.consistentId().equals(cctx.localNode().consistentId().toString())
+            && meta.pageSize() != cctx.database().pageSize()) {
+            throw new IgniteCheckedException("Incompatible memory page size " +
+                "[snapshotPageSize=" + meta.pageSize() +
+                ", local=" + cctx.database().pageSize() +
+                ", snapshot=" + req.snapshotName() +
+                ", nodeId=" + cctx.localNodeId() + ']');
+        }
+
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+        List<File> cacheDirs = new ArrayList<>();
+        Map<String, StoredCacheData> cfgsByName = new HashMap<>();
+
+        // Collect cache configuration(s) and verify cache groups page size.
+        for (File snpCacheDir : cctx.snapshotMgr().snapshotCacheDirectories(req.snapshotName(), pdsFolderName)) {
+            String grpName = FilePageStoreManager.cacheGroupName(snpCacheDir);
+
+            if (!req.groups().contains(grpName))
+                continue;
+
+            ((FilePageStoreManager)cctx.pageStore()).readCacheConfigurations(snpCacheDir, cfgsByName);
+
+            File cacheDir = U.resolveWorkDirectory(ctx.config().getWorkDirectory(),
+                Paths.get(databaseRelativePath(pdsFolderName), snpCacheDir.getName()).toString(), false);
+
+            if (!cacheDir.exists())
+                cacheDir.mkdir();
+            else if (cacheDir.list().length > 0) {
+                throw new IgniteCheckedException("Unable to restore cache group, directory is not empty " +
+                    "[group=" + grpName + ", dir=" + cacheDir + ']');
+            }
+
+            cacheDirs.add(cacheDir);
+        }
+
+        Map<Integer, StoredCacheData> cfgsById = cfgsByName.isEmpty() ? Collections.emptyMap() :
+            cfgsByName.values().stream().collect(Collectors.toMap(v -> CU.cacheId(v.config().getName()), v -> v));
+
+        return new SnapshotRestoreContext(req.requestId(), req.snapshotName(), req.nodes(), cacheDirs, cfgsById);
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param opCtx Snapshot restore operation context.
+     * @param updateMeta Update binary metadata flag.
+     * @throws IgniteCheckedException If failed.
+     */
+    protected void restore(SnapshotRestoreContext opCtx, boolean updateMeta) throws IgniteCheckedException {
+        BooleanSupplier stopChecker = () -> opCtx.err.get() != null;
+        String pdsFolderName = ctx.pdsFolderResolver().resolveFolders().folderName();
+
+        if (updateMeta) {
+            File binDir = binaryWorkDir(
+                ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName).getAbsolutePath(), pdsFolderName);
+
+            if (stopChecker.getAsBoolean())
+                return;
+
+            // Check binary metadata compatibility.
+            ctx.cacheObjects().checkMetadata(binDir);
+
+            // Cluster-wide update binary metadata.
+            ctx.cacheObjects().updateMetadata(binDir, stopChecker);
+        }
+
+        for (File cacheDir : opCtx.dirs) {
+            File snpCacheDir = new File(ctx.cache().context().snapshotMgr().snapshotLocalDir(opCtx.snpName),
+                Paths.get(databaseRelativePath(pdsFolderName), cacheDir.getName()).toString());
+
+            try {
+                if (log.isInfoEnabled())
+                    log.info("Copying files of the cache group [from=" + snpCacheDir + ", to=" + cacheDir + ']');
+
+                for (File snpFile : snpCacheDir.listFiles()) {
+                    if (stopChecker.getAsBoolean())
+                        return;
+
+                    File target = new File(cacheDir, snpFile.getName());
+
+                    if (log.isDebugEnabled()) {
+                        log.debug("Copying file from the snapshot " +
+                            "[snapshot=" + opCtx.snpName +
+                            ", src=" + snpFile +
+                            ", target=" + target + "]");
+                    }
+
+                    Files.copy(snpFile.toPath(), target.toPath());
+                }
+            }
+            catch (IOException e) {
+                throw new IgniteCheckedException("Unable to copy file [snapshot=" + opCtx.snpName +
+                    ", grp=" + FilePageStoreManager.cacheGroupName(cacheDir) + ']', e);
+            }
+        }
+    }
+
+    /**
+     * Rollback changes made by process in specified cache group.
+     *
+     * @param opCtx Snapshot restore operation context.
+     */
+    private void rollback(@Nullable SnapshotRestoreContext opCtx) {
+        if (opCtx == null || F.isEmpty(opCtx.dirs))
+            return;
+
+        if (log.isInfoEnabled())
+            log.info("Performing local rollback routine for restored cache groups [requestID=" + opCtx.reqId + ']');
+
+        try {
+            for (File cacheDir : opCtx.dirs) {
+                if (!cacheDir.exists())
+                    continue;
+
+                if (log.isInfoEnabled())
+                    log.info("Cleaning up directory " + cacheDir);
+
+                U.delete(cacheDir);
+            }
+        }
+        catch (Exception e) {
+            log.error("Failed to perform rollback [requestID=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', e);
+        }
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @param res Results.
+     * @param errs Errors.
+     */
+    private void finishPrepare(UUID reqId, Map<UUID, ArrayList<StoredCacheData>> res, Map<UUID, Exception> errs) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (ctx.clientNode())
+            return;
+
+        Exception failure = checkFailure(errs, opCtx0, res.keySet());
+
+        if (failure == null) {
+            assert opCtx0 != null : ctx.localNodeId();
+
+            Map<Integer, StoredCacheData> globalCfgs = new HashMap<>();
+
+            for (List<StoredCacheData> storedCfgs : res.values()) {
+                if (storedCfgs == null)
+                    continue;
+
+                for (StoredCacheData cacheData : storedCfgs)
+                    globalCfgs.put(CU.cacheId(cacheData.config().getName()), cacheData);
+            }
+
+            opCtx0.cfgs = globalCfgs;
+
+            if (U.isLocalNodeCoordinator(ctx.discovery()))
+                cacheStartProc.start(reqId, reqId);
+
+            return;
+        }
+
+        if (opCtx0 == null)
+            finishProcess(failure);
+        else // Remove files asynchronously.
+            ctx.cache().context().snapshotMgr().snapshotExecutorService().execute(() -> {
+                rollback(opCtx0);
+
+                finishProcess(failure);
+            });
+    }
+
+    /**
+     * @param reqId Request ID.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<Boolean> cacheStart(UUID reqId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return new GridFinishedFuture<>();
+
+        if (!reqId.equals(opCtx0.reqId)) {
+            return new GridFinishedFuture<>(
+                new IgniteCheckedException("Unknown snapshot restore operation was rejected."));
+        }
+
+        if (!U.isLocalNodeCoordinator(ctx.discovery()))
+            return new GridFinishedFuture<>();
+
+        DiscoveryDataClusterState state = ctx.state().clusterState();
+
+        if (state.state() != ClusterState.ACTIVE || state.transition())
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active."));
+
+        Throwable err = opCtx0.err.get();
+
+        if (err != null)
+            return new GridFinishedFuture<>(err);
+
+        if (!allNodesInBaselineAndAlive(opCtx0.nodes))
+            return new GridFinishedFuture<>(new IgniteCheckedException(OP_REJECT_MSG + "Server node(s) has left the cluster."));
+
+        GridFutureAdapter<Boolean> retFut = new GridFutureAdapter<>();
+
+        try {
+            Collection<StoredCacheData> ccfgs = opCtx0.cfgs.values();
+
+            // Ensure that shared cache groups has no conflicts before start caches.
+            for (StoredCacheData cfg : ccfgs) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());

Review comment:
       Changed to `IgniteIllegalStateException`.







[GitHub] [ignite] Mmuzaf commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
Mmuzaf commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r607093303



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,836 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestoreRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */

Review comment:
       Let's also describe that this is a future that will be returned to the user.
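
       A minimal sketch of how the field comment could read once it says that the future is the one handed back to the caller. This is not the PR's code: `RestoreFutureSketch`, `finish` and the use of `CompletableFuture` in place of `GridFutureAdapter`/`IgniteFuture` are assumptions made only so that the example is self-contained.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the restore process; not the Ignite class. */
public class RestoreFutureSketch {
    /**
     * The future that is returned to the user by {@link #start()} and is completed
     * when the restore process finishes, either successfully or with an error.
     */
    private volatile CompletableFuture<Void> fut;

    /**
     * Starts the operation.
     *
     * @return The user-facing future, or an already failed future if the previous operation is still running.
     */
    public synchronized CompletableFuture<Void> start() {
        CompletableFuture<Void> fut0 = fut;

        if (fut0 != null && !fut0.isDone()) {
            CompletableFuture<Void> rejected = new CompletableFuture<>();

            rejected.completeExceptionally(
                new IllegalStateException("The previous operation was not completed."));

            return rejected;
        }

        return fut = new CompletableFuture<>();
    }

    /**
     * Completes the user-facing future.
     *
     * @param err Error, if any.
     */
    public void finish(Throwable err) {
        CompletableFuture<Void> fut0 = fut;

        if (fut0 == null)
            return;

        if (err == null)
            fut0.complete(null);
        else
            fut0.completeExceptionally(err);
    }
}
```

       The point of the wording is that a reader of the field can tell, without opening `start()`, that completing this future is what the caller observes.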

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,836 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestoreRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /** Stopped flag. */
+    private volatile boolean stopped;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        try {
+            if (ctx.clientNode())
+                throw new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation.");
+
+            DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+            if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+                throw new IgniteException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster.");
+
+            if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+                throw new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            synchronized (this) {
+                if (isRestoring())
+                    throw new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed.");
+
+                fut = new GridFutureAdapter<>();
+            }
+        } catch (IgniteException e) {
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+
+        ctx.cache().context().snapshotMgr().collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    finishProcess(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    finishProcess(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                ctx.cache().context().snapshotMgr().runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            finishProcess(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestoreRequest req = new SnapshotRestoreRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null || fut != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        String details = opCtx0 == null ? "" : " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']';
+
+        if (err != null)
+            log.error("Failed to restore snapshot cache group" + details, err);
+        else if (log.isInfoEnabled())
+            log.info("Successfully restored cache group(s) from the snapshot" + details);
+
+        opCtx = null;
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null) {
+            fut = null;
+
+            ctx.getSystemExecutorService().submit(() -> fut0.onDone(null, err));
+        }
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void stop() {
+        interrupt(new NodeStoppingException("Node is stopping."), true);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void deactivate() {
+        interrupt(new IgniteCheckedException("The cluster has been deactivated."), false);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     * @param stop Stop flag.
+     */
+    private void interrupt(Exception reason, boolean stop) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return;
+
+        opCtx0.err.compareAndSet(null, reason);
+
+        IgniteInternalFuture<?> stopFut;
+
+        synchronized (this) {
+            stopFut = opCtx0.stopFut;
+
+            if (stop)
+                stopped = true;
+        }
+
+        if (stopFut == null || stopFut.isDone())
+            return;
+
+        try {
+            stopFut.get();
+        }
+        catch (IgniteCheckedException ignore) {
+            // No-op.
+        }
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestoreRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId-" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());

Review comment:
       Let's inline the `updateMeta` variable.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,836 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more

Review comment:
       Let's fix the license header and remove the unnecessary `//` prefixes.

##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,836 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.NodeStoppingException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestoreRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /** Stopped flag. */
+    private volatile boolean stopped;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Name of the cache groups for restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        try {
+            if (ctx.clientNode())
+                throw new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation.");
+
+            DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+            if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+                throw new IgniteException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (!clusterState.hasBaselineTopology())
+                throw new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster.");
+
+            if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP))
+                throw new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            synchronized (this) {
+                if (isRestoring())
+                    throw new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed.");
+
+                fut = new GridFutureAdapter<>();
+            }
+        } catch (IgniteException e) {
+            return new IgniteFinishedFutureImpl<>(e);
+        }
+
+        ctx.cache().context().snapshotMgr().collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    finishProcess(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    finishProcess(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                ctx.cache().context().snapshotMgr().runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            finishProcess(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestoreRequest req = new SnapshotRestoreRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null || fut != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        String details = opCtx0 == null ? "" : " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']';
+
+        if (err != null)
+            log.error("Failed to restore snapshot cache group" + details, err);
+        else if (log.isInfoEnabled())
+            log.info("Successfully restored cache group(s) from the snapshot" + details);
+
+        opCtx = null;
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null) {
+            fut = null;
+
+            ctx.getSystemExecutorService().submit(() -> fut0.onDone(null, err));
+        }
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void stop() {
+        interrupt(new NodeStoppingException("Node is stopping."), true);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     */
+    public void deactivate() {
+        interrupt(new IgniteCheckedException("The cluster has been deactivated."), false);
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     * @param stop Stop flag.
+     */
+    private void interrupt(Exception reason, boolean stop) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return;
+
+        opCtx0.err.compareAndSet(null, reason);
+
+        IgniteInternalFuture<?> stopFut;
+
+        synchronized (this) {
+            stopFut = opCtx0.stopFut;
+
+            if (stop)
+                stopped = true;
+        }
+
+        if (stopFut == null || stopFut.isDone())
+            return;
+
+        try {
+            stopFut.get();
+        }
+        catch (IgniteCheckedException ignore) {
+            // No-op.
+        }
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestoreRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            if (ctx.cache().context().snapshotMgr().isSnapshotCreating())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "A cluster snapshot operation is in progress.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId-" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups has no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            synchronized (this) {
+                if (stopped || ctx.isStopping())
+                    throw new NodeStoppingException("Node is stopping.");
+
+                opCtx0.stopFut = retFut.chain(f -> null);

Review comment:
       We can wrap this call in `IgniteFutureImpl`, so that we don't need the try-catch around the future's `get`.
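
       A minimal sketch of the idea, using only JDK types. All names here (`CheckedFailure`, `InternalFuture`, `UncheckedFuture`) are hypothetical stand-ins, not Ignite classes; the assumption is that the wrapper rethrows the checked failure as an unchecked one, which is how I understand `IgniteFutureImpl` to behave.

```java
public class UncheckedFutureSketch {
    /** Hypothetical stand-in for a checked exception type. */
    public static class CheckedFailure extends Exception {
        public CheckedFailure(String msg) {
            super(msg);
        }
    }

    /** Hypothetical stand-in for an internal future whose get() is checked. */
    public interface InternalFuture<T> {
        T get() throws CheckedFailure;
    }

    /** Hypothetical wrapper that converts the checked failure to an unchecked one. */
    public static class UncheckedFuture<T> {
        private final InternalFuture<T> delegate;

        public UncheckedFuture(InternalFuture<T> delegate) {
            this.delegate = delegate;
        }

        public T get() {
            try {
                return delegate.get();
            }
            catch (CheckedFailure e) {
                // Rethrow as unchecked so callers are not forced to catch.
                throw new IllegalStateException(e.getMessage(), e);
            }
        }
    }

    /** Without the wrapper: the caller needs a try-catch around get(). */
    public static String waitChecked(InternalFuture<String> fut) {
        try {
            return fut.get();
        }
        catch (CheckedFailure ignore) {
            return null;
        }
    }

    /** With the wrapper: no try-catch at the call site. */
    public static String waitUnchecked(InternalFuture<String> fut) {
        return new UncheckedFuture<>(fut).get();
    }
}
```

       One trade-off to keep in mind: the current code swallows the failure, while a wrapped future would let an unchecked exception propagate to the caller of `interrupt` unless it is handled elsewhere.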




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r615718065



##########
File path: modules/core/src/test/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/IgniteClusterSnapshotRestoreSelfTest.java
##########
@@ -0,0 +1,774 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.LinkedHashMap;
+import java.util.Objects;
+import java.util.UUID;
+import java.util.function.Function;
+import org.apache.ignite.Ignite;
+import org.apache.ignite.IgniteCache;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteSnapshot;
+import org.apache.ignite.binary.BinaryObject;
+import org.apache.ignite.binary.BinaryObjectBuilder;
+import org.apache.ignite.binary.BinaryObjectException;
+import org.apache.ignite.binary.BinaryType;
+import org.apache.ignite.cache.CacheExistsException;
+import org.apache.ignite.cache.QueryEntity;
+import org.apache.ignite.cache.QueryIndex;
+import org.apache.ignite.cache.query.annotations.QuerySqlField;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.configuration.CacheConfiguration;
+import org.apache.ignite.configuration.IgniteConfiguration;
+import org.apache.ignite.internal.IgniteEx;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.TestRecordingCommunicationSpi;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.CacheGroupDescriptor;
+import org.apache.ignite.internal.processors.cache.DynamicCacheChangeBatch;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType;
+import org.apache.ignite.internal.util.distributed.SingleNodeMessage;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.lang.IgniteFuture;
+import org.apache.ignite.spi.IgniteSpiException;
+import org.apache.ignite.testframework.GridTestUtils;
+import org.jetbrains.annotations.Nullable;
+import org.junit.Test;
+
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+import static org.apache.ignite.testframework.GridTestUtils.runAsync;
+
+/**
+ * Snapshot restore tests.
+ */
+public class IgniteClusterSnapshotRestoreSelfTest extends AbstractSnapshotSelfTest {

Review comment:
       This is exactly what `checkClusterStateChange` is doing. It covers deactivation:
   `testClusterDeactivateOnPrepare`
   `testClusterDeactivateOnCacheStart`
   and also read-only mode:
   `testClusterStateChangeActiveReadonlyOnPrepare`
   `testClusterStateChangeActiveReadonlyOnCacheStart`







[GitHub] [ignite] xtern commented on a change in pull request #8648: IGNITE-13805

Posted by GitBox <gi...@apache.org>.
xtern commented on a change in pull request #8648:
URL: https://github.com/apache/ignite/pull/8648#discussion_r600306937



##########
File path: modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/snapshot/SnapshotRestoreProcess.java
##########
@@ -0,0 +1,777 @@
+///*
+// * Licensed to the Apache Software Foundation (ASF) under one or more
+// * contributor license agreements.  See the NOTICE file distributed with
+// * this work for additional information regarding copyright ownership.
+// * The ASF licenses this file to You under the Apache License, Version 2.0
+// * (the "License"); you may not use this file except in compliance with
+// * the License.  You may obtain a copy of the License at
+// *
+// *      http://www.apache.org/licenses/LICENSE-2.0
+// *
+// * Unless required by applicable law or agreed to in writing, software
+// * distributed under the License is distributed on an "AS IS" BASIS,
+// * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// * See the License for the specific language governing permissions and
+// * limitations under the License.
+// */
+
+package org.apache.ignite.internal.processors.cache.persistence.snapshot;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.RejectedExecutionException;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.function.BooleanSupplier;
+import java.util.function.Consumer;
+import java.util.stream.Collectors;
+import org.apache.ignite.IgniteCheckedException;
+import org.apache.ignite.IgniteException;
+import org.apache.ignite.IgniteIllegalStateException;
+import org.apache.ignite.IgniteLogger;
+import org.apache.ignite.cluster.ClusterNode;
+import org.apache.ignite.cluster.ClusterState;
+import org.apache.ignite.internal.GridKernalContext;
+import org.apache.ignite.internal.IgniteFeatures;
+import org.apache.ignite.internal.IgniteInternalFuture;
+import org.apache.ignite.internal.IgniteInterruptedCheckedException;
+import org.apache.ignite.internal.cluster.ClusterTopologyCheckedException;
+import org.apache.ignite.internal.processors.cache.GridCacheSharedContext;
+import org.apache.ignite.internal.processors.cache.StoredCacheData;
+import org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager;
+import org.apache.ignite.internal.processors.cache.verify.IdleVerifyResultV2;
+import org.apache.ignite.internal.processors.cluster.DiscoveryDataClusterState;
+import org.apache.ignite.internal.util.distributed.DistributedProcess;
+import org.apache.ignite.internal.util.future.GridFinishedFuture;
+import org.apache.ignite.internal.util.future.GridFutureAdapter;
+import org.apache.ignite.internal.util.future.IgniteFinishedFutureImpl;
+import org.apache.ignite.internal.util.future.IgniteFutureImpl;
+import org.apache.ignite.internal.util.typedef.F;
+import org.apache.ignite.internal.util.typedef.internal.CU;
+import org.apache.ignite.internal.util.typedef.internal.U;
+import org.apache.ignite.lang.IgniteFuture;
+import org.jetbrains.annotations.Nullable;
+
+import static org.apache.ignite.internal.IgniteFeatures.SNAPSHOT_RESTORE_CACHE_GROUP;
+import static org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.binaryWorkDir;
+import static org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.CACHE_GRP_DIR_PREFIX;
+import static org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteSnapshotManager.databaseRelativePath;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK;
+import static org.apache.ignite.internal.util.distributed.DistributedProcess.DistributedProcessType.RESTORE_CACHE_GROUP_SNAPSHOT_START;
+
+/**
+ * Distributed process to restore cache group from the snapshot.
+ */
+public class SnapshotRestoreProcess {
+    /** Reject operation message. */
+    private static final String OP_REJECT_MSG = "Cache group restore operation was rejected. ";
+
+    /** Kernal context. */
+    private final GridKernalContext ctx;
+
+    /** Cache group restore prepare phase. */
+    private final DistributedProcess<SnapshotRestorePrepareRequest, ArrayList<StoredCacheData>> prepareRestoreProc;
+
+    /** Cache group restore cache start phase. */
+    private final DistributedProcess<UUID, Boolean> cacheStartProc;
+
+    /** Cache group restore rollback phase. */
+    private final DistributedProcess<UUID, Boolean> rollbackRestoreProc;
+
+    /** Logger. */
+    private final IgniteLogger log;
+
+    /** The future to be completed when the cache restore process is complete. */
+    private volatile GridFutureAdapter<Void> fut;
+
+    /** Snapshot restore operation context. */
+    private volatile SnapshotRestoreContext opCtx;
+
+    /**
+     * @param ctx Kernal context.
+     */
+    public SnapshotRestoreProcess(GridKernalContext ctx) {
+        this.ctx = ctx;
+
+        log = ctx.log(getClass());
+
+        prepareRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_PREPARE, this::prepare, this::finishPrepare);
+
+        cacheStartProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_START, this::cacheStart, this::finishCacheStart);
+
+        rollbackRestoreProc = new DistributedProcess<>(
+            ctx, RESTORE_CACHE_GROUP_SNAPSHOT_ROLLBACK, this::rollback, this::finishRollback);
+    }
+
+    /**
+     * Start cache group restore operation.
+     *
+     * @param snpName Snapshot name.
+     * @param cacheGrpNames Names of the cache groups to restore.
+     * @return Future that will be completed when the restore operation is complete and the cache groups are started.
+     */
+    public IgniteFuture<Void> start(String snpName, Collection<String> cacheGrpNames) {
+        if (ctx.clientNode()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Client and daemon nodes can not perform this operation."));
+        }
+
+        synchronized (this) {
+            GridFutureAdapter<Void> fut0 = fut;
+
+            if (opCtx != null || (fut0 != null && !fut0.isDone())) {
+                return new IgniteFinishedFutureImpl<>(
+                    new IgniteException(OP_REJECT_MSG + "The previous snapshot restore operation was not completed."));
+            }
+
+            fut = new GridFutureAdapter<>();
+        }
+
+        DiscoveryDataClusterState clusterState = ctx.state().clusterState();
+
+        if (clusterState.state() != ClusterState.ACTIVE || clusterState.transition())
+            return new IgniteFinishedFutureImpl<>(new IgniteException(OP_REJECT_MSG + "The cluster should be active."));
+
+        if (!clusterState.hasBaselineTopology()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "The baseline topology is not configured for cluster."));
+        }
+
+        IgniteSnapshotManager snpMgr = ctx.cache().context().snapshotMgr();
+
+        if (snpMgr.isSnapshotCreating()) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "A cluster snapshot operation is in progress."));
+        }
+
+        if (!IgniteFeatures.allNodesSupports(ctx.grid().cluster().nodes(), SNAPSHOT_RESTORE_CACHE_GROUP)) {
+            return new IgniteFinishedFutureImpl<>(
+                new IgniteException(OP_REJECT_MSG + "Not all nodes in the cluster support restore operation."));
+        }
+
+        snpMgr.collectSnapshotMetadata(snpName).listen(
+            f -> {
+                if (f.error() != null) {
+                    fut.onDone(f.error());
+
+                    return;
+                }
+
+                Set<UUID> dataNodes = new HashSet<>();
+                Map<ClusterNode, List<SnapshotMetadata>> metas = f.result();
+                Map<Integer, String> reqGrpIds = cacheGrpNames.stream().collect(Collectors.toMap(CU::cacheId, v -> v));
+
+                for (Map.Entry<ClusterNode, List<SnapshotMetadata>> entry : metas.entrySet()) {
+                    SnapshotMetadata meta = F.first(entry.getValue());
+
+                    assert meta != null : entry.getKey().id();
+
+                    if (!entry.getKey().consistentId().equals(meta.consistentId()))
+                        continue;
+
+                    dataNodes.add(entry.getKey().id());
+
+                    reqGrpIds.keySet().removeAll(meta.partitions().keySet());
+                }
+
+                if (!reqGrpIds.isEmpty()) {
+                    fut.onDone(new IllegalArgumentException(OP_REJECT_MSG + "Cache group(s) was not found in the " +
+                        "snapshot [groups=" + reqGrpIds.values() + ", snapshot=" + snpName + ']'));
+
+                    return;
+                }
+
+                snpMgr.runSnapshotVerfification(metas).listen(
+                    f0 -> {
+                        if (f0.error() != null) {
+                            fut.onDone(f0.error());
+
+                            return;
+                        }
+
+                        IdleVerifyResultV2 res = f0.result();
+
+                        if (!F.isEmpty(res.exceptions()) || res.hasConflicts()) {
+                            StringBuilder sb = new StringBuilder();
+
+                            res.print(sb::append, true);
+
+                            fut.onDone(new IgniteException(sb.toString()));
+
+                            return;
+                        }
+
+                        SnapshotRestorePrepareRequest req = new SnapshotRestorePrepareRequest(UUID.randomUUID(),
+                            snpName, dataNodes, cacheGrpNames, F.first(dataNodes));
+
+                        prepareRestoreProc.start(req.requestId(), req);
+                    }
+                );
+            }
+        );
+
+        return new IgniteFutureImpl<>(fut);
+    }
+
+    /**
+     * Check if snapshot restore process is currently running.
+     *
+     * @return {@code True} if the snapshot restore operation is in progress.
+     */
+    public boolean isRestoring() {
+        return opCtx != null;
+    }
+
+    /**
+     * Check if the cache or group with the specified name is currently being restored from the snapshot.
+     *
+     * @param cacheName Cache name.
+     * @param grpName Cache group name.
+     * @return {@code True} if the cache or group with the specified name is currently being restored.
+     */
+    public boolean isRestoring(String cacheName, @Nullable String grpName) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 == null)
+            return false;
+
+        Map<Integer, StoredCacheData> cacheCfgs = opCtx0.cfgs;
+
+        int cacheId = CU.cacheId(cacheName);
+
+        if (cacheCfgs.containsKey(cacheId))
+            return true;
+
+        for (File grpDir : opCtx0.dirs) {
+            String locGrpName = FilePageStoreManager.cacheGroupName(grpDir);
+
+            if (grpName != null) {
+                if (cacheName.equals(locGrpName))
+                    return true;
+
+                if (CU.cacheId(locGrpName) == CU.cacheId(grpName))
+                    return true;
+            }
+            else if (CU.cacheId(locGrpName) == cacheId)
+                return true;
+        }
+
+        return false;
+    }
+
+    /**
+     * Finish local cache group restore process.
+     */
+    private void finishProcess() {
+        finishProcess(null);
+    }
+
+    /**
+     * Finish local cache group restore process.
+     *
+     * @param err Error, if any.
+     */
+    private void finishProcess(@Nullable Throwable err) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (err != null) {
+            log.error("Failed to restore snapshot cache group" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'), err);
+        }
+        else if (log.isInfoEnabled()) {
+            log.info("Successfully restored cache group(s) from the snapshot" + (opCtx0 == null ? "" :
+                " [reqId=" + opCtx0.reqId + ", snapshot=" + opCtx0.snpName + ']'));
+        }
+
+        GridFutureAdapter<Void> fut0 = fut;
+
+        if (fut0 != null)
+            fut0.onDone(null, err);
+
+        opCtx = null;
+    }
+
+    /**
+     * Node left callback.
+     *
+     * @param leftNodeId Left node ID.
+     */
+    public void onNodeLeft(UUID leftNodeId) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null && opCtx0.nodes.contains(leftNodeId)) {
+            opCtx0.err.compareAndSet(null, new ClusterTopologyCheckedException(OP_REJECT_MSG +
+                "Required node has left the cluster [nodeId=" + leftNodeId + ']'));
+        }
+    }
+
+    /**
+     * Abort the currently running restore procedure (if any).
+     *
+     * @param reason Interruption reason.
+     */
+    public void stop(Exception reason) {
+        SnapshotRestoreContext opCtx0 = opCtx;
+
+        if (opCtx0 != null)
+            opCtx0.err.compareAndSet(null, reason);
+    }
+
+    /**
+     * Ensures that a cache with the specified name does not exist locally.
+     *
+     * @param name Cache name.
+     */
+    private void ensureCacheAbsent(String name) {
+        int id = CU.cacheId(name);
+
+        if (ctx.cache().cacheGroupDescriptors().containsKey(id) || ctx.cache().cacheDescriptor(id) != null) {
+            throw new IgniteIllegalStateException("Cache \"" + name +
+                "\" should be destroyed manually before perform restore operation.");
+        }
+    }
+
+    /**
+     * @param req Request to prepare cache group restore from the snapshot.
+     * @return Result future.
+     */
+    private IgniteInternalFuture<ArrayList<StoredCacheData>> prepare(SnapshotRestorePrepareRequest req) {
+        if (ctx.clientNode())
+            return new GridFinishedFuture<>();
+
+        try {
+            DiscoveryDataClusterState state = ctx.state().clusterState();
+
+            if (state.state() != ClusterState.ACTIVE || state.transition())
+                throw new IgniteCheckedException(OP_REJECT_MSG + "The cluster should be active.");
+
+            for (UUID nodeId : req.nodes()) {
+                ClusterNode node = ctx.discovery().node(nodeId);
+
+                if (node == null || !CU.baselineNode(node, state) || !ctx.discovery().alive(node)) {
+                    throw new IgniteCheckedException(
+                        OP_REJECT_MSG + "Required node has left the cluster [nodeId=" + nodeId + ']');
+                }
+            }
+
+            for (String grpName : req.groups())
+                ensureCacheAbsent(grpName);
+
+            opCtx = prepareContext(req);
+
+            SnapshotRestoreContext opCtx0 = opCtx;
+
+            if (opCtx0.dirs.isEmpty())
+                return new GridFinishedFuture<>();
+
+            // Ensure that shared cache groups have no conflicts.
+            for (StoredCacheData cfg : opCtx0.cfgs.values()) {
+                if (!F.isEmpty(cfg.config().getGroupName()))
+                    ensureCacheAbsent(cfg.config().getName());
+            }
+
+            if (log.isInfoEnabled()) {
+                log.info("Starting local snapshot restore operation [reqId=" + req.requestId() +
+                    ", snapshot=" + req.snapshotName() + ", group(s)=" + req.groups() + ']');
+            }
+
+            boolean updateMeta = ctx.localNodeId().equals(req.updateMetaNodeId());
+            Consumer<Exception> errHnd = (ex) -> opCtx.err.compareAndSet(null, ex);
+            BooleanSupplier stopChecker = () -> {
+                if (opCtx.err.get() != null)
+                    return true;
+
+                if (Thread.currentThread().isInterrupted()) {
+                    errHnd.accept(new IgniteInterruptedCheckedException("Thread has been interrupted."));
+
+                    return true;
+                }
+
+                return false;
+            };
+
+            GridFutureAdapter<ArrayList<StoredCacheData>> retFut = new GridFutureAdapter<>();
+
+            restoreAsync(opCtx0.snpName, opCtx0.dirs, updateMeta, stopChecker, errHnd).thenAccept(res -> {
+                Throwable err = opCtx.err.get();
+
+                if (err != null) {
+                    log.error("Unable to restore cache group(s) from the snapshot " +
+                        "[reqId=" + opCtx.reqId + ", snapshot=" + opCtx.snpName + ']', err);
+
+                    retFut.onDone(err);
+                } else
+                    retFut.onDone(new ArrayList<>(opCtx.cfgs.values()));
+            });
+
+            return retFut;
+        } catch (IgniteIllegalStateException | IgniteCheckedException | RejectedExecutionException e) {
+            log.error("Unable to restore cache group(s) from the snapshot " +
+                "[reqId=" + req.requestId() + ", snapshot=" + req.snapshotName() + ']', e);
+
+            return new GridFinishedFuture<>(e);
+        }
+    }
+
+    /**
+     * Copy partition files and update binary metadata.
+     *
+     * @param snpName Snapshot name.
+     * @param dirs Cache directories to restore from the snapshot.
+     * @param updateMeta Update binary metadata flag.
+     * @param stopChecker Process interrupt checker.

Review comment:
       Done



